Friday, October 25, 2019

computer vision - Estimating Plane Pose Without Knowledge of Intrinsic Camera Parameters


Assume a camera has no skew and square pixels; that is, its camera-calibration matrix $K$ has the form: $$ K = \left[\begin{matrix} \alpha&0&u_x\\0&\alpha&u_y\\0&0&1 \end{matrix}\right] $$


Intuitively, it seems that it should be possible to recover the pose of a rectangular planar object $P$ (for example, a flat piece of paper) from the projected image coordinates $\vec{p_i}$, $i = 1, 2, 3, 4$, of the object's four corners alone, without knowledge of $K$. To be precise, I only care about the orientation of the unit normal vector $\vec{n}$ of $P$, not its location.


Following the answer given in response to this question: Step by Step Camera Pose Estimation for Visual Tracking and Planar Markers, we choose world coordinates such that $P$ lies in the plane $Z=0$, and we find the homography $H$ mapping the four corners of $P$ in world-frame coordinates to their projected coordinates $\vec{p_i}$, $i=1,2,3,4$.
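As a concrete sketch of this step (my own illustration, not taken from the linked answer), the homography can be estimated from the four correspondences with the standard DLT, here in plain NumPy:

```python
import numpy as np

def homography_from_4_points(world_pts, image_pts):
    """Estimate H mapping plane points (X, Y) on Z = 0 to image points (u, v).

    world_pts, image_pts: arrays of shape (4, 2).
    """
    rows = []
    for (X, Y), (u, v) in zip(world_pts, image_pts):
        # Each correspondence contributes two linear constraints on the
        # nine entries of H (stacked row-wise into a vector h).
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    A = np.asarray(rows, dtype=float)
    # h is the null vector of A: the right singular vector belonging to
    # the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale (assumes H[2,2] != 0)
```

With four exact correspondences in general position, $A$ has a one-dimensional null space and the recovered $H$ is exact up to scale.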


Then $H = K[\vec{R_1}|\vec{R_2}|\vec{t}]$, up to scale, where $\vec{R_1}, \vec{R_2}$ are the first and second columns of the camera's rotation $R \in SO(3)$ and $\vec{t} \in \mathbb{R}^3$ is the camera's translation. (The third column of $[R|\vec{t}]$ drops out because $Z = 0$ on the plane.) If we make the assumption that $K$ is the identity (I'll return to this later), then $H = [\vec{R_1}|\vec{R_2}|\vec{t}]$. The third column of $R$, $\vec{R_3}$, is equal to $\vec{R_1}\times\vec{R_2}$. Finally, $\vec{n} = R\left[\begin{matrix}0\\0\\1\end{matrix}\right] = \vec{R_1}\times\vec{R_2}$ and we are done.
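A minimal sketch of this decomposition under the same $K = I$ assumption (the normalization step accounts for the unknown overall scale of $H$; this is my own illustration, not code from the linked answer):

```python
import numpy as np

def plane_normal_from_homography(H):
    """Given H ~ [R1 | R2 | t] up to scale, return the unit plane normal."""
    # R1 and R2 are unit vectors, so remove the unknown scale using the
    # geometric mean of the first two column norms.
    scale = np.sqrt(np.linalg.norm(H[:, 0]) * np.linalg.norm(H[:, 1]))
    H = H / scale
    r1, r2 = H[:, 0], H[:, 1]
    n = np.cross(r1, r2)  # n = R1 x R2 = R3
    return n / np.linalg.norm(n)
```

Note that $H$ carries a sign ambiguity as well as a scale ambiguity, so in practice the normal is only determined up to $\pm\vec{n}$ by this step alone.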


Now let's examine the effect of $K$ on our answer $\vec{n}$. My intuition tells me that $K$ shouldn't matter: varying $\alpha$ should ultimately act as a scalar multiple on our answer $\vec{n}$, which is normalized away since I am only concerned with the direction of $\vec{n}$, not its magnitude. Likewise, varying the principal point $u_x, u_y$ should merely translate the locations of the projected corners $\vec{p_i}$, which should affect $\vec{t}$ but not $\vec{n}$.


Let's see if this is true. If $K \neq I$, then $H = K[\vec{R_1}|\vec{R_2}|\vec{t}]$ and $$\vec{n} = (K\vec{R_1}) \times (K\vec{R_2}) = \det(K)\,K^{-T}(\vec{R_1}\times\vec{R_2})$$
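The cross-product identity $(M\vec{a}) \times (M\vec{b}) = \det(M)\,M^{-T}(\vec{a}\times\vec{b})$ holds for any invertible $M$; a quick numeric sanity check with arbitrary example values:

```python
import numpy as np

# Arbitrary example values for the intrinsics and two vectors.
alpha, ux, uy = 2.0, 0.3, -0.1
K = np.array([[alpha, 0.0, ux],
              [0.0, alpha, uy],
              [0.0, 0.0, 1.0]])
a = np.array([0.2, -0.5, 0.7])
b = np.array([1.0, 0.4, -0.3])

# (K a) x (K b) should equal det(K) * inv(K).T @ (a x b).
lhs = np.cross(K @ a, K @ b)
rhs = np.linalg.det(K) * np.linalg.inv(K).T @ np.cross(a, b)
assert np.allclose(lhs, rhs)
```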


In particular, if $u_x = u_y = 0$ but $\alpha \neq 1$, then $\det(K) = \alpha^2$ and $$\vec{n} = \left[\begin{matrix} \alpha&0&0\\0&\alpha&0\\0&0&\alpha^2 \end{matrix}\right](\vec{R_1} \times \vec{R_2})$$ which scales the third component of our answer from above differently from the first two, so even the focal length changes the direction of $\vec{n}$, not just its scale, contrary to my expectation.


On the other hand, if $\alpha = 1$ but $u_x, u_y \neq 0$, then $$\det(K)\,K^{-T} = \left[\begin{matrix} 1&0&0\\0&1&0\\-u_x&-u_y&1 \end{matrix}\right]$$



and $$\vec{n} = \left[\begin{matrix} 1&0&0\\0&1&0\\-u_x&-u_y&1 \end{matrix}\right](\vec{R_1} \times \vec{R_2})$$
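The size of this effect is easy to demonstrate numerically. With hypothetical example values (a normal near the optical axis and a principal point at the center of a 640x480 image), the computed normal swings far away from the true one:

```python
import numpy as np

# True unit normal, hypothetically chosen near the optical axis.
n_true = np.array([0.1, 0.2, 1.0])
n_true = n_true / np.linalg.norm(n_true)

# With alpha = 1, det(K) * inv(K).T is the matrix derived above.
ux, uy = 320.0, 240.0  # e.g. principal point at the image center
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [-ux, -uy, 1.0]])

n_est = M @ n_true
n_est = n_est / np.linalg.norm(n_est)

# Angle in degrees between the true and the computed normal.
angle = np.degrees(np.arccos(np.clip(n_true @ n_est, -1.0, 1.0)))
```

With these values the angle is on the order of 100 degrees or more, i.e. the direction is essentially destroyed rather than perturbed.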


In other words, our choice of principal point $u_x, u_y$ has a large effect on the computed $\vec{n}$. This is not intuitive to me. Isn't the choice of image origin, and hence of $u_x, u_y$ (for example $[0, 0]$ vs. $[w/2, h/2]$), completely arbitrary, or is there something I am missing?



