computer vision - Estimating Plane Pose Without Knowledge of Intrinsic Camera Parameters

Friday, October 25, 2019

Assume a camera has no skew and square pixels, so that its calibration matrix $K$ has the form:

$K = \left[\begin{matrix} \alpha&0&u_x\\0&\alpha&u_y\\0&0&1 \end{matrix}\right]$

Intuitively, it seems that it should be possible to recover the pose of a rectangular planar object $P$ (for example, a flat piece of paper) from only the projected image coordinates $\vec{p_i}$, $i \in \{1, 2, 3, 4\}$, of the object's four corners, without knowledge of $K$. To be more precise, I only care about the orientation of the unit normal vector $\vec{n}$ of $P$, not its location.

Following the answer given in response to the question "Step by Step Camera Pose Estimation for Visual Tracking and Planar Markers", we choose world coordinates such that $P$ is the plane $Z=0$. We then find the homography $H$ that maps the four corners of $P$ in world-frame coordinates to their projected image coordinates $\vec{p_i}$ for $i=1,2,3,4$.

Then, up to scale, $H = K[\vec{R_1}|\vec{R_2}|\vec{t}]$, where $R \in SO(3)$ is the camera's rotation, $\vec{R_1}, \vec{R_2}$ are the first and second columns of $R$, and $\vec{t} \in \mathbb{R}^3$ is the camera's translation (the third column of $R$ drops out because $Z=0$ on the plane). If we make the assumption that $K$ is the identity (I'll return to this later), then $H = [\vec{R_1}|\vec{R_2}|\vec{t}]$. The third column of $R$, $\vec{R_3}$, is equal to $\vec{R_1}\times\vec{R_2}$. Finally, $\vec{n} = R\left[\begin{matrix}0\\0\\1\end{matrix}\right] = \vec{R_3} = \vec{R_1}\times\vec{R_2}$ and we are done.
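As a rough sketch of the procedure just described, the following estimates $H$ from four corner correspondences with a direct linear transform and takes the cross product of its first two columns. The function names, the synthetic pose, and the rectangle dimensions are illustrative assumptions, not part of the referenced answer, and the image points are generated with $K = I$ so that the identity assumption holds exactly.

```python
import numpy as np

def homography_dlt(world_xy, image_xy):
    """Estimate H (up to scale) from >= 4 plane-to-image point pairs via DLT."""
    rows = []
    for (X, Y), (u, v) in zip(world_xy, image_xy):
        rows.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        rows.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    # The null vector of the stacked system is the last right-singular vector.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def normal_assuming_identity_K(H):
    """Unit normal from the first two columns of H, valid only if K = I."""
    n = np.cross(H[:, 0], H[:, 1])
    return n / np.linalg.norm(n)

# Hypothetical pose used only to generate test data.
ax, ay = 0.4, 0.3
Rx = np.array([[1, 0, 0],
               [0, np.cos(ax), -np.sin(ax)],
               [0, np.sin(ax), np.cos(ax)]])
Ry = np.array([[np.cos(ay), 0, np.sin(ay)],
               [0, 1, 0],
               [-np.sin(ay), 0, np.cos(ay)]])
R = Rx @ Ry
t = np.array([0.1, -0.2, 5.0])

# Corners of a 2 x 1 rectangle in the plane Z = 0 (assumed dimensions).
corners = np.array([[-1.0, -0.5], [1.0, -0.5], [1.0, 0.5], [-1.0, 0.5]])

# Project with K = I: camera-frame point is X*R1 + Y*R2 + t.
cam = corners[:, :1] * R[:, 0] + corners[:, 1:] * R[:, 1] + t
image = cam[:, :2] / cam[:, 2:]

H = homography_dlt(corners, image)
n_est = normal_assuming_identity_K(H)
print(n_est, R[:, 2])
```

Because the cross product of two columns scales with the square of the overall factor on $H$, the unknown scale and sign of the estimated homography do not affect the normalized result.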

Now let's examine the effect of $K$ on our answer $\vec{n}$. My intuition tells me that $K$ shouldn't matter: varying $\alpha$ should only multiply $\vec{n}$ by a scalar, which is normalized out since I am only concerned with the direction of $\vec{n}$, not its magnitude. In addition, it seems that varying $u_x, u_y$ should only translate the projected corners $\vec{p_i}$, which should affect $\vec{t}$ but not $\vec{n}$.

Let's see if this is true. If $K \neq I$ , then $H = K[\vec{R_1}|\vec{R_2}|\vec{t}]$ and

$\vec{n} = K \vec{R_1} \times K \vec{R_2} = \det(K)K^{-T}(\vec{R_1}\times\vec{R_2})$
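The identity used here, $(M\vec{a})\times(M\vec{b}) = \det(M)M^{-T}(\vec{a}\times\vec{b})$ for invertible $M$, can be spot-checked numerically. The values of $K$ and the random vectors below are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary calibration matrix of the form assumed in the question.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

a = rng.normal(size=3)
b = rng.normal(size=3)

lhs = np.cross(K @ a, K @ b)
rhs = (np.linalg.det(K) * np.linalg.inv(K).T) @ np.cross(a, b)

print(np.allclose(lhs, rhs))
```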

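In particular, if $\alpha \neq 1$ (with $u_x = u_y = 0$), then $\det(K)K^{-T} = \operatorname{diag}(\alpha, \alpha, \alpha^2)$, so

$\vec{n} = \alpha \operatorname{diag}(1, 1, \alpha)(\vec{R_1} \times \vec{R_2})$

which is our answer from above times a scalar only when $\alpha = 1$. Contrary to my intuition, $\alpha$ rescales the third component of $\vec{n}$ relative to the first two, so in general it changes the direction as well.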

If instead $u_x, u_y \neq 0$ (taking $\alpha = 1$), then

$\det(K)K^{-T} = \left[\begin{matrix} 1&0&0\\0&1&0\\-u_x&-u_y&1 \end{matrix}\right]$

and

$\vec{n} = \left[\begin{matrix} 1&0&0\\0&1&0\\-u_x&-u_y&1 \end{matrix}\right](\vec{R_1} \times \vec{R_2})$
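To get a feel for the size of this effect, here is a small numerical sketch. It forms $H = K[\vec{R_1}|\vec{R_2}|\vec{t}]$ for a made-up pose, takes the cross product of the first two columns as if $K$ were the identity, and measures the angle to the true normal. The pose, the offsets in `K_shifted`, and the helper names are all assumptions chosen for illustration.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def angle_deg(a, b):
    """Angle between two vectors in degrees."""
    c = np.clip(np.dot(unit(a), unit(b)), -1.0, 1.0)
    return np.degrees(np.arccos(c))

# Hypothetical pose: rotation of 30 degrees about the camera x-axis.
theta = np.radians(30.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])
t = np.array([0.0, 0.0, 5.0])
true_n = R[:, 2]

def naive_normal(K):
    """Cross product of the first two columns of H = K [R1 | R2 | t]."""
    H = K @ np.column_stack([R[:, 0], R[:, 1], t])
    return unit(np.cross(H[:, 0], H[:, 1]))

K_centered = np.eye(3)
# alpha = 1, with an assumed principal point offset (u_x, u_y) = (0.5, 0.25).
K_shifted = np.array([[1.0, 0.0, 0.5],
                      [0.0, 1.0, 0.25],
                      [0.0, 0.0, 1.0]])

err_centered = angle_deg(naive_normal(K_centered), true_n)
err_shifted = angle_deg(naive_normal(K_shifted), true_n)
print(err_centered, err_shifted)
```

In this setup the centered case reproduces the true normal, while the shifted case is off by a few degrees, consistent with the matrix above mixing $-u_x$ and $-u_y$ times the first two components of $\vec{R_1}\times\vec{R_2}$ into the third.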

In other words, the choice of principal point $(u_x, u_y)$ changes the direction of the computed normal, not just its magnitude. This is not intuitive to me. Isn't the choice of principal point completely arbitrary (for example $[0, 0]$ vs. $[w/2, h/2]$), or is there something I am missing?
