Friday, October 25, 2019

computer vision - Estimating Plane Pose Without Knowledge of Intrinsic Camera Parameters


Assume a camera has no skew and has square pixels, that is, its camera-calibration matrix, $K$, looks like: $K = \begin{bmatrix} \alpha & 0 & u_x \\ 0 & \alpha & u_y \\ 0 & 0 & 1 \end{bmatrix}$


Intuitively, it seems that it should be possible to recover the pose of a rectangular planar object, $P$ (for example, a flat piece of paper), knowing only the projected image coordinates $p_i$ for $i = 1, 2, 3, 4$ of the object's four corners, without knowledge of $K$. To be more precise, I only care about the orientation of the unit normal vector, $n$, of $P$, not its location.


Following the answer given in response to this question: Step by Step Camera Pose Estimation for Visual Tracking and Planar Markers,


We choose world coordinates such that $P$ is the plane $Z = 0$. We find the homography, $H$, mapping the four corners of $P$ in world-frame coordinates to their projected coordinates $p_i$ for $i = 1, 2, 3, 4$.
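As a rough sketch of this step, the homography can be estimated from the four corner correspondences with the direct linear transform (DLT). This assumes NumPy; the function name `homography_from_points` and the normalization by the bottom-right entry are my own choices for illustration, not something taken from the linked answer.

```python
import numpy as np

def homography_from_points(world_xy, image_xy):
    """Estimate the 3x3 homography H (up to scale) mapping plane points
    (X, Y) on Z = 0 to image points (x, y), using the DLT."""
    rows = []
    for (X, Y), (x, y) in zip(world_xy, image_xy):
        # Each correspondence contributes two linear equations in the
        # nine entries of H.
        rows.append([-X, -Y, -1, 0, 0, 0, x * X, x * Y, x])
        rows.append([0, 0, 0, -X, -Y, -1, y * X, y * Y, y])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector belonging to the
    # smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the arbitrary scale; this assumes H[2, 2] is not close to zero.
    return H / H[2, 2]
```

With exactly four corners in general position the system has a one-dimensional null space, so $H$ is determined up to scale only.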


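Then, up to scale, $H = K[R_1 | R_2 | t]$, where $R \in SO(3)$ is the camera's rotation, $t \in \mathbb{R}^3$ is the camera's translation, and $R_1, R_2$ are the first and second columns of $R$ (the third column of $R$ drops out of $K[R | t]$ because $Z = 0$ on $P$). If we make the assumption that $K$ is the identity (I'll return to this later), then $H = [R_1 | R_2 | t]$. The third column of $R$, $R_3$, is equal to $R_1 \times R_2$. Finally, $n = R [0, 0, 1]^T = R_3 = R_1 \times R_2$ and we are done.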
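A minimal sketch of this last step, under the same assumption that $K$ is the identity: take the first two columns of $H$, cross them, and normalize. The function name `normal_from_homography` is hypothetical. Note that the unknown overall scale of $H$ cancels here, since scaling both columns by $s$ scales the cross product by $s^2 > 0$.

```python
import numpy as np

def normal_from_homography(H):
    """Unit normal of the plane Z = 0 in camera coordinates.

    Assumes K = I, so the first two columns of H are proportional
    to R1 and R2."""
    h1, h2 = H[:, 0], H[:, 1]
    n = np.cross(h1, h2)
    return n / np.linalg.norm(n)
```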


Now let's examine the effect of $K$ on our answer $n$. My intuition tells me that $K$ shouldn't matter, because varying $\alpha$ will ultimately act as a scalar multiple on our answer $n$, which is normalized out since I am only concerned with the direction of $n$, not its magnitude. In addition, it seems that varying $u_x, u_y$ should have the effect of translating the locations of the projected corners $p_i$, which should affect $t$, but not $n$.


Let's see if this is true. If $K \neq I$, then $H = K[R_1 | R_2 | t]$ and $n = KR_1 \times KR_2 = \det(K) K^{-T} (R_1 \times R_2)$
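The step above relies on the identity $(Ma) \times (Mb) = \det(M) M^{-T} (a \times b)$, which holds for any invertible $3 \times 3$ matrix $M$. A quick numerical check, assuming NumPy; the helper names and the sample calibration values are made up for illustration:

```python
import numpy as np

def cross_of_mapped(M, a, b):
    """Left-hand side: map both vectors by M, then take the cross product."""
    return np.cross(M @ a, M @ b)

def cofactor_form(M, a, b):
    """Right-hand side: det(M) * M^{-T} applied to the cross product."""
    return (np.linalg.det(M) * np.linalg.inv(M).T) @ np.cross(a, b)

# Hypothetical calibration matrix with alpha = 800 and (u_x, u_y) = (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
print(np.allclose(cross_of_mapped(K, a, b), cofactor_form(K, a, b)))
```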


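In particular, if $\alpha \neq 1$ (with $u_x = u_y = 0$), then $\det(K) K^{-T} = \mathrm{diag}(\alpha, \alpha, \alpha^2)$ and $n = \alpha \, \mathrm{diag}(1, 1, \alpha) (R_1 \times R_2)$

which is our answer from above times the scalar $\alpha$, except that the third component picks up an additional factor of $\alpha$. So $\alpha$ does not normalize out entirely, contrary to what I expected.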


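On the other hand, if $u_x, u_y \neq 0$ (with $\alpha = 1$), then $\det(K) K^{-T} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -u_x & -u_y & 1 \end{bmatrix}$



and $n = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -u_x & -u_y & 1 \end{bmatrix} (R_1 \times R_2)$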
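To get a feel for the size of this effect, here is a small numerical sketch, assuming NumPy. It computes the normal the way described above (treating $H$ as if $K$ were the identity) and measures the angle to the true $R_3$. The 30 degree tilt and the offset $(u_x, u_y) = (0.5, 0.2)$, in units where $\alpha = 1$, are arbitrary values chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np

def naive_normal(K, R):
    """Normal computed from H = K [R1 | R2 | t] as if K were the identity.
    The translation t does not enter, so it is omitted."""
    h1, h2 = K @ R[:, 0], K @ R[:, 1]
    n = np.cross(h1, h2)
    return n / np.linalg.norm(n)

def angle_deg(u, v):
    """Angle between two unit vectors, in degrees."""
    return np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

# Plane tilted by 30 degrees about the camera's x-axis.
theta = np.radians(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, c, -s],
              [0.0, s, c]])

# alpha = 1 with a shifted principal point.
K_shift = np.array([[1.0, 0.0, 0.5],
                    [0.0, 1.0, 0.2],
                    [0.0, 0.0, 1.0]])

print(angle_deg(naive_normal(np.eye(3), R), R[:, 2]))  # K = I
print(angle_deg(naive_normal(K_shift, R), R[:, 2]))    # shifted principal point
```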


In other words, our choice of camera center (that is, the principal point $(u_x, u_y)$) has a large effect on our calculations. This is not intuitive to me. Isn't our choice of $(u_x, u_y)$ completely arbitrary (for example, $[0, 0]$ vs. $[w/2, h/2]$), or is there something I am missing?



