I have an algorithm that can recognize the 2D pixel locations of certain 3D points within a 2D image. Ultimately, we are interested in computing the real-world horizontal distance between these points. See the example below.
We have some known a-priori information about the world and the camera.
First of all, we know the real vertical distance between the points in each of the two pairs, and this is always a fixed value in the real world (e.g. 2 meters). We hope to use this as a reference value for scaling pixel distances between other points into real-world distances.
Other assumptions that we have (and hence may be used in the algorithm) are
- known corresponding world distances between points (vertically)
- detected points lie in the same plane
- camera’s optical axis is approximately perpendicular to this plane
- focal length f
- field of view (angular)
- more than 1 image taken at potentially different heights
Our current method of solving this (i.e. obtaining the real-world horizontal distance) is to linearly extrapolate the known vertical reference distances to the pixel locations.
In theory this should work perfectly given the above assumptions. However, if a pixel location is off by 1 or 2 pixels, this can propagate to an error of ~20 mm, depending on the height. On top of that, other real-world variations such as camera angle and object height offsets may push the error too high. Also, this computation does not use all the information available: we only need 1 of the 2 known distances and only 1 image to perform it.
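To make the current method concrete, here is a minimal Python sketch of the linear scaling and its pixel sensitivity. It assumes the plane is exactly parallel to the image plane, so one mm-per-pixel factor holds for the whole image. The function names and all pixel coordinates are hypothetical, chosen only so that the 2 m reference spans 200 px.

```python
import math

def mm_per_pixel(ref_px_a, ref_px_b, ref_length_mm):
    """Scale factor from one vertical reference segment.

    Assumes a fronto-parallel plane, so the same factor applies
    to every pixel distance measured in that plane.
    """
    return ref_length_mm / math.dist(ref_px_a, ref_px_b)

def horizontal_distance_mm(p, q, scale):
    """Real-world distance between two detected points p and q."""
    return scale * math.dist(p, q)

# Hypothetical detections: a 2000 mm reference that spans 200 px.
scale = mm_per_pixel((100.0, 50.0), (100.0, 250.0), 2000.0)

# Horizontal pair, once as detected and once with a 2 px detection error.
d = horizontal_distance_mm((100.0, 50.0), (400.0, 50.0), scale)
d_shifted = horizontal_distance_mm((100.0, 50.0), (402.0, 50.0), scale)

print(f"scale: {scale:.2f} mm/px")
print(f"distance: {d:.1f} mm, with 2 px error: {d_shifted:.1f} mm")
```

With these made-up numbers, each pixel of detection error on the horizontal pair costs one scale factor's worth of millimetres, which is why the error grows with the camera's distance from the plane. An error on the reference pair itself changes the scale and therefore affects every distance derived from it.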
Now I’m wondering if we can’t approach this problem in a better way, by combining:
- multiple images,
- the information of the focal length/angle,
- and potentially multiple reference distances within the image.
My research suggests that there are algorithms like Kanade-Tomasi and back projection, but I’m not sure if they are relevant to us, because they use different assumptions (like constant distance to objects?).
Anyway, I was thinking in terms of a least squares approach, but I’m not sure how to parametrize the model such that it predicts a 3D reconstruction (or real-world distances) given the knowns (focal length etc.).
So I’m looking for a push in the right direction that can help solve our prediction/computation problem, given the above assumptions.
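To illustrate the kind of parametrization I have in mind, here is a rough Python sketch, not a worked-out solution. It assumes an ideal pinhole camera and a fronto-parallel plane, so each image k has a single unknown scale s_k (mm per pixel), related to the camera-to-plane distance by Z_k = s_k * f with f in pixels. Each s_k is fitted by least squares over all reference distances in that image, and the per-image results are averaged, which is the equal-weight least-squares estimate of a constant. The function names, the dictionary keys, and all numbers are hypothetical.

```python
from statistics import fmean

def fit_scale(ref_px_lengths, ref_mm_lengths):
    """Least-squares scale s minimizing sum_j (s * d_j - D_j)^2.

    d_j: measured pixel length of reference j, D_j: its known length in mm.
    Closed form: s = sum(d_j * D_j) / sum(d_j^2).
    """
    num = sum(d * D for d, D in zip(ref_px_lengths, ref_mm_lengths))
    den = sum(d * d for d in ref_px_lengths)
    return num / den

def estimate_horizontal_mm(images, focal_px):
    """Combine several images into one horizontal-distance estimate.

    images: list of dicts with hypothetical keys
      'ref_px'   - pixel lengths of the vertical references
      'ref_mm'   - their known real-world lengths
      'horiz_px' - pixel length of the horizontal segment of interest
    Returns the averaged distance and the implied plane depth per image.
    """
    estimates, depths = [], []
    for img in images:
        s = fit_scale(img["ref_px"], img["ref_mm"])
        estimates.append(s * img["horiz_px"])
        depths.append(s * focal_px)  # pinhole relation Z_k = s_k * f
    return fmean(estimates), depths

# Two hypothetical images taken at different distances from the plane,
# with slightly noisy reference detections in the first one.
images = [
    {"ref_px": [199.0, 201.0], "ref_mm": [2000.0, 2000.0], "horiz_px": 300.0},
    {"ref_px": [250.0, 250.0], "ref_mm": [2000.0, 2000.0], "horiz_px": 375.0},
]
distance_mm, depths_mm = estimate_horizontal_mm(images, focal_px=1000.0)
print(f"estimated horizontal distance: {distance_mm:.1f} mm")
```

The implied depths give a sanity check against the known focal length and camera heights. What this sketch does not handle is exactly what I am unsure about: a camera that is slightly tilted relative to the plane, and residuals expressed in pixels rather than millimetres.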