2012-11-17

(Re)curse you, projector!

Hi all! We have made some last-minute changes to the shared workspace calibration that we thought might be worth sharing. This post got much longer than I had planned, so if you’re the tl;dr type of guy, skip to the last paragraph for a one-sentence summary :)
Anyway... To match what I’m drawing on my workspace with the videos of what the other users are drawing, we have been using a simple geometric transform that maps points in camera space to points in browser-window space. What the others are drawing is transformed, merged and projected down onto my image. A general equation for converting points from one four-sided polygon to another is

x₂ = a₁x₁y₁ + a₂x₁ + a₃y₁ + a₄
y₂ = b₁x₁y₁ + b₂x₁ + b₃y₁ + b₄.


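To make that concrete, applying the mapping to a single point is just a handful of multiply-adds. Something like this (a TypeScript sketch, not our actual code; the names are made up for illustration):

    // Coefficients of one transform: the a's produce x2, the b's produce y2.
    // Layout [a1, a2, a3, a4] / [b1, b2, b3, b4], matching the equations above.
    interface BilinearCoeffs {
      a: [number, number, number, number];
      b: [number, number, number, number];
    }

    // Map a point from camera space to browser-window space.
    function applyTransform(t: BilinearCoeffs, x1: number, y1: number): [number, number] {
      const x2 = t.a[0] * x1 * y1 + t.a[1] * x1 + t.a[2] * y1 + t.a[3];
      const y2 = t.b[0] * x1 * y1 + t.b[1] * x1 + t.b[2] * y1 + t.b[3];
      return [x2, y2];
    }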
To solve the equations we need the four corner points of the shared workspace area in both coordinate systems. We get those using the following steps:


  1. Draw an AR marker in the area we wish to share
  2. Project the workspace onto a table
  3. Film the projection
  4. Detect the marker in the video stream and extract its corners
  5. Solve a system of equations to get values for a and b in the equations above (a sketch of this step follows the list)

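Step 5 boils down to solving two small linear systems that share the same 4×4 matrix: one for the a coefficients and one for the b coefficients, using the four matched corner pairs. Roughly like this (again a TypeScript sketch with made-up names, assuming the corners have already been paired up):

    type Point = { x: number; y: number };

    // Solve a 4x4 linear system M * v = rhs using Gaussian elimination with partial pivoting.
    function solve4x4(M: number[][], rhs: number[]): number[] {
      const A = M.map((row, i) => [...row, rhs[i]]); // augmented 4x5 matrix (copies; M is untouched)
      for (let col = 0; col < 4; col++) {
        // Pick the row with the largest pivot for numerical stability.
        let pivot = col;
        for (let r = col + 1; r < 4; r++) {
          if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
        }
        [A[col], A[pivot]] = [A[pivot], A[col]];
        // Eliminate the column below the pivot.
        for (let r = col + 1; r < 4; r++) {
          const f = A[r][col] / A[col][col];
          for (let c = col; c <= 4; c++) A[r][c] -= f * A[col][c];
        }
      }
      // Back substitution.
      const v = [0, 0, 0, 0];
      for (let r = 3; r >= 0; r--) {
        let sum = A[r][4];
        for (let c = r + 1; c < 4; c++) sum -= A[r][c] * v[c];
        v[r] = sum / A[r][r];
      }
      return v;
    }

    // src[i] and dst[i] must be the same corner of the shared area, seen in
    // camera space and browser-window space respectively.
    function computeCoeffs(src: Point[], dst: Point[]): { a: number[]; b: number[] } {
      const M = src.map(p => [p.x * p.y, p.x, p.y, 1]); // one row [x1*y1, x1, y1, 1] per corner
      const a = solve4x4(M, dst.map(p => p.x));         // x2 = a1*x1*y1 + a2*x1 + a3*y1 + a4
      const b = solve4x4(M, dst.map(p => p.y));         // y2 = b1*x1*y1 + b2*x1 + b3*y1 + b4
      return { a, b };
    }

The matrix only depends on the camera-space corners, so it is reused for both right-hand sides; for a 4×4 system the cost is negligible either way.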
This mapping is sent to all participants of the shared workspace, who then use it to transform the shared area in my video stream to match their shared area in the browser.

A problem with this type of transform is that the resulting image always gets a little distorted, and the distortion grows the farther you get from the anchor points (the corners). If the camera is mounted at a bad angle, this is very noticeable. Near the anchor points, however, the results are always pretty good.
The solution seems simple: more anchor points equals a more accurate transform! But the more anchor points we use, the more complex the equation for each pixel coordinate becomes. And since we aren’t using any hardware-accelerated graphics library (such as WebGL), we can’t really afford to spend large amounts of time processing each pixel (of each video stream).
(As a side note, using a WebGL shader to do these calculations would probably be thousands of times faster than what we’re doing right now, so there’s a lot of room for improving performance in the future.)
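For context on where that per-pixel cost comes from, here is roughly what a CPU-side warp of one frame looks like (a simplified TypeScript sketch using the Canvas 2D ImageData layout; the real code is more involved). It assumes a transform that maps browser-window coordinates back to camera coordinates, which you can get, for example, by solving the same corner equations with the two point sets swapped:

    // Warp the camera frame into browser-window space, one destination pixel at a time.
    // `inv` maps a browser-window point back to camera space:
    //   xc = a1*x*y + a2*x + a3*y + a4,  yc = b1*x*y + b2*x + b3*y + b4
    function warpFrame(
      src: ImageData,                     // current camera frame
      dst: ImageData,                     // output image in browser-window space
      inv: { a: number[]; b: number[] }   // inverse transform coefficients
    ): void {
      for (let y = 0; y < dst.height; y++) {
        for (let x = 0; x < dst.width; x++) {
          // Find which camera pixel lands on this destination pixel (nearest neighbour).
          const xc = Math.round(inv.a[0] * x * y + inv.a[1] * x + inv.a[2] * y + inv.a[3]);
          const yc = Math.round(inv.b[0] * x * y + inv.b[1] * x + inv.b[2] * y + inv.b[3]);
          if (xc < 0 || yc < 0 || xc >= src.width || yc >= src.height) continue;
          const si = (yc * src.width + xc) * 4;  // RGBA source index
          const di = (y * dst.width + x) * 4;    // RGBA destination index
          dst.data[di] = src.data[si];
          dst.data[di + 1] = src.data[si + 1];
          dst.data[di + 2] = src.data[si + 2];
          dst.data[di + 3] = src.data[si + 3];
        }
      }
    }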

Our solution is to split the shared area into smaller regions and create separate, simple transforms for each of them. Each transform only manipulates a small part of the video stream, and the results are combined into one image.
The new method splits the area into four subregions, then recursively does the same to each subregion, and so on. The recursion depth is variable and results in 4^depth transforms. Setting the depth to zero (= one region) results in the same behaviour as before.
After some testing, we have decided that a depth of two (16 subregions) gives a good enough result while only prolonging the calibration process by a second or two. Also on the positive side, this improvement in accuracy does not affect performance much; we just need to copy a bit more image data.
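The subdivision itself is mostly bookkeeping: split a quad into four sub-quads via its edge midpoints and recurse. Something along these lines (sketch only; how the matching camera-space corners for each sub-quad are found during calibration is not shown here):

    type Pt = { x: number; y: number };
    type Quad = [Pt, Pt, Pt, Pt]; // corners in order: top-left, top-right, bottom-right, bottom-left

    // Midpoint of two points.
    const mid = (p: Pt, q: Pt): Pt => ({ x: (p.x + q.x) / 2, y: (p.y + q.y) / 2 });

    // Split one quad into four sub-quads by connecting the edge midpoints to the centre.
    function split(q: Quad): Quad[] {
      const [tl, tr, br, bl] = q;
      const top = mid(tl, tr), right = mid(tr, br), bottom = mid(br, bl), left = mid(bl, tl);
      const centre = mid(top, bottom);
      return [
        [tl, top, centre, left],
        [top, tr, right, centre],
        [centre, right, br, bottom],
        [left, centre, bottom, bl],
      ];
    }

    // Recursively subdivide a quad; depth 0 returns the quad itself,
    // depth d returns 4^d leaf quads.
    function subdivide(q: Quad, depth: number): Quad[] {
      if (depth === 0) return [q];
      return split(q).flatMap(sub => subdivide(sub, depth - 1));
    }

At depth two this gives the 16 sub-quads mentioned above, each of which gets its own set of a and b coefficients.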

For those of you who didn’t read the whole post (I’m looking at you, everyone!), here’s a little summary: we have improved the calibration/video transformation to make it more accurate and much more tolerant of bad camera angles. Nice!
