Tuesday, July 27, 2010


I stayed up late last night to finish writing a bit of multitouch code--actually the first time I've ever needed to do so (write multitouch support, that is, not stay up late). I was surprised at the amount of math involved to get the behavior that I wanted.

Still working on my new project Praetor, which among other things presents a fairly large world map at one point. I wanted that map to pan and zoom with the same smooth behavior as Google Maps. If you've ever tried the Maps app on the iPhone you know what I mean: it's surprisingly intuitive to drag the map around, pinch to zoom out, and reverse-pinch to zoom in. And although I've often watched the display match itself to my motions, I'd stopped at the reverse-engineering stage and never really thought about implementing my own subset.

I started with a world map--just a big image (say, 2000 pixels on each side), far larger than the tiny screen of this phone (480x800) can show at once. To support zooming in and out I needed a scaling factor: the on-screen size of the image would be (worldmapsize / scale), so if scale==1 then I'd show a small portion of the map life-size, and if scale==2 then I'd show a lot more of the map but shrunken down by 50%. And since I can't fit the whole map on the screen at once (except when scale is very large, zoomed way out from the image), I have to account for panning too: which virtual pixel sits right underneath the center of the physical screen?
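
Concretely, here's that convention as a little Java sketch--every name in it (MapView, worldCenterX, and so on) is mine, invented for illustration, not anything from the real project:

    class MapView {
        // Scale convention from above: on-screen size = world size / scale,
        // so scale is "world pixels per screen pixel"; bigger = zoomed out.
        float scale = 1f;
        // The world-map pixel currently drawn at the center of the screen.
        float worldCenterX, worldCenterY;
        int screenWidth = 480, screenHeight = 800;

        float screenToWorldX(float sx) { return worldCenterX + (sx - screenWidth / 2f) * scale; }
        float screenToWorldY(float sy) { return worldCenterY + (sy - screenHeight / 2f) * scale; }
        float worldToScreenX(float wx) { return screenWidth / 2f + (wx - worldCenterX) / scale; }
        float worldToScreenY(float wy) { return screenHeight / 2f + (wy - worldCenterY) / scale; }
    }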

Anyway, I'll skip a lot of the trial and error in order to summarize the approach that finally worked out. When my app first notices there are two fingers on the display, it remembers how far apart they are and averages their locations to identify the spot right in the middle--noting not so much the actual screen coordinates of that spot as which point in the virtual world map is currently being drawn there.
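
If this is Android (a guess on my part), the pinch-start bookkeeping might look like the following--continuing the MapView sketch above; the two-pointer MotionEvent calls are real, but everything else is my own naming:

    // Fields remembered at the moment the second finger lands.
    float startDistance;            // finger separation at pinch start
    float startScale;               // scale at pinch start
    float worldFocusX, worldFocusY; // world pixel under the fingers' midpoint

    void onPinchStart(android.view.MotionEvent e) {
        float dx = e.getX(1) - e.getX(0);
        float dy = e.getY(1) - e.getY(0);
        startDistance = (float) Math.hypot(dx, dy);
        startScale = scale;

        // The midpoint in screen space...
        float midX = (e.getX(0) + e.getX(1)) / 2f;
        float midY = (e.getY(0) + e.getY(1)) / 2f;
        // ...but what we actually remember is the world point drawn there.
        worldFocusX = screenToWorldX(midX);
        worldFocusY = screenToWorldY(midY);
    }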

Thereafter when the fingers move--one or both, doesn't matter--I calculate a new between-fingers distance and a new center point (this time tracking both screen space and world-map space). The change in distance represents a change in scale, and it turns out that the new scale should be equal to (old scale) * (old distance) / (new distance). From that simple formula it's clear that if you spread your fingers apart the scale decreases--moving from a high value towards 1, meaning that the pixels on screen get bigger. And the reverse when you pinch to zoom out.
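
In the sketch's terms, that update is tiny:

    // Continuing the sketch: recompute the scale on every move event.
    void onPinchMove(android.view.MotionEvent e) {
        float dx = e.getX(1) - e.getX(0);
        float dy = e.getY(1) - e.getY(0);
        float newDistance = (float) Math.hypot(dx, dy);
        if (newDistance < 1f) return;  // fingers (nearly) touching; skip

        // Spreading fingers => newDistance grows => scale shrinks, so each
        // screen pixel covers less of the world map: the map zooms in.
        scale = startScale * startDistance / newDistance;
    }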

It's important to be precise about exactly how much the scale changes when you move your fingers, because ideally the map's behavior is just as precise. Put two fingers on a map of the US--one finger on NYC and the other on LA--and pinch. When you look again, your fingers should still be on those cities--which means you need to have calculated exactly the right scale and moved the map exactly with the fingers.

Right, so that handles scale: just look at the distance between the fingers. To make panning work too, I concentrated on the center point: remember the virtual pixel that was under the fingers' midpoint when they first touched down, then pan the display until that same virtual pixel is once again directly between the two fingers, wherever they go. And poof: instant Google Maps behavior.
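
In the sketch's terms, that's just the screenToWorld equation solved for worldCenter--run right after the scale update, using the fingers' current midpoint:

    // Continuing the sketch: pick the worldCenter that puts the remembered
    // (worldFocusX, worldFocusY) exactly under the new midpoint (midX, midY).
    // Rearranged from screenToWorldX above:
    //   worldFocus = worldCenter + (mid - screenWidth/2) * scale
    void panToKeepFocus(float midX, float midY) {
        worldCenterX = worldFocusX - (midX - screenWidth / 2f) * scale;
        worldCenterY = worldFocusY - (midY - screenHeight / 2f) * scale;
    }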

The last polishing touch was to let the map be dragged past its legal bounds. This is something Apple introduced with the iPhone, and it's really pleasing to the eye. In Windows, a scrollbar is a hard stop: if you try to page-up when you're at the top of the page, nothing happens. On an iPhone, if you're at the top of the page and try to drag downwards with your finger, the page will actually follow you (technically it follows at half the speed of your finger, but that's a minor point). When you release your finger, the page snaps back to its legal position--right at the top of the screen.
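
I don't know how Apple actually implements the half-speed follow, but one simple way to get the same feel is to count only half of whatever portion of the drag falls outside the legal range--something like this sketch:

    // Sketch: clamp softly instead of hard. Any displacement past the
    // legal range [min, max] only counts at half strength, so the map
    // follows the finger at half speed once it's past the edge.
    float softClamp(float desired, float min, float max) {
        if (desired < min) return min + (desired - min) / 2f;
        if (desired > max) return max + (desired - max) / 2f;
        return desired;
    }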

There are two reasons this works well from a UI standpoint. The first is that there's an immediate, direct visual response to any input you give: there's no concept of "that motion is illegal so I'll ignore you," which is otherwise very frustrating. The second is that the user gets used to being able to peek around the sides of an object to see if there's anything else--and that behavior makes the device's virtual space feel larger than it really is, which is freeing and pleasant. All good stuff.

Fortunately, it's also really easy to implement--you just have to relax your constraints a little and recognize that it's not so awful to draw the rectangular image halfway off the screen. Once the fingers stop dragging, on every animation frame you move the image 1/Nth of the way back toward where it belongs (newPos = oldPos + (correctPos - oldPos) / N), and over the next few frames the display will "snap" back into place.
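
Here's that step in the sketch's terms--N is my arbitrary pick, not a value from the real code:

    // Continuing the sketch: run once per animation frame after the fingers
    // lift; N controls how quickly the snap-back converges.
    static final float N = 4f;

    float snapStep(float oldPos, float correctPos) {
        return oldPos + (correctPos - oldPos) / N;
    }
    // Repeat until |correctPos - oldPos| drops below a pixel, then stop.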
