System B1 uses simple computer vision techniques to let a user control their system with hand and finger motion. Two-button mouse emulation is provided, along with a simple gesture-recognition system that gives users instant access to pre-defined functionality.
The system was built and tested for use in a day-to-day office environment, or as a mouse replacement for laptops where horizontal space is restricted.
Other potential uses:
Mouse control in head mounted displays
Gesture-based control for home entertainment systems
For the impatient, videos and pictures are provided in the resources section (these may be pulled in the event of excessive traffic).
Please note that this is a first-draft quality writeup, limited by the amount of time available to spend on documentation. Error reports and suggestions would be greatly appreciated.
Concept 1. Build a glove with superbright (3000mCd+) LEDs on the end of each finger, figure out the position of each finger, and translate movements into computer commands (moving the mouse, clicking, rotation, etc.).
This idea was tested by sticky-taping keychain LEDs to our fingers and seeing how the camera viewed the data - even after turning the exposure down to 1/2000th of a second, the LEDs still flooded the display and produced vertical streaks. In addition, it seems that many superbright LEDs have a narrow beam angle, meaning that the LEDs would have to be pointed directly at the webcam.
In addition, while a solution such as this may be suitable for a gesture based system (it would excel at recognising hand rotation, for example), no software exists to take advantage of this, and for demonstration and usability purposes, it would be better to build a glove that emulates a mouse (clicking and dragging).
Concept 2. Mount an LED on the end of the index finger of a glove, and mount two switches on the user's middle finger, to be activated by the user's thumb - these switches would light two LEDs that would be picked up by the software to produce mouse clicks.
After obtaining a suitable glove, it was noticed that typing with full-length gloves is like trying to eat chocolate with oven mitts, and even with the thinnest of gloves, having an LED on the end of your finger also introduces considerable errors (such as when it gets stuck in your laptop keyboard, and extrication results in keys flying everywhere).
Concept 3. Mount the green 'pointing' LED on a ring to be worn around the middle of the user's index finger, allowing them to type and move without restriction, and mount two switches and accompanying LEDs on another ring to be worn around the middle of the user's middle finger.
This would allow a user to place their thumb on the middle finger switches, and point their index finger slightly upwards to allow the green LED to be seen by the webcam. The middle finger LEDs would be visible when the middle finger was pointed downwards (see figure 3 below). The ring orientation would allow the LED to be swivelled around to the top of the user's index finger, to be used in 'hands down' situations (so the user can leave their hands on the keyboard, with the webcam pointing down from the top of the screen).
Figure 1. Circuit Diagram.
As depicted in Figure 1, the glove used an extremely simple circuit, with the LEDs mounted on two velcro straps acting as 'rings'. The index finger was home to the switchless single-LED ring, while the middle finger hosted the bigger velcro strap, allowing the user's thumb to access the two switches (Figure 2, below).
Figure 2. Component locations.
The actual construction of the glove involved lots of glue, electrical tape, and melting plastic velcro bits. Powered by two AAA batteries, the system was estimated to run for around 80 hours before requiring a change of batteries. View a closeup of the middle ring.
The webcam was a USB Logitech QuickCam for Notebooks Pro, which had special mounting legs allowing it to hang from the top of a laptop screen at any angle from horizontal to pointing down at the keyboard. The laptop used sported a P3-M 1GHz processor, and had been able to perform real-time video-processing tasks with ease in the past.
Written in C++, using a DirectShow framegrabber kindly provided by Andreas Müller, the software could grab 320x240 images at the webcam's maximum transfer rate of 30fps. The webcam exposure time was set to 1/2000th of a second to filter out most non-LED imagery (the system works best indoors, and still has issues with bright lights such as other monitors). Points of brightness were isolated (grouped by distance from previously known points), and colour and relative-position data were then used to determine which LEDs were active (and therefore which mouse buttons to emulate).
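The detection step might be sketched like the following. This is our own minimal reconstruction, not the original software: the threshold value, the struct names, and the crude dominant-channel colour test are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Point { int x, y; uint8_t r, g, b; };

// Collect every pixel whose average brightness exceeds a threshold.
// With the exposure at 1/2000th of a second, almost everything in the
// frame except the LEDs falls below the threshold.
std::vector<Point> brightPoints(const uint8_t* rgb, int w, int h, int threshold) {
    std::vector<Point> pts;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const uint8_t* p = rgb + 3 * (y * w + x);
            int brightness = (p[0] + p[1] + p[2]) / 3;
            if (brightness > threshold)
                pts.push_back({x, y, p[0], p[1], p[2]});
        }
    return pts;
}

// Crude colour classification by dominant channel - enough to tell the
// pointer LED from the mouse-button LEDs under ideal conditions.
enum class LedColour { Red, Green, Other };
LedColour classify(const Point& p) {
    if (p.r > p.g && p.r > p.b) return LedColour::Red;
    if (p.g > p.r && p.g > p.b) return LedColour::Green;
    return LedColour::Other;
}
```

In practice a real implementation would also cluster neighbouring bright pixels into blobs before classifying, as described above.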
Since functionality for turning the pointing LED on and off wasn't available, mouse movement was set to 'absolute' - positioning the pointer in the top-left corner of the webcam's vision would place the mouse cursor in the top-left corner of the screen. A buffer zone (20% of the image) was added to the input image to allow users to easily reach the edges of the screen.
Figure 3. Camera's view of the user with buffer zone darkened.
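The absolute mapping with the buffer zone can be sketched per axis as below. This is an illustrative reconstruction under our own assumptions: we take the 20% buffer to mean a 10% margin on each side (32 px of 320 horizontally), with camera positions inside the margin clamped to the screen edge.

```cpp
// Map a camera-space coordinate to screen space, absolute-positioning
// style. The outer 10% margin on each side maps onto the screen edge,
// so the user can comfortably reach every corner.
int mapAxis(double cam, double camSize, double screenSize, double margin = 0.10) {
    double lo = camSize * margin;          // e.g. 32 px on a 320-px axis
    double hi = camSize * (1.0 - margin);  // e.g. 288 px
    double t = (cam - lo) / (hi - lo);     // normalised 0..1 inside the active zone
    if (t < 0) t = 0;                      // clamp positions inside the buffer
    if (t > 1) t = 1;
    return static_cast<int>(t * (screenSize - 1));
}
```

Calling `mapAxis` once for x (320-wide camera axis) and once for y (240-wide axis) yields the screen cursor position.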
Sub-pixel precision was obtained by weighting each point's position by the brightness of its pixels; a similar technique was used to determine the average colour of each point. This allows the user to position their cursor anywhere on a 1600x1200 screen, despite only 256x192 (320x240 minus the 64x48 buffer zone) being used for input (a lower screen resolution or a higher video resolution would be more accurate, however).
We used StrokeIt to perform the gesture recognition.
Unfortunately, the first thing that became apparent was that while we'd done most of our testing with the green LED, the two LEDs used for the mouse buttons suffered from a few unforeseen problems:
Despite having the same mCd rating, the red LED was much darker than the green LED.
The yellow LED showed up quite green on the webcam image (and the green LED showed up quite yellow).
The red darkness problem was addressed by lowering the brightness threshold for detecting LEDs, although this made the system too sensitive to use in well-lit areas.
The yellow/green LED issue was addressed by using the topmost point for movement, and the lower one (if visible) for right-mouse button emulation. This works well in all but 'hands down' situations, where one would probably use the left-most point for movement.
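The topmost-point workaround amounts to a one-line selection rule, sketched here under our own naming (image y grows downward, so the topmost point has the smallest y):

```cpp
#include <algorithm>
#include <vector>

struct Led { double x, y; };

// Pick the topmost visible point as the pointer LED; any point below
// it can then be treated as a mouse-button LED. In a 'hands down'
// setup one would instead compare x to take the left-most point.
const Led& pointerLed(const std::vector<Led>& leds) {
    return *std::min_element(leds.begin(), leds.end(),
        [](const Led& a, const Led& b) { return a.y < b.y; });
}
```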
The test machine had previously been configured with an extremely small interface - all desktop icons had been shrunk to half their default size, all system fonts had been reduced to 7pt, and title bars, scroll bars, and minimize/close buttons had all been shrunk to 60% of their original size, all on a 1400x1050 screen. This made all system areas relatively hard to hit, even with a normal mouse.
Using the shrunken scrollbars was almost impossible, because they require the user to move the cursor in a perfectly straight line without straying from the scrollbar area. Using Windows' hierarchical menus with the shrunken system fonts was also extremely hard, and required an extremely steady hand. Most other common tasks were accomplished relatively quickly, however.
To simulate a more common scenario, we also tested at 800x600 and 1024x768 using the default Windows XP skin, which is slightly larger than the 'classic' skin. In this situation, all tasks were performed much more quickly, although scrollbars presented the occasional problem.
When the glove was attached to objects whose motion could be mechanically controlled (e.g. toy cars), the motion across the screen was smooth and consistent, suggesting that the accuracy errors were due to shaky finger motion.
Thanks to the absolute-positioning model, using the system at a range of more than 3 metres required far too much movement by the user to get from one side of the screen to the other, and at this range the LEDs were extremely hard to pick out and separate. On the other hand, at this range the small jitters in finger motion weren't as easy to pick up, so the overall accuracy was greatly improved.
There was a perceptible lag in the mouse motion (the correlation between real movement and mouse movement can be seen in the video in the resources section), although we have used wireless mice with laggier movement. Thanks to the 30Hz update rate, the overall motion was smooth (and again, we've used wireless mice that were jerkier).
Using the mouse buttons worked well; we were able to perform simple gestures using StrokeIt (as seen in the video). However, because the mouse-button LEDs are located approximately 2" diagonally down-right of the pointer LED, they are out of the camera's field of view when the pointer LED is at the bottom or right edges of the screen. This makes it extremely hard to click on items in the Windows taskbar (it's possible, but you have to cross your fingers). Also, due to the weakness of the red LED, the system sometimes lost track of it when it wasn't pointed directly at the webcam, producing an undesired rapid-clicking effect.
While idle, the software reached 40% CPU and 12MB RAM utilization with the display active, and 30% CPU and 8MB RAM utilization with all drawing functions turned off. While MS-Office applications continued unhindered, Photoshop would occasionally freeze up for fractions of a second, and Quake3's framerate dropped from the low 30s to the low 20s.
This interface performed admirably - speed and accuracy were far beyond our modest expectations. But in the next version of this glove, we would likely make the following improvements:
Use brighter LEDs: not only would this increase the range at which the device could be used, but it would also allow us to reduce the brightness sensitivity, making the unit more usable in medium-light situations (we still doubt that it would be usable in the full brightness of Australian sunshine, however).
Use watch batteries mounted inside the rings: this idea was originally panned on the grounds that battery life would be reduced to a couple of hours at most, and watch batteries are expensive and not rechargeable. But for a demonstration-level glove, this would be more visually impressive and less intrusive. Currently the glove goes against our 'do not make the user look like a cyborg' goal.
Use a firewire camera: currently the USB connection is limiting the capture rate to 320x240@30Hz with a medium level of image quality. Firewire cameras currently on the market would allow 640x480@60Hz, allowing smoother motion, higher image quality, and increased accuracy and range, as well as allowing us to increase the size of the buffer zone, making it easier to detect the mouse-LEDs when the pointer is at the frame's edge. Firewire also places less strain on the processor than USB, hopefully leading to a decrease in CPU utilization.
In addition, the following possibilities should also be considered:
Use infrared LEDs: most webcam CCDs are sensitive to infrared light, so if we used IR LEDs, we could put an IR-pass filter over the webcam to filter out visible light. This would have issues with differentiation of colour, but could potentially be a low-power system, using IR-reflecting stickers for capturing pointer motion.
Build a head-band version: this would enable complete hands-free use of the mouse - clicking could be performed by eyebrow or jaw movement - making an ideal mouse-control system for the disabled. (NaturalPoint currently offers a similar product that uses foot pedals or dwell-clicking for mouse button emulation.)
Use two cameras: using binocular vision techniques, the system could determine the location of the user's hand in three-dimensional space.
Most importantly, however, we must consider escape from the mouse context - in Minority Report, we saw Tom Cruise sifting through a stream of consciousness using a pair of LED-equipped gloves in a manner unfamiliar to most GUI users of today. And while the ergonomics of such a system can be debated, the fact that a user could continue to use everyday gestures to control a system is worthy of further study, especially in the realm of pervasive computing, where a user should not be expected to continually switch interface metaphors when controlling the real or the virtual.
Armed with more time and experience, we hope to build simple applications to illustrate this idea in the next B-series system.
Videos
shakeycam1.mov - 13 seconds, 2MB: shows window dragging, gesturing and general latency.
shakeycam2.mov - 6 seconds, 1MB: shows window dragging; only download if you're REALLY BORED.
Photos
3 x 5-15mCd LEDs
2 AAA Batteries
Lots of wires
2 mini momentary switches
Velcro cable ties (for rings and wristband)
Dell Inspiron 4100 Laptop
Logitech QuickCam for Notebooks Pro