An Introduction and Overview of a Gesture Recognition System Implemented for Human Computer Interaction


Abstract

This paper develops the theories and methods used in a gesture recognition system implemented as a human-computer interface (HCI). The system recognizes four fundamental static hand gestures and their variations, for a total of nine gestures. It uses a variation of the CAMSHIFT algorithm for hand tracking and a minimum distance classifier for classification. The Win32 API is used to perform the desired actions, which are determined by a microstate/macrostate architecture in which contextual information is used to correct errors in single-frame classification. The state-oriented model uses order statistics to provide these corrections. The software, named MTrack, is implemented using Borland Delphi 7.0 and DirectX and is specifically designed for low-end desktop hardware, e.g., a 600 MHz Pentium with commercial off-the-shelf (COTS) camera hardware.
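The classification and correction steps named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than Delphi and is not MTrack's code: the feature vectors, prototype values, window length, and the function names `classify` and `smooth` are all assumptions made for illustration. The first function is a plain minimum distance classifier. The second shows one way an order statistic (here, a trailing-window median over per-frame 0/1 flags) could suppress isolated single-frame misclassifications.

```python
import math
from statistics import median_low

# Hypothetical 2-D feature prototypes, one per gesture class.
# The values are illustrative only and are not taken from MTrack.
PROTOTYPES = {
    "fist": (0.2, 0.9),
    "open_hand": (0.8, 0.4),
    "one_finger": (0.5, 0.1),
}


def classify(features, prototypes=PROTOTYPES):
    """Minimum distance classifier: pick the label of the nearest prototype."""
    return min(prototypes, key=lambda label: math.dist(features, prototypes[label]))


def smooth(flags, window=3):
    """Trailing-window median over per-frame 0/1 flags.

    Each output value is the (low) median of the current frame and the
    frames before it, up to `window` frames in total, so no future frames
    are needed. The window length is an assumed parameter.
    """
    out = []
    for i in range(len(flags)):
        recent = flags[max(0, i - window + 1): i + 1]
        out.append(median_low(recent))
    return out


if __name__ == "__main__":
    print(classify((0.25, 0.85)))
    # 1 = "frame classified as fist"; the lone 0 at index 2 is a glitch.
    print(smooth([1, 1, 0, 1, 1, 0, 0, 0], window=3))
```

In this sketch the single dropped frame at index 2 is filtered out, at the cost of a short delay before a genuine change of gesture is accepted.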

News

3-15: Added a sample video of MTrack.
3-22: Added two new videos, updated the screenshots, and updated the feature list.
4-11: Thesis research complete. Code cleaning begins. EXE file is available for downloading.
4-16: Thesis application complete. Added source code to the downloads section.  Documentation begins.
6-10: Added a presentation on my project to the downloads section.  The data on this site is getting a bit stale, and I am not interested in updating it because I am working on writing my thesis.  My thesis will be available online August 6.
7-8: As versions of my thesis are finished, I will post the newest version online. The versions will be dated.
7-14: Added another version of the thesis.
7-20: Thesis is nearly finished.  Page will be updated soon! In the meantime, check out the links.


Downloads


Introduction

This paper introduces the motivation, theory, implementation, and applications for a hand gesture tracking and translation system.  System architecture and implementation details are also presented.  The algorithms, implemented in Borland Delphi 7.0, are explained in a manner independent of any programming language.  The goal of the explanations is to allow any programmer to understand the algorithms well enough to implement them in any language, including C, C++, and Java.  However, the discussion of translating hand gestures to mouse actions focuses on the Win32 Application Programming Interface (API). The concepts are explained in such a way as to allow adaptation to any operating system, including Linux.  Partial source code for the software is provided in the appendix along with a CD containing a demonstration.

The software, named MTrack, is designed to run in any Microsoft Windows environment that is compatible with the Win32 API and supports DirectX 9.0.  It can be interfaced with most Universal Serial Bus (USB) camera devices, including the popular Logitech QuickCam.  The software was designed to run on a 600 megahertz (MHz) Pentium at a minimum frame processing rate of fifteen frames per second (FPS).


Features

The current list of features to be supported is as follows:

  • Adjust tracking parameters.
  • Distinguish between the following hand gestures and translate them to mouse actions:
    • fist - mouse down
    • open hand, open fingers - mouse up
    • open hand angled > 20 degrees - middle button scroll up/down
    • open hand, closed fingers
    • 1 finger up - ALT-TAB
  • On Screen Display (OSD).
  • Save / retrieve tracker parameters to / from XML file.
  • Ability to select video source.
  • Save current rendered video frame to disk.
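
The gesture-to-action translation in the list above can be sketched as a small lookup table. This is an illustrative Python sketch, not MTrack's Delphi code: it only returns action names and does not call the Win32 API. The function name `action_for`, the use of the sign of the tilt angle to choose the scroll direction, and the choice to apply the angle test to either open-hand gesture are assumptions. No action is listed above for "open hand, closed fingers", so it maps to `None` here.

```python
# Gesture-to-action table taken from the feature list.
ACTIONS = {
    "fist": "mouse down",
    "open hand, open fingers": "mouse up",
    "open hand, closed fingers": None,  # no action is listed for this gesture
    "1 finger up": "ALT-TAB",
}

# An open hand tilted by more than this many degrees triggers scrolling.
ANGLE_THRESHOLD_DEG = 20.0


def action_for(gesture, angle_deg=0.0):
    """Map a classified gesture and hand tilt (degrees) to an action name."""
    if gesture.startswith("open hand") and abs(angle_deg) > ANGLE_THRESHOLD_DEG:
        # Assumption: the sign of the tilt selects the scroll direction.
        return "scroll up" if angle_deg > 0 else "scroll down"
    return ACTIONS.get(gesture)


if __name__ == "__main__":
    print(action_for("fist"))
    print(action_for("open hand, open fingers", angle_deg=25.0))
    print(action_for("open hand, open fingers", angle_deg=10.0))
```

Keeping the mapping in a table separates classification from the platform-specific input calls, which is one way the approach could be adapted to an operating system other than Windows.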

 

 

 

 


Author

Isaac Gerg
isaacgerg@psu.edu