An Introduction to 3D Gesture Recognition Systems

The market for gesture recognition technologies is estimated to grow at a 22.2% CAGR, becoming a $30.6 billion global industry by 2025.1

The automotive industry in particular is helping to drive this growth, because gesture-controlled systems have the potential to improve the user experience, increase vehicle safety, and reduce the driver's workload.

Gestures are an intuitive and natural element of human expression and communication.

Using gestures to communicate with nearby devices requires very little additional mental effort. It can be done almost without thought.

One advantage of utilizing gestures while driving is that gesturing takes almost no attention away from other activities. In contrast, interacting with a touch screen display means that drivers must avert their attention from the road in order to make accurate, on-screen selections.

BMW’s 7 Series introduced gesture recognition capabilities in 2016. Drivers can “turn up or turn down the volume, accept or reject a phone call, and change the angle of the multi camera view. There's even a customizable two-finger gesture that you can program”2 to trigger whatever command they want, from “navigate home” to “order a pizza”.

Continental, an automotive component manufacturer, now provides gesture-recognition-capable systems integrated with the instrument cluster, steering wheel, and center display, allowing drivers to manage functions with only a motion of their fingers or swipe of their hand.

Like many gesture recognition devices, these systems employ near-infrared (NIR) light in the 850-940 nm range, using structured light and time of flight (TOF) techniques.

Beyond the automotive industry, other promising applications of gesture recognition include manufacturing, sign-language recognition, assistive technology for people with impaired mobility, and healthcare, where it could provide contactless, sterile computer interfacing in an operating room, for example.

Gesture Acquisition Technologies

Gesture recognition relies upon the integration of several elements. The first stage is to ‘acquire’ the gesture: to capture human movement in a form that can be processed.

Gesture acquisition can be achieved using device-based systems (for example, a glove controller worn by the user) or vision-based systems, which use a particular type of camera.

Visual input systems can employ various different technologies, including 3D or depth sensing, thermal imaging or RGB.

The field of computerized hand-gesture recognition was launched in the early 1980s with the creation of wired gloves with integrated sensors on the finger joints, known as data gloves. At the same time, visual image-based recognition systems were being developed that were based on reading color panels connected to gloves.3

Examples of acquisition devices for gesture recognition: (a) left: mobile phone with GPS and accelerometer, right: inertial sensors with accelerometer and gyroscope attached to a suit worn by Andy Serkis to create the CGI character of Gollum in The Lord of the Rings movies, (b) Google Glass for “egocentric” computing, (c) thermal imagery for action recognition, (d) audio-RGB-depth device, (e) active glove, and (f) passive glove.4

The Microsoft Kinect, a motion sensor add-on for the Xbox® gaming system, was the first mass market product based on gesture recognition.

It used an RGB color VGA video camera, a multi-array microphone, and a depth sensor to acquire and react to the actions of players. The motion sensing platform continues to be available to developers under the name Azure Kinect, although the consumer product has been phased out.

Controller-based or wired gesture capture systems are still commonly used, but interest in ‘touchless’ technology is increasing.1

In environments like hospital operating rooms or in the driver’s seat of a vehicle, there are significant advantages to a device that does not require touch.

Visual Image Acquisition, Step by Step

Gesture recognition is essentially the mathematical representation of human gestures through computing devices. A sophisticated sequence of processes is necessary in order to acquire, interpret, and react to human input. Marxent Labs describes four important steps:

Step 1. A camera acquires image data and feeds it into a sensing device that is connected to a computer.

Step 2. Specially designed software works from a predetermined gesture library, in which each meaningful gesture is paired with a computer command.

Step 3. The software compares each live gesture against the library, interprets it, and identifies the library entry that it matches.

Step 4. The computer performs the command related to that particular gesture once it has been interpreted.
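
Steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the method used by any product mentioned here: the GestureLibrary class, the two-element feature vectors, and the nearest-template matching with a distance threshold are all hypothetical stand-ins for whatever features and classifier a real system would use.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

Feature = Sequence[float]  # hypothetical feature vector extracted from camera data


class GestureLibrary:
    """Predetermined library pairing each gesture with a command (Step 2)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Feature, Callable[[], str]]] = {}

    def register(self, name: str, template: Feature,
                 command: Callable[[], str]) -> None:
        self._entries[name] = (template, command)

    def match(self, features: Feature, max_distance: float = 0.5) -> Optional[str]:
        """Step 3: return the closest library gesture, or None if none is close."""
        best_name, best_dist = None, float("inf")
        for name, (template, _) in self._entries.items():
            dist = math.dist(features, template)  # Euclidean distance
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist <= max_distance else None

    def run(self, name: str) -> str:
        """Step 4: perform the command paired with the recognized gesture."""
        return self._entries[name][1]()


def handle_frame(features: Feature, library: GestureLibrary) -> Optional[str]:
    """Apply Steps 2-4 to the features extracted from one capture (Step 1)."""
    name = library.match(features)
    return library.run(name) if name is not None else None


library = GestureLibrary()
library.register("swipe_right", (1.0, 0.0), lambda: "next track")
library.register("swipe_left", (-1.0, 0.0), lambda: "previous track")

print(handle_frame((0.9, 0.1), library))  # close to the swipe_right template
print(handle_frame((0.0, 5.0), library))  # far from every template, so ignored
```

Rejecting inputs that are far from every template matters in practice, since most hand movements in front of a camera are not intended as commands.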

Lumentum outlines the main elements of the imaging system used in Step 1:

Illumination source – Laser diodes or LEDs normally generate infrared or near-infrared light. "This light isn’t normally noticeable to users and is often optically modulated to improve the resolution performance of the system."5

Certain systems utilize a 2D color camera along with a 3D sensing (NIR) light source and a camera.

Controlling optics – Optical lenses help illuminate the environment and focus reflected light onto the surface of the detector. A bandpass filter allows only reflected light that matches the wavelength of the illuminating light (such as 940 nm) to reach the light sensor, rejecting ambient and other stray light that would degrade performance.

Depth camera – A high-performance optical receiver detects the filtered, reflected NIR light and converts it into an electrical signal for processing.

Firmware – Ultra high-speed ASIC or DSP chips (also known as gesture recognition integrated circuits, or ICs) process the information received and convert it into a format that the end-user application, for example video game software, can use.

Image Processing Complexity

Acquiring the images for analysis (Step 1) is arguably the simplest part of making a gesture recognition system operational, even with all the system components involved.

The far bigger challenge is the interpretation of the information into a workable human-computer interaction model (Steps 2 and 3).

Facial recognition simply matches a static captured pattern to a stored static pattern, but gesture recognition needs a complicated evaluation of dynamic movement.

For the purposes of analysis, hand gestures can be divided into several elements: the hand configuration (shape of the hand), its movement through space, its orientation, and its location (position).

Gestures can also be broadly categorized into dynamic (where the hand posture changes as it moves) and static (where the hand holds a single posture).
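
One way to picture this decomposition is as a small data structure. The sketch below is an assumption-laden illustration rather than a standard representation: the HandFrame fields, the units, and the rule that a gesture is "static" when its hand-shape label never changes are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class HandFrame:
    """One observation of the hand, using the elements named above."""
    configuration: str  # hand shape label, e.g. "open_palm" or "fist"
    location: Vec3      # position in space (metres, assumed)
    orientation: Vec3   # e.g. roll, pitch, yaw in degrees (assumed)


def movement(frames: Sequence[HandFrame]) -> List[Vec3]:
    """Movement through space: displacement between consecutive frames."""
    return [
        tuple(b - a for a, b in zip(f0.location, f1.location))
        for f0, f1 in zip(frames, frames[1:])
    ]


def gesture_kind(frames: Sequence[HandFrame]) -> str:
    """'static' if the hand holds a single posture, otherwise 'dynamic'."""
    if not frames:
        raise ValueError("at least one frame is required")
    postures = {frame.configuration for frame in frames}
    return "static" if len(postures) == 1 else "dynamic"


held_palm = [HandFrame("open_palm", (0.0, 0.0, 0.5), (0.0, 0.0, 0.0))] * 3
grab = [
    HandFrame("open_palm", (0.0, 0.0, 0.5), (0.0, 0.0, 0.0)),
    HandFrame("fist", (0.1, 0.0, 0.5), (0.0, 0.0, 0.0)),
]

print(gesture_kind(held_palm))  # single posture throughout
print(gesture_kind(grab))       # posture changes while the hand moves
print(movement(grab))
```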

Skeletal-type hand gesture recognition images using Google’s open source developer algorithm, which provides real-time gesture recognition tools using a smartphone. (Image Source: Mashable)

The evaluation, interpretation, and categorization of various human gestures is a complicated, interdisciplinary project, which combines elements of computer vision and graphics, bioinformatics, machine learning techniques, pattern recognition, motion analysis, motion modeling, image processing, and even psycholinguistic analysis.6,7

Various approaches have been tested and combined to create workable computerized gesture classification models, such as dynamic time warping (DTW), deep neural network (DNN), hidden Markov model (HMM), support vector machine (SVM), time delay neural network (TDNN), and neural network (NN), among others.8
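
As an illustration of one of these approaches, the following is a minimal sketch of dynamic time warping used as a nearest-template classifier. It assumes each gesture has already been reduced to a one-dimensional sequence (for example, the horizontal hand position over time); the template values and gesture names are hypothetical, and real systems typically work with multi-dimensional features.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(sample: Sequence[float],
             templates: Dict[str, Sequence[float]]) -> str:
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(sample, templates[name]))


templates = {
    "swipe_right": [0.0, 0.25, 0.5, 0.75, 1.0],
    "swipe_left": [1.0, 0.75, 0.5, 0.25, 0.0],
}

# The same rightward motion performed more slowly, so it has more samples.
slow_swipe = [0.0, 0.0, 0.25, 0.25, 0.5, 0.75, 0.75, 1.0]
print(classify(slow_swipe, templates))
```

The appeal of DTW for gestures is that the same movement performed faster or slower still aligns with its template, because the warping absorbs differences in timing.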

Gesture Recognition in Use

Despite these challenges, companies are producing effective hand-gesture recognition systems that are already in use today in industries from medicine to gaming.

Microsoft collaborated with Novartis, for example, in order to create a pioneering system for evaluating progressive functioning in multiple sclerosis patients. In virtual reality systems, gesture and hand tracking allows users to make contact with virtual objects.

Leap Motion makes a sensor that detects hand and finger motions. Besides controlling a PC, the sensor can also perform hand tracking in virtual reality, allowing users to interact with virtual objects.

Powered by Near-Infrared (NIR) Sensing

Many of the effective hand gesture recognition systems in use today rely on NIR light, which is invisible to humans, to detect the user’s motion.

NIR light supports 3D sensing and depth measurement functions that utilize structured light and/or TOF approaches to produce input data.
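
The arithmetic behind TOF depth measurement is simple: light travels to the object and back, so the distance is half the round-trip time multiplied by the speed of light. The sketch below shows this, together with the phase-shift form commonly used when the light source is modulated at a fixed frequency. The function names and the sample timing and modulation values are illustrative assumptions, not figures from any system described here.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_s: float) -> float:
    """Pulsed TOF: the light covers the distance twice, so halve the path."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def distance_from_phase(phase_rad: float, modulation_hz: float) -> float:
    """Continuous-wave TOF: distance from the phase shift of modulated light."""
    return SPEED_OF_LIGHT_M_PER_S * phase_rad / (4.0 * math.pi * modulation_hz)


def unambiguous_range(modulation_hz: float) -> float:
    """Largest distance measurable before the phase wraps past 2*pi."""
    return SPEED_OF_LIGHT_M_PER_S / (2.0 * modulation_hz)


# A 4 ns round trip corresponds to a hand roughly 0.6 m from the sensor.
print(distance_from_round_trip(4e-9))

# Assumed 20 MHz modulation with a quarter-cycle (pi/2) phase shift.
print(distance_from_phase(math.pi / 2, 20e6))
print(unambiguous_range(20e6))
```

The nanosecond scale of these timings is one reason the processing hardware in such systems needs to be very fast.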

NIR light sources (commonly LED or laser emitters such as VCSELs) must function correctly to guarantee the accuracy of NIR-based gesture recognition systems.

Manufacturers of hand-gesture systems must verify the performance of these illumination sources to ensure that they emit NIR light at an intensity that is adequate for the application and safe for human exposure.

Radiant offers an NIR Intensity Lens solution for the correct characterization and measurement of NIR emitters, for example those employed in facial recognition, gesture recognition, and eye tracking applications.

The lens, used in combination with a ProMetric® Imaging Photometer and TrueTest™ software, is a complete solution for the measurement of NIR light sources.

For gesture and facial recognition applications, the TT-NIRI™ software module of TrueTest includes specific tests for NIR light source measurement, such as:

  • POI Total Power
  • Total Flux (mW or W)
  • Pixel Solid Angle
  • Max Power
  • Flood Source Analysis
  • Dot Source Analysis
  • Image Export
  • Points of Interest

Both gesture and facial recognition systems utilize the same 3D sensing techniques of TOF measurement and structured light (dot patterns).
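
For structured light, depth is commonly recovered by triangulation: a projected dot appears shifted in the camera image by an amount that depends on how far away the surface is. The sketch below uses the simplified relation depth = focal length × baseline / disparity, which assumes an idealized, rectified projector-camera pair. The function name and the numeric values are hypothetical and do not describe any particular product.

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Triangulated depth in metres for one projected dot.

    focal_length_px: camera focal length expressed in pixels
    baseline_m:      distance between projector and camera
    disparity_px:    shift of the dot from its reference position
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Assumed optics: 580 px focal length and a 7.5 cm baseline.
print(depth_from_disparity(580.0, 0.075, 29.0))  # about 1.5 m
print(depth_from_disparity(580.0, 0.075, 58.0))  # larger shift, nearer surface
```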

To learn more about testing the NIR emissions of sources used in human-centered technology, read the white paper: Measuring Near-Infrared (NIR) Light Sources for Effective 3D Facial Recognition.

Acknowledgments

Produced from materials originally authored by Anne Corning from Radiant Vision Systems.

References and Further Reading

  1. Gesture Recognition Market Size, Share & Trends Analysis Report By Technology (Touch-based, Touchless), By Industry (Automotive, Consumer Electronics, Healthcare), and Segment Forecasts, 2019-2025, Grand View Research, January 2019.
  2. Wendorf, M., “How Gesture Recognition Will Change Our Relationship With Tech Devices”, Interesting Engineering, March 31, 2019.
  3. “Historical Development of Hand Gesture Recognition”, Chapt. 2 in Premaratne, P., Human Computer Interaction Using Hand Gestures. Cognitive Science and Technology, Springer Science+Business Media Singapore 2014. DOI 10.1007/978-981-4585-69-9_2.
  4. Escalera, S., Athitsos, V., and Guyon, I., “Challenges in multimodal gesture recognition”, Journal of Machine Learning Research, Vol 17 (2016), pages 1-54.
  5. 3D Sensing/Gesture Recognition and Lumentum, White Paper published on Lumentum.com, 2016.
  6. Wu, Y., and Huang, T., “Vision-Based Gesture Recognition: A Review”.
  7. Sarkar, A., Sanyal, G. and Majumder, S., “Hand Gesture Recognition Systems: A Survey”, International Journal of Computer Applications (0975 – 8887), Volume 71, No. 15, May 2013.
  8. Cicirelli, G. and D’Orazio, T., “Gesture Recognition by Using Depth Data: Comparison of Different Methodologies”, Motion Tracking and Gesture Recognition (Gonzalez, C. Editor), July 12, 2017. DOI: 10.5772/68118

This information has been sourced, reviewed and adapted from materials provided by Radiant Vision Systems.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Radiant Vision Systems. (2019, October 29). An Introduction to 3D Gesture Recognition Systems. AZoSensors. Retrieved on November 21, 2019 from https://www.azosensors.com/article.aspx?ArticleID=1811.

  • MLA

    Radiant Vision Systems. "An Introduction to 3D Gesture Recognition Systems". AZoSensors. 21 November 2019. <https://www.azosensors.com/article.aspx?ArticleID=1811>.

  • Chicago

    Radiant Vision Systems. "An Introduction to 3D Gesture Recognition Systems". AZoSensors. https://www.azosensors.com/article.aspx?ArticleID=1811. (accessed November 21, 2019).

  • Harvard

    Radiant Vision Systems. 2019. An Introduction to 3D Gesture Recognition Systems. AZoSensors, viewed 21 November 2019, https://www.azosensors.com/article.aspx?ArticleID=1811.
