This article explores the challenges of developing a hands-free, voice-operated remote control: replacing current push-to-talk (PTT) solutions with an always-listening system that uses a capacitive MEMS microphone for voice-operated control.
Conventional always-listening technologies continuously listen for sounds in the frequency range of human speech, based on the speech activity level (large dashed box, Figure 2). When such a sound is detected, a wake word detection engine checks whether the wake word (for example, “Alexa” for Amazon’s system) has been spoken. If it has, the rest of the system is woken and the voice command is transmitted over a low-energy transport such as Zigbee or Bluetooth.
Figure 2: Flow chart for wake word detection using Wake on Sound mode.
In such a system, the capacitive MEMS front-end microphone runs continuously in standby at an average current of about 200 μA. Once voice activity detection (VAD), wake word detection and the wake-up response are included, the standby current of the whole system is in the milliamp range. In addition, listening over a large area often requires an additional microphone, which can double the power consumption.
Wake command detection is therefore the power bottleneck in designing a low-power remote. By comparison, the remote’s low-power transport (e.g. Bluetooth Low Energy) needs only 0.1 μA in standby.
PTT systems, in contrast, are active only after the user switches them into listening mode, so they consume significantly less power for command recognition.
For a far-field remote to achieve battery life comparable to a push-to-talk solution, it needs an always-on, always-listening system with ultra-low-power wake word detection at the front end, followed by a system-on-chip solution that is on par with the front-end voice interface.
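To put these standby currents in perspective, the sketch below converts them into battery life. The 1000 mAh capacity and the 1 mA figure for the full wake chain (the low end of the "milliamp range") are illustrative assumptions, not values from the source; the other currents are those quoted above.

```python
# Battery-life estimate for the standby loads quoted in the text.
# BATTERY_MAH and the 1 mA wake-chain figure are illustrative assumptions.

BATTERY_MAH = 1000.0  # assumed usable capacity of the remote's battery


def battery_life_days(current_ua: float, capacity_mah: float = BATTERY_MAH) -> float:
    """Days of operation at a constant average current given in microamps."""
    if current_ua <= 0:
        raise ValueError("current must be positive")
    hours = capacity_mah / (current_ua / 1000.0)  # mAh / mA = hours
    return hours / 24.0


standby_loads_ua = {
    "BLE transport (text)": 0.1,
    "1 capacitive MEMS mic (text)": 200.0,
    "2 capacitive MEMS mics (text)": 400.0,
    "full wake chain (assumed 1 mA)": 1000.0,
}

for name, ua in standby_loads_ua.items():
    print(f"{name:32s} {ua:7.1f} uA -> {battery_life_days(ua):9.1f} days")
```

Even before the wake word engine is counted, a single 200 μA microphone drains the assumed battery 2000 times faster than the Bluetooth standby load.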
Vesper’s ZPL™ is well suited to wake word detection, providing a simple yet powerful wake-up system at the front end.
Vesper’s ZPL™ microphones are a novel, ultra-low-power solution for always-on listening systems. They continuously monitor the surrounding acoustic environment and respond to sounds in specific frequency bands that rise above a threshold level.
In this Wake on Sound (WoS) mode, the ZPL™ microphone draws only 10 μA, significantly less than other always-on solutions, while listening for sounds in the voice band above a wake threshold that can be set between 65 and 89 dB SPL(A). When it detects such a sound, the microphone returns to normal operation and picks up all audio, drawing 85 μA in this stage.
As shown in Figure 2, the WoS mode of the VM1010 microphone acts as an acoustic watchdog: it listens for relevant sounds and wakes the rest of the system only when required. A single VM1010 microphone is enough to wake the full microphone array for command recognition, so the ZPL™ integrated in the VM1010 can provide significant power savings compared with conventional always-listening technologies.
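The microphone’s average current depends on how much of the day it spends awake. The sketch below computes that time-weighted average from the 10 μA (WoS) and 85 μA (awake) figures above; the awake fractions in the loop are illustrative assumptions.

```python
# Time-weighted average current of a WoS microphone.
# WOS_UA, ACTIVE_UA and ALWAYS_ON_UA come from the text; the awake
# fractions used below are illustrative assumptions.

WOS_UA = 10.0         # listening in Wake on Sound mode
ACTIVE_UA = 85.0      # normal operation after waking
ALWAYS_ON_UA = 200.0  # conventional capacitive MEMS front end, for comparison


def average_current_ua(awake_fraction: float) -> float:
    """Average current for a given fraction of time spent awake."""
    if not 0.0 <= awake_fraction <= 1.0:
        raise ValueError("awake_fraction must be between 0 and 1")
    return awake_fraction * ACTIVE_UA + (1.0 - awake_fraction) * WOS_UA


for frac in (0.0, 0.05, 0.2, 1.0):
    print(f"awake {frac:4.0%}: {average_current_ua(frac):5.1f} uA "
          f"(always-on front end: {ALWAYS_ON_UA:.0f} uA)")
```

This covers the microphone only; the larger saving comes from keeping the downstream processing asleep while the microphone is in WoS mode.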
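A minimal model of the watchdog arrangement helps show where the savings come from. In the hypothetical sketch below, one WoS microphone decides, frame by frame, whether a microphone array is powered. The 10, 85 and 200 μA figures come from the text; the four-microphone array, the frame data and the absence of any hold time are assumptions made for illustration, and the class and function names are not a Vesper API.

```python
from dataclasses import dataclass

# Hypothetical acoustic-watchdog model: one always-on WoS microphone gates
# a microphone array. Array size and frames are illustrative assumptions;
# the array stays on only for the frame that woke it (no hold time).

WOS_UA = 10.0
WOKEN_WATCHDOG_UA = 85.0
ARRAY_MIC_UA = 200.0
N_ARRAY_MICS = 4  # hypothetical array size


@dataclass
class Frame:
    level_db_spl: float   # sound level during the frame
    in_voice_band: bool   # True if the sound falls in the 250 Hz-6 kHz band


def average_current_ua(frames, threshold_db_spl=65.0, use_watchdog=True):
    """Mean microphone current over a list of equal-length frames."""
    total = 0.0
    for f in frames:
        if not use_watchdog:
            total += N_ARRAY_MICS * ARRAY_MIC_UA  # array listens continuously
        elif f.in_voice_band and f.level_db_spl >= threshold_db_spl:
            total += WOKEN_WATCHDOG_UA + N_ARRAY_MICS * ARRAY_MIC_UA  # woken
        else:
            total += WOS_UA  # only the watchdog is listening
    return total / len(frames)


frames = [Frame(40.0, True)] * 8 + [Frame(72.0, True), Frame(80.0, False)]
print(average_current_ua(frames, use_watchdog=True))   # array gated by WoS
print(average_current_ua(frames, use_watchdog=False))  # array always on
```

The loud out-of-band frame (for example, a low-frequency hum) does not wake the array, because the model requires in-band sound as well as level above threshold.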
How ZPL™ Works
All of Vesper’s piezoelectric MEMS microphones convert the mechanical energy of a sound wave into electrical energy, and in ZPL™ this energy drives the wake-up. When a sound wave strikes the piezoelectric MEMS element, the element moves in response, and this movement generates a voltage that is sensed by a very low-power comparator circuit. The comparator sends a wake signal to the processor if the voltage is high enough to indicate a sound level above the threshold.
The 65–89 dB SPL threshold range of ZPL™ is set by careful choice of a resistor, which lets the microphone be optimized for its environment. The system also uses a bandpass filter that passes sounds between 250 Hz and 6 kHz, i.e. the human vocal range, so false triggers from HVAC systems, wind and other external noise are rejected.
The default WoS threshold is 65 dB SPL. It can be modified by connecting a resistor between the GA1 and GA2 pins (Figure 3), which alters the feedback network of the instrumentation amplifier in the WoS circuit; a smaller resistor between the pins gives a greater amplifier gain.
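The band selection can be illustrated with a single digital band-pass section. The sketch below assumes a second-order biquad in the "Audio EQ Cookbook" constant-peak-gain form with edges at 250 Hz and 6 kHz and a 48 kHz sample rate; the VM1010’s actual filter order and topology are not described in the source, so this only shows the idea.

```python
import cmath
import math

# Hypothetical second-order band-pass (RBJ cookbook, 0 dB peak gain) with
# -3 dB edges near 250 Hz and 6 kHz. Not the VM1010's actual filter.


def bandpass_coeffs(f_lo, f_hi, fs):
    f0 = math.sqrt(f_lo * f_hi)          # geometric centre frequency
    q = f0 / (f_hi - f_lo)               # Q of the analog prototype
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def magnitude(b, a, f, fs):
    """|H| of the biquad at frequency f, evaluated on the unit circle."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


FS = 48000.0
b, a = bandpass_coeffs(250.0, 6000.0, FS)
for f in (50.0, 250.0, math.sqrt(250.0 * 6000.0), 6000.0, 15000.0):
    print(f"{f:8.1f} Hz: {20 * math.log10(magnitude(b, a, f, FS)):6.1f} dB")
```

A single second-order section attenuates a 50 Hz hum by only about 15 dB; a practical design would likely use steeper skirts, but the principle of passing the voice band while suppressing low-frequency noise is the same.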
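If the comparator is assumed to trip at a fixed voltage, the acoustic threshold moves in decibels by 20·log10 of any change in amplifier gain. The sketch below works through that relationship. Only the 65 dB SPL default comes from the text; the fixed-trip assumption is ours, and the mapping from a given GA1–GA2 resistor value to a gain ratio is device-specific and should be taken from the VM1010 datasheet.

```python
import math

# Hypothetical model: the WoS comparator trips at a fixed voltage, so the
# acoustic threshold shifts inversely with the instrumentation-amp gain.
# Only the 65 dB SPL default is from the text.

DEFAULT_THRESHOLD_DB_SPL = 65.0


def threshold_after_gain_change(gain_ratio: float,
                                ref_threshold_db: float = DEFAULT_THRESHOLD_DB_SPL) -> float:
    """Acoustic threshold (dB SPL) when the gain is multiplied by gain_ratio."""
    if gain_ratio <= 0:
        raise ValueError("gain_ratio must be positive")
    # More gain means less sound pressure is needed to reach the trip voltage.
    return ref_threshold_db - 20.0 * math.log10(gain_ratio)


def gain_ratio_for_threshold(target_db: float,
                             ref_threshold_db: float = DEFAULT_THRESHOLD_DB_SPL) -> float:
    """Gain multiplier needed to move the threshold to target_db."""
    return 10 ** ((ref_threshold_db - target_db) / 20.0)


print(threshold_after_gain_change(0.5))  # halving the gain: about 71 dB SPL
print(gain_ratio_for_threshold(78.0))    # about 0.22 of the default gain
```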
Figure 3: Fixed adjusted WoS threshold, implemented with external resistor (Rg) between GA1 and GA2 pins.
ZPL™ Design Parameters
Vesper’s ZPL™ microphone uses peak detection to wake on ambient sounds, with a threshold that can be set anywhere between 65 and 89 dB SPL. This wide range lets developers fine-tune the microphone’s listening behavior for different environments. ZPL™ can be woken either by a wake command or by some other sound event, so the noise threshold can be chosen to suit the application.
For systems activated by a wake command, the noise threshold should be chosen based on the user’s speaking volume, the user’s distance from the device, and the level of the user’s voice relative to the background noise.
For example, a device such as a security camera should use a higher noise threshold to maximize battery life, because outdoor environments are inherently noisier. Once a WoS threshold has been selected, it is set on the device with an external resistor.
The hold time of the ZPL™ microphone is the period after waking during which the microphone actively listens before returning to WoS mode. It should be set to give the best battery life for the application.
A shorter hold time extends battery life, because the microphone spends less time actively listening. A longer hold time, however, means that the microphone may already be actively listening when the user speaks the wake command, so the command is less likely to be missed.
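A peak detector holds the largest recent signal amplitude and compares it with a threshold. The hypothetical sketch below uses instant attack and exponential release, expresses the threshold in dB SPL, and converts it to pascals with the standard 20 μPa reference; the release constant, the sine-peak trip convention and the test tone are assumptions, not details of Vesper’s circuit.

```python
import math

P_REF_PA = 20e-6  # 0 dB SPL reference pressure in air (20 micropascals)


def spl_to_pressure_pa(spl_db: float) -> float:
    """RMS pressure in pascals for a level in dB SPL."""
    return P_REF_PA * 10 ** (spl_db / 20.0)


class PeakDetector:
    """Hypothetical peak detector with instant attack and exponential release."""

    def __init__(self, threshold_db_spl: float, release_per_sample: float = 0.999):
        # Trip at the peak amplitude of a sine whose RMS level equals the threshold.
        self.threshold_pa = math.sqrt(2) * spl_to_pressure_pa(threshold_db_spl)
        self.release = release_per_sample
        self.peak = 0.0

    def process(self, sample_pa: float) -> bool:
        """Feed one pressure sample; return True if the held peak is over threshold."""
        magnitude = abs(sample_pa)
        if magnitude > self.peak:
            self.peak = magnitude        # attack: follow a rising peak at once
        else:
            self.peak *= self.release    # release: let the held peak decay slowly
        return self.peak >= self.threshold_pa


def wakes(spl_db, threshold_db=65.0, fs=48000.0, tone_hz=1000.0, n_samples=480):
    """Does a steady tone at spl_db wake a detector set to threshold_db?"""
    det = PeakDetector(threshold_db)
    amp = math.sqrt(2) * spl_to_pressure_pa(spl_db)
    return any(det.process(amp * math.sin(2 * math.pi * tone_hz * i / fs))
               for i in range(n_samples))


print(wakes(70.0), wakes(60.0))
```

Raising `threshold_db` toward 89 dB SPL makes the detector ignore progressively louder ambient sound, which is the tuning the threshold range allows.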
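The distance factor can be estimated with the free-field rule that sound level falls by 20·log10(d) dB relative to its level at 1 m, i.e. about 6 dB per doubling of distance. The sketch below applies that rule; the 72 dB SPL voice level, 50 dB SPL room noise and 65 dB SPL threshold are hypothetical values, and indoors, reflections typically keep the level at a distance somewhat higher than this estimate.

```python
import math


def level_at_distance(level_at_1m_db: float, distance_m: float) -> float:
    """Free-field estimate: the level drops by 20*log10(distance) relative to 1 m."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return level_at_1m_db - 20.0 * math.log10(distance_m)


def speech_wakes_device(level_at_1m_db, distance_m, threshold_db, background_db):
    """True if speech reaches the WoS threshold while the background stays below it."""
    speech_db = level_at_distance(level_at_1m_db, distance_m)
    return speech_db >= threshold_db and background_db < threshold_db


# Hypothetical scenario: 72 dB SPL at 1 m, 65 dB SPL threshold, 50 dB SPL room noise.
for d in (1.0, 2.0, 3.0):
    print(d, round(level_at_distance(72.0, d), 1),
          speech_wakes_device(72.0, d, 65.0, 50.0))
```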
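The hold-time trade-off can be made concrete by counting awake time and how many sounds arrive while the microphone is already listening. The sketch below assumes a retriggerable hold (a new sound during the hold period extends it), which the source does not specify; the event times are hypothetical. The resulting awake time can be weighted with the 10 μA and 85 μA figures given earlier to estimate average current.

```python
def active_time_s(event_times_s, hold_s):
    """Total awake time when each sound event (re)starts a hold window.

    Overlapping windows are merged: an event during the hold period extends
    it (a retriggerable hold, assumed here for illustration)."""
    total, window_end = 0.0, float("-inf")
    for t in sorted(event_times_s):
        start = max(t, window_end)   # skip any part already covered
        end = t + hold_s
        if end > start:
            total += end - start
        window_end = max(window_end, end)
    return total


def events_while_awake(event_times_s, hold_s):
    """Number of events that arrive while the microphone is already listening."""
    count, window_end = 0, float("-inf")
    for t in sorted(event_times_s):
        if t < window_end:
            count += 1
        window_end = max(window_end, t + hold_s)
    return count


# Hypothetical event times (seconds): a first sound, the wake word 1 s later,
# and another sound 10 s after the first.
events = [0.0, 1.0, 10.0]
for hold in (0.5, 2.0, 12.0):
    print(hold, active_time_s(events, hold), events_while_awake(events, hold))
```

With the shortest hold, the microphone is awake for only 1.5 s but has already gone back to WoS mode when the wake word arrives; longer holds catch more events at the cost of more awake time.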
Advantages of ZPL™ in a Hands-Free Voice Remote
ZPL™ offers significantly greater battery savings than conventional always-listening voice-activated systems.
Voice-activated systems must listen for voice activity throughout the entire day. This is highly inefficient, because the system spends long periods listening when there is no activity: in a living room, for example, there are more than 8 hours when the occupants are asleep, and more time while they are at work.
Figure 4 shows 24 hours of activity for a WoS microphone set up in a living room with a threshold of 78 dB SPL, with time (starting at midnight) on the x-axis and the system’s fully-on state indicated by a 1. Although the graph shows data from only one living room over one day, it is evident that the system is used only during certain periods of the day and that the microphone is asleep most of the time.
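A log like the one in Figure 4 can be summarized with a few statistics: the fraction of time awake, the number of wake episodes, and the longest continuous sleep. The sketch below assumes a list of 0/1 states sampled once per minute; the example log is hypothetical and is not the Figure 4 data.

```python
def summarize_log(states, minutes_per_sample=1):
    """Summarize a binary wake log (1 = fully on, 0 = WoS mode)."""
    states = list(states)
    if not states:
        raise ValueError("empty log")
    awake = sum(1 for s in states if s)
    # A wake episode starts wherever the state goes from 0 to 1.
    episodes = sum(1 for prev, cur in zip([0] + states[:-1], states) if cur and not prev)
    longest_sleep = run = 0
    for s in states:
        run = 0 if s else run + 1
        longest_sleep = max(longest_sleep, run)
    return {
        "awake_fraction": awake / len(states),
        "wake_episodes": episodes,
        "longest_sleep_min": longest_sleep * minutes_per_sample,
    }


# Hypothetical one-day log, one sample per minute: awake 07:00-08:00 and
# 19:00-21:00 (NOT the Figure 4 data).
log = [0] * 1440
for minute in list(range(7 * 60, 8 * 60)) + list(range(19 * 60, 21 * 60)):
    log[minute] = 1

print(summarize_log(log))
```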
Figure 4: Logged data from VM1010 (x-axis shows time in a 24-hr period).
For a TV remote, this means that the rest of the system, including the voice processor, the main processor and the A/D converter, can be asleep for most of the day. This selective waking, made possible by a WoS system, saves a significant amount of standby power compared with an always-listening voice-activated system.
This information has been sourced, reviewed and adapted from materials provided by Vesper Technologies, Inc.