How is facial monitoring performed in motor and cognitive performance modification using a visual-haptic interface?

Updated: Aug 20, 2019
  • Author: Morris Steffin, MD; Chief Editor: Jonathan P Miller, MD
Answer

The basic approach to facial monitoring is demonstrated below.

Video-to-scalar method applied to eye movement (profile view). A. Single eye opening and closing on command. Upper trace shows eyebrow region movement; lower trace shows movements in the region of the palpebral fissure. B. As in A, except closure precedes opening. C. Series of 2 opening-closing cycles on command (square wave). In each case, raw video is shown at right, processed video region at left. Eye position can be observed in the raw video corresponding to the scalar signals as marked.

The eye region, including the supraorbital region and the palpebral fissure, is analyzed in real time. The graphs represent scalar values corresponding to the positions of the structures in the corresponding videospace. Spatial and time resolution are good, as is evident in the image above.
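The article does not give the exact reduction used, but the core idea of a video-to-scalar transform can be sketched as follows: each region of interest (ROI) in a grayscale frame is collapsed to a single number per frame, yielding one scalar trace per region. The ROI names and the mean-intensity reduction here are illustrative assumptions, not the published method.

```python
# Hypothetical sketch of a video-to-scalar transform: each ROI of each
# grayscale frame is reduced to its mean pixel intensity, producing one
# scalar sample per frame per region.

def roi_to_scalar(frame, roi):
    """frame: 2D list of grayscale values; roi: (top, left, height, width)."""
    top, left, h, w = roi
    pixels = [frame[r][c]
              for r in range(top, top + h)
              for c in range(left, left + w)]
    return sum(pixels) / len(pixels)

def video_to_scalar(frames, rois):
    """Return one scalar trace per named ROI, sampled once per frame."""
    return {name: [roi_to_scalar(f, roi) for f in frames]
            for name, roi in rois.items()}
```

With ROIs placed over the eyebrow and palpebral fissure, the two resulting traces correspond to the upper and lower graphs described in the figure above.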

The same approach is demonstrated in the image below for the mouth region.

Mouth analysis using video-to-scalar method. Mouth opening (A) and closing (B) on command (compare with physiologic yawn). Mouth position at the corresponding scalar points can be observed in the raw video. C. Series of 2 open-close cycles.

Oral and chin movements are displayed in separate channels. With mouth opening and closing, spatial and time resolution of the movements are similar to those for the eye region. In this case, the mouth movements occurred on command and are therefore more rapid (square wave) than would occur with physiologic yawning; differentiation between volitional and subcortical processes such as yawning is clear with this method, as is shown below. [12]

Physiologic yawn. Mouth region of interest (ROI). Four scalar channels derived from subregions (SR) 1-4 as labeled. Note the much more gradual onset and decay, nearly sinusoidal rather than rectangular, with greater low- to mid-frequency noise due to changes in muscle tension and, therefore, mouth configuration.

With the physiologic yawn, the graphs show much more gradual configurational changes of the mouth, almost sinusoidal rather than rectangular. Preservation of high-frequency response is thus necessary for rapid system discrimination of and response to volitional facial driving responses.
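One simple way to operationalize this volitional-versus-yawn distinction, offered here as an illustrative sketch rather than the article's algorithm, is to measure onset rise time: a command-driven response has a near-rectangular onset (short 10-90% rise time), while a yawn climbs gradually. The rise-time threshold is an assumed parameter.

```python
# Illustrative discriminator (not from the article): classify a scalar
# trace as volitional if it climbs from 10% to 90% of its peak within a
# few samples (rectangular onset) rather than gradually (yawn-like).

def rise_time_samples(trace, lo=0.1, hi=0.9):
    """Samples taken to climb from lo*peak to hi*peak of the trace."""
    peak = max(trace)
    t_lo = next(i for i, v in enumerate(trace) if v >= lo * peak)
    t_hi = next(i for i, v in enumerate(trace) if v >= hi * peak)
    return t_hi - t_lo

def is_volitional(trace, max_rise=3):
    """Assumed threshold: volitional if rise time is at most max_rise samples."""
    return rise_time_samples(trace) <= max_rise

square = [0, 0, 1, 1, 1, 1, 0, 0]                 # command-driven open/close
yawn = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0, 0.8]   # gradual, near-sinusoidal
```

Because the discrimination depends on resolving the sharp onset edge, this sketch also illustrates why preserving high-frequency response matters.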

Increased spatial resolution can be achieved by multiple channel sampling of overlapping regions, as shown below.

Multichannel correlation of mouth region configuration during movement, cessation of movement, and resumption of movement, as labeled. Note the flat baseline in all channels once complete cessation of movement occurs and the abrupt return of movement in all channels with resumption of movement.

Here, periods of active oral movement contrast with a period of cessation of mouth movements. Reliability of the data is increased by interchannel correlation, as can be seen by inspection of these traces during the cessation phase. Again, the waveforms demonstrate the feasibility of scalar analysis. For resolving behavioral changes in the patient, the video-to-scalar approach presented here is computationally much more efficient than, for example, convolutional video transform analysis.
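The interchannel agreement idea can be sketched as follows: cessation of movement is declared only when every overlapping channel is simultaneously flat over a window, so noise on any single channel cannot mimic cessation. The window length and variance threshold are assumed parameters, not values from the article.

```python
# Sketch of interchannel correlation for reliability: "movement ceased"
# requires every channel's variance over the same window to fall below
# an assumed threshold, so a single noisy channel cannot trigger it.

def channel_variance(window):
    mean = sum(window) / len(window)
    return sum((v - mean) ** 2 for v in window) / len(window)

def all_channels_quiescent(channels, start, length, threshold=0.01):
    """True only if every channel is flat over [start, start+length)."""
    return all(channel_variance(ch[start:start + length]) < threshold
               for ch in channels)
```

Requiring agreement across all channels trades a small amount of latency for a large reduction in false cessation detections.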

An example of a conscious but quiescent facies, as opposed to volitional activity, involving both mouth and eye movements, is demonstrated below.

Relaxed (quiescent) facies. Note the lower-amplitude, higher-frequency signals in the eye channels, along with greater baseline drift in the mouth channels.

Eye and mouth movements (2 channels each) are monitored simultaneously. Eye movements are characterized by lower-amplitude, higher-frequency components than mouth movements. As seen here and in images above, mouth movements also show more baseline drift and other low-frequency noise, making interpretation more difficult, although the uncertainty caused by such drift is considerably reduced by the multichannel sampling shown above. However, further improvement in reliability is achieved by high-pass digital filtering, as demonstrated below. In this case, the baseline during movement cessation is nearly flat, leading to less ambiguity and greater reliability in behavioral assessment.

Effect of high-pass digital filtering. Mouth and eye activity during talking with period of cessation of talking. Note flat, nearly noise-free baseline during cessation of movement, generally decreased baseline drift, and greater resolution of movement components.
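A minimal first-order IIR high-pass filter is one plausible way to flatten the baseline drift described above; the article does not specify the filter design, and the coefficient here (which sets the cutoff) is an assumed value.

```python
# Minimal first-order IIR high-pass filter (an illustrative design):
# y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
# Slow baseline drift is suppressed; rapid movement edges pass through.

def high_pass(x, alpha=0.9):
    """Return the high-pass-filtered trace; alpha sets the cutoff."""
    y = [0.0]
    for n in range(1, len(x)):
        y.append(alpha * (y[-1] + x[n] - x[n - 1]))
    return y
```

A constant (fully quiescent) input produces an exactly flat zero output, which is the "nearly noise-free baseline during cessation" behavior noted in the figure; a step input produces a sharp transient that then decays.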

By adding an asymmetrical exponential decay to the output of the high-pass filter, as shown below, a time delay can be introduced to assess consistency of the signal change as it may reflect a behaviorally significant event.

Addition of asymmetrical exponential decay after high-pass filter, 4 mouth channels. With cessation of movement, signal decay is exponential. If cessation is longer, signal declines to trigger level (labeled "Alarm trigger," red marker). Signal instantaneously increases (no delay) when movement resumes ("Reset alarm trigger," green marker).

When activity ceases, the signal level decays exponentially until it reaches a level that can trigger a response from the system. As soon as activity resumes, the trigger is reset. In this case, correlation among 4 mouth channels determines response triggering.

Another correlation method involves a similar approach, but with monitoring of 2 mouth and 2 eye channels, as shown below. In the middle of the sweep, both mouth and eye activity cease long enough to produce a combined trigger effect, while at the end of the sweep only the mouth activity ceases long enough for the triggering effect.

Filter technique applied to eye and mouth images (each 2 channels). With complete cessation of facial movements, both eye and mouth signals decrement, resulting in a combined trigger ("Combined Eye and Mouth Trigger," red marker). When movements in both regions resume, both triggers are reset. Later in the sweep, mouth movements cease while eye movements continue; only the mouth trigger is set ("Mouth Alarm Trigger," red marker), then reset when mouth movements resume ("Reset Mouth Trigger," green marker).
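The combination logic itself is simple and can be sketched as follows, assuming one boolean alarm trace per region (one sample per frame) has already been derived for the eye and mouth channels; the event labels are illustrative.

```python
# Sketch of combining per-region alarm traces: a combined trigger
# requires both regions quiescent simultaneously; otherwise a
# region-only trigger fires for whichever region's alarm is set.

def classify_triggers(eye_alarm, mouth_alarm):
    """eye_alarm, mouth_alarm: per-sample booleans (True = alarm set)."""
    events = []
    for eye, mouth in zip(eye_alarm, mouth_alarm):
        if eye and mouth:
            events.append("combined")
        elif mouth:
            events.append("mouth-only")
        elif eye:
            events.append("eye-only")
        else:
            events.append("none")
    return events
```

Distinguishing combined from region-only triggers lets the system map different patterns of facial stillness to different machine responses.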

These combinations of approaches allow for a wide variety of machine responses to behaviorally significant facial activity. Because the algorithms are efficient and can run on a stand-alone system, preferably a video digital signal processor board, major computer resources are still left free for artificial intelligence routines to effect interpretation of and response to the patient activity indicated by these scalar signals.

Development is continuing to enhance interpretation of these video-derived scalar responses to integrate patient facial activity in machine response paradigms. The potential exists for faster, more efficient response with this technique compared with voice recognition or EEG control of robotics. A combination of all of these signal modalities (eg, video, electrical, verbal) will likely ultimately be used to generate assistive responses for severely disabled patients. Initial indications suggest that machine-level video facial interpretation will play a prominent role in the design of assistive robotics for patients with severe motor impairments. Such a result would indeed represent a cooperative robot, attentive to nonverbal and verbal cues.

