Bio-inspired vision mimetics toward next-generation collision-avoidance automation
The current "deep learning + large-scale data + strong supervised labeling" technology framework of collision avoidance for ground robots and aerial drones is becoming saturated. Its development gradually faces challenges from real open-scene applications, including small data, weak annotation, and cross-scene. Inspired by the neural structure and processes underlying human cognition (eg, human visual, auditory, and tactile systems) and the knowledge learned from daily driving tasks, a high-level cognitive system is developed for integrating collision sensing and collision avoidance. This bio-inspired cognitive approach has the advantages of good robustness, high self-adaptability, and low computation consumption in practical driving scenes. Vision is the primary sense for motion perception. It provides key information for navigation, course control, gaze stabilization, and other motion-related activities.1Fu Q. Hu C. Peng J. et al.A robust collision perception visual neural network with specific selectivity to darker objects.IEEE Trans. Cybern. 2020; 50: 5074-5088Crossref PubMed Scopus (18) Google Scholar Biological visual neural networks have evolved over millions of years and are working efficiently in nature. These visual neural networks can be ideal models for designing artificial vision systems, especially for collision avoidance. As a visual neural structure, the lobula giant movement detector (LGMD) neuron in locusts shows a strong ability to detect moving objects.2Yue S. Rind F.C. Collision detection in complex dynamic scenes using an LGMD-based visual neural network with feature enhancement.IEEE Trans. Neural Network. 2006; 17: 705-716Crossref PubMed Scopus (98) Google Scholar This LGMD neuron consists of a four-layer neural network in which the first layer detects the changes in luminance, the second and third layers process the interaction between excitation and inhibition responses, and the last layer outputs a spike (peak response). When the locust eyes perceive an approaching object, the LGMD neuron will be stimulated to output a spike signal to warn the locust of potential danger at distance. The high sensitivity of LGMD neurons in perceiving objects in motion3Xu J. Park S.H. Zhang X. A temporally irreversible visual attention model based on motion sensitive neurons.IEEE Trans. Ind. Inf. 2020; 16: 855-865Crossref Scopus (10) Google Scholar is suitable to promote vehicles to detect the approaching object. To take advantage of bio-inspired vision in collision detection, in this commentary, an innovative visual neural network that possesses similar collision selectivity of LGMD neurons is reported. The modeling of LGMD is shown in Figure 1A. The visual stimuli are perceived from the photoreceptor (P) layer to the excitation (E) and inhibition (I) layers, then the visual signals are decomposed into ON/OFF channels and fused separately in the summation (S) layer, and the grouping (G) layer is further aggregated for the output. The visual signals input to ON channels indicate the increased brightness in its navigation, and vice versa. When an obstacle approaches, visual stimulation will activate the LGMD neuron to detect potential collisions. The locust will then update its behavior to avoid collisions. This commentary discusses the impact of two human factors on driving: central fixation bias and binocular vision. So far, only a few studies have been conducted thoroughly on modeling human factors in driving and investigating its potential in ground robotics and aerial drones. Central fixation bias, which is a key factor in driving tasks as evidenced in Xu et al.,4Xu J. Guo K. Sun P.Z. Driving performance under violations of traffic rules: novice vs. experienced drivers.IEEE Trans. Intell. Veh. 2022; 7: 908-917Crossref Scopus (8) Google Scholar particularly for experienced drivers, is not thoroughly reflected in previous research. It is reported that human vision exerts a tendency of central fixation bias to actively search the gist of approaching scenes and bind visual features for higher cognitive understanding; that is, the visual information from the central visual field/region is paid more attention while that from outer regions is paid less attention.3Xu J. Park S.H. Zhang X. A temporally irreversible visual attention model based on motion sensitive neurons.IEEE Trans. Ind. Inf. 2020; 16: 855-865Crossref Scopus (10) Google Scholar Once potential hazards approach, the drivers' gaze or attention distribution exhibits high selectivity at the central visual area to enhance their alertness. To reflect this selectivity of gaze distribution in the obstacle detection process, our reported LGMD considers the effect of central fixation bias in receiving visual stimuli and transmitting the processed signals to the P layer. Specifically, the Gaussian filter shown in Figure 1B is incorporated into LGMD to keep the central region of the image as clear as in the original image while blurring the outer regions of the image. This in turn sharpens the sensitivity of LGMD to forward motion. Another human factor in driving, binocular vision, should also be fully considered. Compared with monocular vision, binocular vision could fuse depth measurement in navigation, which is conducive to accurate perception of obstacle distance. Different from most of the existing LGMD models1Fu Q. Hu C. Peng J. et al.A robust collision perception visual neural network with specific selectivity to darker objects.IEEE Trans. Cybern. 2020; 50: 5074-5088Crossref PubMed Scopus (18) Google Scholar,5Lei F. Peng Z. Liu M. et al.A robust visual system for looming cue detection against translating motion.IEEE Transact. Neural Networks Learn. Syst. 2022; Google Scholar that only verified the collision detection performance under monocular vision, our commentary provides the collision detection performance of LGMDs under binocular vision. Based on the above discussion on LGMD and the two human factors in driving, a collision avoidance strategy is reported, which is inspired by the visual-based obstacle avoidance mechanism of locusts and considers two human factors of central fixation bias and binocular vision. In this strategy, central fixation bias is modeled and placed in front of the P layer of LGMD. Two LGMDs, left and right LGMDs, are adapted to process the images of the left and right cameras in front of the vehicle, respectively. During the motion of the object, once the collision is detected by either the left or right LGMD, two procedures of direction selection and collision body distance prediction are triggered simultaneously. In the direction selection procedure, the values of normalized membrane potentials (NMPs) output by the left and right LGMD models are compared. The side with a larger NMP value is then selected. In the collision body distance prediction program, NMP and feedforward inhibition of the left and right LGMD models and the current robot motion speed are fed into the trained multi-layer perceptron model. Then, the model will output the predicted distance. Based on the direction selection and distance prediction, the best motor control is finally derived for optimizing the steering selection of collision avoidance under the emergent situation. LGMD is a promising bio-inspired model for collision avoidance. In addition to its superior performance, this kind of bio-inspired vision mimetics approach can reduce the dependence of algorithms on graphics processing units and central processing units, which becomes its unique dominance for next-generation automation. The next step of the bio-inspired vision metrics approach should focus on the processing of road scenes with complex backgrounds. On real roads, different weather and lighting conditions will lead to large variance in visual appearance of driving scenes, which is a difficult challenge that should be further studied. Besides, human factors, particularly in the attention process, are crucial for safe driving. Like human driving, after a quick scan through the entire visual field, LGMD tends to select a target area (ie, focus of attention) and then invests more attention resources to extract detailed information about the target and suppress irrelevant surrounding information, which is the primary foundation of situation awareness. This allows us to use limited attention resources to quickly screen out high-value targets from a large amount of information. As a survival mechanism evolved in human evolution, the visual attention mechanism considerably improves the efficiency of visual information processing. This key block would further enhance the performance of ground robots and aerial drones guided by human factors in driving. This work was supported in part by the EU's Seventh Framework Program (EU FP7) International Research Staff Exchange Scheme (IRSES) Project–Towards zero road Accidents—Nature Inspired Hazard Perception under grant 318907 and in part by the Zhejiang Provincial Natural Science Foundation under grant LY23F030001. The authors declare no competing interests.
Gary J.W. Xu, Kun Guo, Seop Hyeong Park et al. 2022Article