Neural Networks Pave the Road to Autonomous Driving
June 10, 2016 by John Day
By Jeff VanWashenova, Director, Automotive Segment Marketing, CEVA
Driver-assistance systems such as lane-departure warning, parking assist and collision avoidance have been welcomed by drivers who deal with the diverse hazards found on today’s roads. Current Advanced Driver Assistance Systems (ADAS) are capable of performing basic driving tasks autonomously, setting the scene for progress toward fully autonomous driving in the future.
The road to autonomous driving depends on development in three key technology areas: sensing, which includes radar, LiDAR and visual monitoring using cameras; high-definition mapping; and localization, which accurately pinpoints the vehicle within its surroundings.
Vision processing is a key element of advances in sensing. Figure 1 shows how cameras now provide all-round vision and feed the ADAS applications listed in Table 1. As the vehicle’s on-board systems take on more of the routine decision-making tasks, there is a growing need for efficient, low-power computing platforms capable of performing visual recognition with minimal latency. Such platforms are needed to enable the vehicle to identify pedestrians and road markings and, as systems progress toward fully automated driving, to perform more complex tasks such as identifying and interpreting road signs.
Table 1. Key vision-based ADAS applications
The power consumed by vehicle electronics is coming under increasing scrutiny for a number of reasons, including the size, weight and cost penalties of managing inefficient systems. Moreover, with increasing reliance on electric propulsion in hybrids and full EVs, the power consumed by the vehicle’s on-board electrical systems will become ever more closely tied to driving range. As manufacturers look to introduce ADAS functionality across their higher-volume vehicles, the need for cost-effective solutions will increase the demand for more affordable and economical visual computing platforms.
For automotive vision applications to be effective, it is important to accurately identify key objects of concern such as pedestrians, vehicles, road markings and street signs. It is also important to cope with the harsh imaging conditions that are common on the road. Neural networks, which can be trained to recognize complex visual images with a high level of accuracy, promise a solution to these challenges. In particular, Convolutional Neural Networks (CNNs), containing multiple layers of trainable neurons, have the ability to learn quickly and recognize efficiently. These techniques are already well proven in fields such as machine vision and speech recognition.
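At its core, a CNN layer slides small trainable filters across the image to pick out local features such as edges. The toy example below is purely illustrative: it uses a single hand-picked edge-detection kernel rather than trained weights, but it shows the convolution operation that gives these networks their name.

```python
# Minimal 2D convolution -- the core operation of a CNN layer.
# Illustrative only; a real ADAS network stacks many such layers
# with learned (not hand-picked) kernel weights.

def conv2d(image, kernel):
    """Valid (no-padding) 2D convolution of a single channel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out

# Hand-crafted vertical-edge kernel: responds where intensity
# changes left-to-right (e.g. the edge of a lane marking).
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

# 4x4 test image: dark left half, bright right half.
image = [[0, 0, 9, 9] for _ in range(4)]

response = conv2d(image, edge_kernel)
# Every valid filter position straddles the edge, so all
# responses are large (27.0 here).
```

A trained network learns thousands of such kernels automatically, each tuned to a feature that helps distinguish pedestrians, vehicles or signs.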
As CNNs are more widely used in automotive vision applications, it is important to understand the process, and the hardware platforms best suited to each phase, of implementing such systems. Implementation of a CNN can be broken down into three main phases: training the network, translating it, and executing it on a cost-efficient production platform. Choosing the most suitable system for each phase is necessary to achieve a cost-effective, efficient solution targeted at high-volume vehicle applications.
Training is often done offline on CPU-based systems, graphics processors (GPUs) or field-programmable gate arrays (FPGAs). These systems are well suited to training because of their high computational throughput and familiarity among design professionals. However, their limited computational efficiency and high cost prohibit their use in high-volume production systems. In the training phase, developers use frameworks such as Caffe to model and optimize CNNs. A reference image database is used to determine the optimum weights for the neurons in the network. Upon completion of the training, the conventional approach is to generate the network and prototype it on CPUs, GPUs or an FPGA, typically executing floating-point arithmetic to ensure optimum precision.
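The training loop itself can be sketched in miniature. The example below stands in for Caffe with a single trainable neuron and a hypothetical four-sample “reference database”, but the principle is the same one the frameworks apply at scale: iterate over labeled reference data, nudging the floating-point weights to reduce prediction error.

```python
# Miniature illustration of the training phase: gradient descent
# finds weights that fit a reference data set. Frameworks such as
# Caffe do the same for millions of CNN weights, in floating point.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical reference "database": feature pairs with 0/1 labels
# (think: simple image statistics labeled pedestrian / not-pedestrian).
data = [([0.0, 0.2], 0), ([0.1, 0.1], 0),
        ([0.9, 0.8], 1), ([1.0, 0.7], 1)]

w = [0.0, 0.0]
bias = 0.0
lr = 1.0                                  # learning rate
for _ in range(500):                      # epochs over the reference set
    for x, label in data:
        pred = sigmoid(w[0] * x[0] + w[1] * x[1] + bias)
        err = pred - label                # gradient of log-loss wrt the logit
        w[0] -= lr * err * x[0]           # nudge each weight downhill
        w[1] -= lr * err * x[1]
        bias -= lr * err

# After training, the learned weights separate the two classes.
```

The output of this phase, whether two weights or millions, is a set of floating-point parameters, which is exactly what the translation phase then has to compress.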
A combination of high performance and low power consumption is essential if CNNs are to be deployed successfully in mass-market autonomous-driving systems. Nvidia has already demonstrated a computing platform for autonomous driving, based on deep learning implemented with Caffe and running on supercomputer-class system-on-chip (SoC) processors. However, for mass-market use, auto makers require a more affordable, lower-power solution that is better suited to embedded deployment.
Addressing these requirements, CEVA introduced the CEVA-XM4 imaging and vision DSP and developed a network generator to translate a trained network to run efficiently on the DSP. The CEVA Network Generator can take a trained network structure and weights, developed on a high-power floating-point CPU, GPU or hybrid platform, and convert it to a slim, customized model based on fixed-point arithmetic that fits the power and performance constraints of embedded platforms. Figure 2 illustrates the workflow, beginning with conventional network generation using Caffe and subsequently using the CEVA Network Generator to create the customized real-time network.
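One simple way to picture the floating-point-to-fixed-point step is symmetric linear quantization: each trained weight is mapped onto a small integer plus a shared scale factor. CEVA’s actual conversion scheme is not public, so the sketch below is only an illustration of the general principle.

```python
# Sketch of a float-to-fixed-point conversion: map trained
# floating-point weights onto 8-bit integers plus a shared scale.
# This symmetric linear quantization is an assumption for
# illustration, not the CEVA Network Generator's actual algorithm.

def quantize(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1            # 127 for 8-bit signed
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

# Hypothetical trained weights from the floating-point phase.
weights = [0.52, -0.93, 0.07, 0.31, -0.48]

q, scale = quantize(weights)              # small ints + one float scale
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Rounding error per weight is bounded by half a quantization
# step (scale / 2), which is why accuracy loss stays small.
```

Storing 8-bit integers instead of 32-bit floats cuts weight storage and memory bandwidth by roughly 4x, and fixed-point multiply-accumulates are far cheaper in silicon, which is the essence of the embedded power saving described above.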
Figure 2. The compute-heavy floating-point CNN is converted to an efficient fixed-point equivalent capable of real-time performance within an embedded power budget.
The model created by the CEVA Network Generator can then be deployed on a low-power embedded platform built around the CEVA-XM4 imaging and vision DSP, which is well suited to large-scale production systems. The trade-off for bringing high-performance neural processing within the modest power budget of today’s production vehicles is only minimal degradation in image-recognition accuracy: in fact, less than 1% compared with the original network.
The converted network runs on the CEVA-XM4 using fully optimized CNN layers, software libraries and APIs. In addition, the CEVA-XM4 provides a number of capabilities that save power and enhance vision-recognition performance. These include an instruction set architecture that is optimized for conventional computer vision in addition to CNNs, a power-scaling unit that allows dynamic voltage scaling, and features such as auto-fetching and local data reuse that minimize the power consumed when moving data or interfacing with memory.
To illustrate the performance capabilities of a CEVA-XM4 based system, CEVA has implemented a 24-layer CNN with 224×224 input size and 11×11, 5×5 and 3×3 convolution filters, hosted on the CEVA-XM4 DSP. This CNN has proved capable of delivering almost three times the performance of a comparable CNN implemented in a typical hybrid GPU/CPU processing engine, while requiring only about one-fifth of the memory bandwidth and a significant reduction in power consumption.
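As a back-of-envelope check on the layer geometry cited above, the snippet below computes how a 224×224 input shrinks through 11×11, 5×5 and 3×3 convolution filters. The stride and padding values are assumptions for illustration; the article does not specify them.

```python
# Standard output-size arithmetic for a convolution layer.
# Stride 1 and no padding are assumed here purely to illustrate
# how feature maps shrink; the actual network may differ.

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a square convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

size = 224                                # input size cited in the article
for k in (11, 5, 3):                      # the filter sizes cited
    size = conv_out(size, k)
# 224 -> 214 -> 210 -> 208 with unpadded, stride-1 convolutions
```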
A complete supporting development platform is also available, including a hardware development kit and software toolset, an application development kit that includes the CEVA Deep Neural Network Framework (CDNN), and vision and ADAS software products with source code provided.
In summary, deep learning and neural networks have demonstrated their potential to open up the road ahead for autonomous driving. By delivering an economical, low-power solution, CEVA’s platform for generating real-time convolutional neural networks now brings affordable, efficient systems within reach of the mass market.