On path planning for self-driving cars and its ties to the perception system

The world of self-driving cars is rapidly evolving. Newer and more sophisticated motion planning, path planning and control techniques for autonomous vehicles keep emerging, and the sheer variety of approaches and methodologies in this field is a testament to the possibilities it offers. These range from classical algorithms rooted in computer science to machine-learning-based strategies, and from rule-based systems to neural-network-based decision-making systems.

Let's take a deeper look at these techniques, understand their purpose, and build an intuition for how they work.

An overview of self-driving cars, their market growth, and global competitions aimed at improving autonomous vehicles:

A self-driving car is essentially a conventional car equipped with a sensor suite, onboard computers, and software systems that enable it to drive autonomously in conventional, unconventional, and dynamic road and traffic situations. It uses localization and path-planning algorithms to produce appropriate trajectories and then uses control algorithms to guide itself along them.

With advancements in AI, almost every software system within a self-driving car now makes use of artificial intelligence. From intelligent control systems to object detection software built on deep learning, a self-driving car uses state-of-the-art algorithms to gain a clear understanding of its environment, build SLAM maps, predict the trajectories of surrounding vehicles, and make safer decisions about which path or trajectory to follow and when to accelerate or brake.

Companies developing and/or testing autonomous cars include Audi, BMW, Ford, Google, General Motors, Tesla, Volkswagen, Waymo, Cruise and Volvo. Google's test involved a fleet of self-driving cars, including Toyota Prius models and an Audi TT, navigating over 140,000 miles of California streets and highways.

These improvements in technology and road infrastructure, together with industrial participation, are prompting an expansion in the production of autonomous vehicles, as is clear from the image below, which describes the situation in the North American market.

Globally, autonomous vehicle market demand was estimated at 51.6 thousand units in 2021 and is expected to expand at a compound annual growth rate (CAGR) of 53.6% from 2022 to 2030.

One of the most famous global competitions aimed at improving driverless technology is the DARPA Grand Challenge. First held in 2004 and funded by the Defense Advanced Research Projects Agency, the event was created to promote the development of technologies needed to build the first fully autonomous ground vehicle.

There is also the Indy Autonomous Challenge, a much more recent competition that follows in the footsteps of the DARPA Grand Challenge. It provides a demanding setting for autonomous vehicle development, as teams have to build autonomous driving software for the highly dynamic and risky environment of a racetrack.

All in all, self-driving technology is disruptive and is expected to reshape transportation in the coming years, as research and development bring a fully autonomous, deployable Level 5 car closer.

Let's dive into the perception system within a self-driving car:

Now that we have an overview of what a self-driving car is, let's look at its perception system, which forms its backbone: it is through the perception system that all the software systems within the AV receive data and then produce meaningful outputs.

The perception system can vary from car to car and with the setting in which the car must drive. Sensors like cameras, LIDAR, and radar feed images, data points, and point clouds to software, which then creates detailed maps of the surroundings (via techniques like V-SLAM). Through these maps the autonomous car analyses its environment in three-dimensional space, which helps it understand surrounding locations and objects such as highways, barriers, traffic signals, pedestrians and other landmarks.

The different sensors within the perception stack of an autonomous vehicle each have individual strengths and weaknesses, so fusing their signals can yield higher detection quality. This topic, known as sensor fusion, is another area worth exploring within the perception systems of self-driving cars.
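
As a minimal sketch of why fusion helps, assume two sensors report the distance to the same object with independent Gaussian noise. Their readings can then be combined by inverse-variance weighting, which trusts the more precise sensor more. The function name and all numbers below are illustrative assumptions, not values from any real vehicle.

```python
def fuse_measurements(value_a, var_a, value_b, var_b):
    """Fuse two noisy estimates of one quantity by inverse-variance weighting."""
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    fused_value = (weight_a * value_a + weight_b * value_b) / (weight_a + weight_b)
    fused_var = 1.0 / (weight_a + weight_b)
    return fused_value, fused_var


# Hypothetical readings: the radar range is assumed more precise than the camera's.
radar_range_m, radar_var = 20.0, 0.25
camera_range_m, camera_var = 22.0, 1.0

fused_range_m, fused_var = fuse_measurements(
    radar_range_m, radar_var, camera_range_m, camera_var
)
print(fused_range_m, fused_var)
```

The fused variance is smaller than either input variance, which is the sense in which combining sensors can improve detection quality.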

An important component of the perception system of a self-driving car is the software system (a sensor fusion system is itself an example) that takes the data collected by the sensors and uses it to determine appropriate trajectories and control commands. The perception software of a self-driving car can be broken down further into sub-categories and is an area of study in its own right, but let's focus on the perception software that makes use of computer vision.

This system may take in images and videos from the onboard camera sensors as well as point cloud data from LIDARs, and passes this data through a series of steps that lead to performable actions, ranging from the generation of useful trajectories to actions needed in emergency situations. It can be broadly broken down into four parts: detection, classification, tracking, and segmentation. Each has its own role, described below.

1. Detection

From the point of view of computer vision, detection means object detection, a very common term. Object detection can be performed with either traditional image processing techniques or modern deep learning networks. Within the deep learning approach, Convolutional Neural Networks (CNNs) are widely applied to object detection.

Through research, advanced variants of CNN architectures have been developed for object detection. Widely used examples include Mask R-CNN, YOLOv7 and RetinaNet.

Object detection techniques can be employed to detect both static and dynamic objects (objects that move across frames). This idea can be extended to differentiate between agents (pedestrians/vehicles) in a scene that are static and those that are dynamic across multiple consecutive frames, so that their trajectories can be predicted.

As far as static objects go, a common and basic example is detecting traffic lights. Here, a CNN architecture can be trained on a large image dataset of traffic lights; using the annotations and ground truth data, it would over time learn to localize the traffic light in the image. The traffic light recognition tutorial by Alkiek (2018), listed in the citations, looks into this problem and provides code as well.
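
One common way to judge how well a predicted box matches a ground truth annotation is intersection over union (IoU). The sketch below is an illustration only: the `(x_min, y_min, x_max, y_max)` box format and the example coordinates are assumptions, not taken from any specific dataset or detector.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical annotation and prediction for one traffic light, in pixels.
ground_truth_box = (100, 50, 120, 100)
predicted_box = (105, 55, 125, 105)
print(iou(ground_truth_box, predicted_box))
```

An IoU of 1.0 means the prediction coincides with the annotation, and 0.0 means the two boxes do not overlap at all.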

2. Classification

Classification, too, can be discussed with respect to computer vision and object detection. Once different objects have been detected in images or videos, the next task is to classify them into categories based on their type, size and distance from the AV. Classification through deep learning requires thoroughly annotated datasets that are diverse and representative of the real-world scenarios the model will encounter.

The behavior of an autonomous vehicle is influenced by the output of the classification system. These outputs help build real-time maps of the AV's surroundings (V-SLAM), which provide crucial information for making driving decisions and planning safe trajectories. Both can be supported by predictive analysis, in which the trajectories of the various agents (vehicles/pedestrians) sharing the scene with the AV are predicted. These predicted trajectories then aid the AV in its decision-making process.

Therefore, another important aspect of developing classification algorithms is the precision with which they classify objects. Objects should be given the right labels so that they are assigned the right predictive model.
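
To make the link between labels and predictive models concrete, here is a small sketch under strong assumptions: each class is given a hypothetical maximum speed, and future positions are predicted with a constant-velocity model clamped to that limit. The class names, speed limits, and function name are illustrative, not part of any real stack.

```python
# Hypothetical per-class speed limits in m/s, used to pick a motion assumption.
MAX_SPEED_BY_CLASS = {"pedestrian": 3.0, "cyclist": 10.0, "vehicle": 40.0}


def predict_positions(label, position, velocity, horizon_s=3.0, step_s=1.0):
    """Constant-velocity prediction, with speed clamped to the class limit."""
    max_speed = MAX_SPEED_BY_CLASS[label]
    vx, vy = velocity
    speed = (vx ** 2 + vy ** 2) ** 0.5
    if speed > max_speed:
        scale = max_speed / speed
        vx, vy = vx * scale, vy * scale
    x, y = position
    steps = int(round(horizon_s / step_s))
    return [(x + vx * step_s * k, y + vy * step_s * k) for k in range(1, steps + 1)]


# The same noisy velocity estimate gives different predictions per label.
as_pedestrian = predict_positions("pedestrian", (0.0, 0.0), (6.0, 0.0))
as_vehicle = predict_positions("vehicle", (0.0, 0.0), (6.0, 0.0))
print(as_pedestrian)
print(as_vehicle)
```

With the same measured velocity, the pedestrian prediction ends much closer to the starting point than the vehicle prediction, which shows how a wrong label would send the planner a misleading trajectory.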

3. Tracking

Tracking objects becomes a critical task for autonomous vehicles. Within computer vision, tracking involves locating and following objects of interest across a sequence of images or video frames. It enables a self-driving car to follow the different agents in its scene, which in turn lets the vehicle make informed decisions about speed, lane changes and braking.

Tracking differs from predictive analysis: tracking considers the real-time location of an agent, whereas predictive analysis generates predictions about its future trajectory. Both tasks have their own importance.

Object tracking addresses a major challenge in self-driving: the occlusion event. This occurs when the object of interest becomes obscured (partially or fully) or hidden behind another object. When a self-driving car is driving in a dense and uncertain environment, occlusion events are bound to happen.

Object tracking helps with occlusion because tracking algorithms maintain temporal consistency: they preserve the continuity of an object's identity across frames in a video or sequence of images. When an object becomes occluded and then reappears, the tracking algorithm can associate it with its previous state based on motion, appearance, or other features.

As a result, an object that has just reappeared after being occluded can be tracked again.
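
The motion-based association described above can be sketched with a toy centroid tracker. This is a deliberately simplified illustration: the class name, the distance gate, the allowed number of missed frames, and the greedy nearest-neighbour matching are all assumptions made for the example, and real trackers use more robust association and appearance cues.

```python
import math


class CentroidTracker:
    """Toy tracker that keeps an object's ID alive through short occlusions."""

    def __init__(self, gate=2.0, max_missed=3):
        self.gate = gate              # max distance between prediction and detection
        self.max_missed = max_missed  # frames a track may stay unseen before removal
        self.tracks = {}
        self.next_id = 0

    def update(self, detections):
        unmatched = list(detections)
        for track in self.tracks.values():
            # Predict where the object should be, assuming constant velocity.
            predicted = (track["pos"][0] + track["vel"][0],
                         track["pos"][1] + track["vel"][1])
            best = min(unmatched, key=lambda d: math.dist(d, predicted), default=None)
            if best is not None and math.dist(best, predicted) <= self.gate:
                track["vel"] = (best[0] - track["pos"][0], best[1] - track["pos"][1])
                track["pos"] = best
                track["missed"] = 0
                unmatched.remove(best)
            else:
                # Occluded or missed: coast along the predicted motion.
                track["pos"] = predicted
                track["missed"] += 1
        self.tracks = {tid: t for tid, t in self.tracks.items()
                       if t["missed"] <= self.max_missed}
        for detection in unmatched:
            self.tracks[self.next_id] = {"pos": detection, "vel": (0.0, 0.0), "missed": 0}
            self.next_id += 1
        return {tid: t["pos"] for tid, t in self.tracks.items()}


# One object moving 1 m per frame along x, hidden for two frames.
frames = [[(0.0, 0.0)], [(1.0, 0.0)], [], [], [(4.0, 0.0)]]
tracker = CentroidTracker()
history = [tracker.update(frame) for frame in frames]
print(history)
```

In this example the object keeps the same ID when it reappears, because the coasted position is close to the new detection.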

4. Segmentation

Segmentation refers to a task where each pixel within an image is classified into a category. Within computer vision, CNNs and other deep learning and machine learning techniques are used to develop algorithms that carry out this task. One example is differentiating between drivable and non-drivable surfaces, a distinction that can then be used by decision-making algorithms.

Segmentation helps in the understanding of the environment at the most detailed level as each pixel is labelled. Through segmentation different parts of the scene and their relationships are recognized. This helps computer vision systems to build a richer understanding of the environment.

A widely used form of this task is semantic segmentation, which assigns a semantic label to each segmented region. For example, it can distinguish between road, vehicles, pedestrians, and buildings in a street scene. This semantic understanding is critical for applications like self-driving cars, where the system needs to make decisions based on the type of objects in its surroundings.
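
As a small illustration of how per-pixel labels can feed later decisions, the sketch below turns a tiny, made-up label map into a drivable mask. The class IDs, the label map, and the choice of which classes count as drivable are assumptions for the example; a real network would output a much larger map.

```python
# Hypothetical class IDs for a semantic segmentation output.
ROAD, SIDEWALK, VEHICLE, PEDESTRIAN = 0, 1, 2, 3
DRIVABLE_CLASSES = {ROAD}

# A tiny stand-in for a per-pixel label map (rows of class IDs).
label_map = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 2, 0, 1],
    [1, 0, 0, 0, 3, 1],
]


def drivable_mask(labels, drivable_classes):
    """Mark each pixel True if its class is considered drivable."""
    return [[label in drivable_classes for label in row] for row in labels]


mask = drivable_mask(label_map, DRIVABLE_CLASSES)
drivable_fraction = sum(map(sum, mask)) / (len(mask) * len(mask[0]))
print(drivable_fraction)
```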

(Note that the perception system also has an algorithmic part, such as localization algorithms or V-SLAM, which is used for localization.)

Tying the output of the perception system to path planners

Once perception and scene understanding are in place, and map creation and localization have been carried out through algorithms like V-SLAM, the next step, broadly speaking, is path planning. The idea is to represent the understanding of the scene in a suitable data format. This representation typically includes the vehicle's position and orientation (pose), the locations and characteristics of objects in the scene, and information about the road network and traffic rules.
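
One possible shape for such a scene representation is sketched below with plain dataclasses. Every class name, field, and default value here is a hypothetical choice made for illustration; production systems define far richer structures.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Pose:
    x: float
    y: float
    heading_rad: float


@dataclass
class SceneObject:
    object_id: int
    label: str
    x: float
    y: float
    speed_mps: float = 0.0


@dataclass
class Scene:
    ego_pose: Pose
    objects: list = field(default_factory=list)
    lane_centerline: list = field(default_factory=list)  # road network information
    speed_limit_mps: float = 13.9                         # a traffic rule (assumed value)

    def objects_within(self, radius_m):
        """Return the objects closer to the ego vehicle than radius_m."""
        return [obj for obj in self.objects
                if math.hypot(obj.x - self.ego_pose.x, obj.y - self.ego_pose.y) <= radius_m]


scene = Scene(
    ego_pose=Pose(0.0, 0.0, 0.0),
    objects=[SceneObject(1, "pedestrian", 4.0, 3.0, 1.2),
             SceneObject(2, "vehicle", 30.0, 0.0, 10.0)],
    lane_centerline=[(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)],
)
nearby = scene.objects_within(10.0)
print([obj.label for obj in nearby])
```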

The representation of the scene, along with the vehicle's current pose, is provided to a high-level path planning module. The path planning algorithm takes the scene representation and the vehicle's current state as input and generates appropriate trajectories or paths for the vehicle.

These algorithms aim to determine how the vehicle should navigate through the environment to achieve its goals while avoiding obstacles, following traffic rules, and optimizing for safety and efficiency. Once a high-level trajectory or path is generated, it's refined into a lower-level trajectory that can be executed by the vehicle's control system.

The control system translates the planned trajectory into specific actions, such as steering, acceleration, and braking, to follow the planned path while accounting for vehicle dynamics and actuator limitations. This is coupled with a feedback loop: the car keeps sensing its environment in real time and feeds this information back to the control system, which may then adjust the trajectory or apply the brakes in response to real-time threats.
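
The feedback idea can be sketched with a very simple proportional heading controller driving toward a single waypoint. This is only an illustration under stated assumptions: the vehicle is modelled as a point moving at constant speed, the gain and the yaw-rate limit are made-up values, and the function names are hypothetical.

```python
import math

MAX_YAW_RATE = 0.5  # rad/s, a hypothetical actuator limit


def heading_error(pose, target):
    """Signed angle between the current heading and the bearing to the target."""
    x, y, heading = pose
    desired = math.atan2(target[1] - y, target[0] - x)
    # Wrap the difference into [-pi, pi].
    return math.atan2(math.sin(desired - heading), math.cos(desired - heading))


def yaw_rate_command(pose, target, gain=1.5):
    """Proportional command, clamped to the actuator limit."""
    command = gain * heading_error(pose, target)
    return max(-MAX_YAW_RATE, min(MAX_YAW_RATE, command))


def simulate(pose, target, speed=2.0, dt=0.1, steps=200, stop_radius=0.3):
    """Re-sense the pose every step and correct the heading (the feedback loop)."""
    x, y, heading = pose
    for _ in range(steps):
        if math.hypot(target[0] - x, target[1] - y) < stop_radius:
            break
        heading += yaw_rate_command((x, y, heading), target) * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return x, y, heading


final_pose = simulate((0.0, 0.0, 0.0), (10.0, 5.0))
print(final_pose)
```

The clamp on the command stands in for actuator limitations, and recomputing the error at every step is what lets the vehicle correct itself as its measured state changes.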

Let's go deeper into path planning:

Now that we have looked at how the perception system helps path planners by generating useful maps and localization, let's take a slightly deeper look at what path planning is for self-driving cars. Grid-based planners are a natural place to start.

If we look at the problem from a very simplistic point of view, that is, a grid world consisting of free cells and obstacles, a grid-based planner will generate an optimal trajectory based on the positions of the free cells and obstacles. If we further assume that the dense maps generated by the perception system can somehow be boiled down to such a grid world, grid-based planners would perform very well.
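
As a rough sketch of what boiling a map down to a grid world could look like, the snippet below marks the grid cells that contain obstacle points. The function name, the point-based obstacle input, and the map dimensions are assumptions for illustration; real perception output is far denser and noisier.

```python
import math


def build_occupancy_grid(obstacle_points, width_m, height_m, cell_size_m):
    """Mark each grid cell that contains at least one obstacle point with 1."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in obstacle_points:
        col = int(x // cell_size_m)
        row = int(y // cell_size_m)
        if 0 <= row < rows and 0 <= col < cols:  # ignore points outside the map
            grid[row][col] = 1
    return grid


# Hypothetical obstacle points in metres; the last one lies outside the map.
obstacle_points = [(0.5, 0.5), (2.7, 1.2), (3.9, 2.9), (5.0, 5.0)]
occupancy_grid = build_occupancy_grid(obstacle_points, width_m=4.0, height_m=3.0, cell_size_m=1.0)
for row in occupancy_grid:
    print(row)
```

A smaller cell size preserves more detail but produces more cells for the planner to search.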

Some examples of grid-based motion planners are A* (A-star), [Dijkstra's algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm), and variations like D* (Dynamic A*). The idea here is to discretize the map generated by the perception system into a grid for the planner to explore. A challenge, however, is the ever-increasing dimensionality of the problem.
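
A compact sketch of A* on a small occupancy grid is given here for illustration. It assumes 4-connected movement, a unit cost per step, and a Manhattan-distance heuristic; the grid, start, and goal are made up for the example.

```python
import heapq


def astar(grid, start, goal):
    """A* over a 4-connected grid where 0 is free and 1 is an obstacle."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected unit-cost grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    came_from = {}
    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost[current]:
            continue  # stale heap entry
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(open_heap, (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)))
    return None  # no path exists


grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
path = astar(grid, start=(0, 0), goal=(2, 0))
print(path)
```

Setting the heuristic to zero turns this into Dijkstra's algorithm, which finds the same path cost but typically expands more cells.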

With more obstacles and longer paths, the search space in the grid grows, which can lead to longer planning times. Advanced data structures, heuristics, or parallel processing may be used to deal with this issue. The larger grid has to be handled while planning time and computational cost are kept down, so everything becomes a trade-off.

Challenges involved with path planning:

Path planning comes with several challenges. One is that the map generated by the perception system may be too complex to convert into a grid-based world. There are techniques that can be followed in such scenarios.

The generated map can be kept in its original state and not discretized. Instead, the perception system can identify key features of the environment and represent them in a structured format that captures their geometry and relationships, allowing the path planner to reason about the environment. This is called the feature-based representation of the world.

There also exist graph-based planners, such as PRMs (Probabilistic Roadmaps) or RRTs (Rapidly Exploring Random Trees). These create a graph (a tree, in the case of RRTs) representing the environment, based on the perceived features and the connections between them. The planner can then explore this graph to find paths that connect the start and goal locations while avoiding obstacles.
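
A minimal RRT sketch in a 2D world is shown here to give a feel for how such a tree grows. It assumes circular obstacles, a point robot, and a fixed random seed; the step size, goal bias, and world layout are illustrative choices rather than recommended settings.

```python
import math
import random


def segment_hits_circle(p, q, center, radius):
    """True if the segment p-q comes within radius of center."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((center[0] - p[0]) * dx + (center[1] - p[1]) * dy) / length_sq))
    return math.hypot(p[0] + t * dx - center[0], p[1] + t * dy - center[1]) <= radius


def rrt(start, goal, obstacles, bounds, step=0.5, goal_bias=0.1, max_iters=2000, seed=0):
    """Grow a tree from start by random sampling until it can connect to goal."""
    rng = random.Random(seed)
    parent = {start: None}

    def blocked(a, b):
        return any(segment_hits_circle(a, b, c, r) for c, r in obstacles)

    for _ in range(max_iters):
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(0.0, bounds[0]), rng.uniform(0.0, bounds[1]))
        nearest = min(parent, key=lambda node: math.dist(node, sample))
        gap = math.dist(nearest, sample)
        if gap == 0.0:
            continue
        scale = min(1.0, step / gap)  # move at most one step toward the sample
        new = (nearest[0] + (sample[0] - nearest[0]) * scale,
               nearest[1] + (sample[1] - nearest[1]) * scale)
        if blocked(nearest, new):
            continue
        parent[new] = nearest
        if math.dist(new, goal) <= step and not blocked(new, goal):
            if new != goal:
                parent[goal] = new
            path, node = [], goal
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
    return None


start, goal = (1.0, 1.0), (9.0, 9.0)
obstacles = [((5.0, 5.0), 1.5)]  # (center, radius), hypothetical
path = rrt(start, goal, obstacles, bounds=(10.0, 10.0))
print(len(path) if path else "no path found")
```

The resulting path is collision-free but usually jagged, so in practice it would be smoothed before being handed to a controller.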

In such scenarios, semantic segmentation can also be employed to identify objects and regions of interest in sensor data (e.g., camera images). This produces a semantically rich representation of the environment, including information about road lanes, vehicles, pedestrians, and other objects. The semantic segmentation output can then guide the motion planner by specifying regions that should be avoided and regions that can be traversed. Learning-based algorithms like deep Q-learning can also be employed to learn from such maps.

Probabilistic Roadmaps:

As described above, there are situations where the generated maps cannot be discretized into grids. In such scenarios, algorithms like probabilistic roadmaps are employed.

A probabilistic roadmap generates a networked graph of possible paths that can be traversed in a given map. This networked graph is generated based on the locations of the obstacles in the map.

The probabilistic roadmap algorithm consists of two phases: the construction phase and the query phase. The first phase builds a graph representation of the environment using a sampling-based path planning approach.

This phase begins by generating random configurations in the configuration space of the robot (in our case, an autonomous vehicle). The configuration space is the set of all configurations, in terms of positions and orientations, that the robot can take, so each randomly generated configuration represents a potential position and orientation for the robot. Each sampled configuration is checked for collisions with the obstacles, and if it is collision-free it is added to the roadmap graph as a node.
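
The sampling step can be sketched as follows, assuming a configuration is an `(x, y, heading)` tuple, obstacles are circles, and the vehicle is approximated by a disc. These modelling choices, the function names, and the numbers are assumptions for illustration only.

```python
import math
import random


def is_collision_free(config, obstacles, vehicle_radius=0.5):
    """Treat the vehicle as a disc and the obstacles as circles (a simplification)."""
    x, y, _heading = config
    return all(math.hypot(x - cx, y - cy) > radius + vehicle_radius
               for (cx, cy), radius in obstacles)


def sample_free_configurations(count, bounds, obstacles, seed=0, max_attempts=10000):
    """Draw random configurations and keep only the collision-free ones."""
    rng = random.Random(seed)
    nodes = []
    for _ in range(max_attempts):
        if len(nodes) == count:
            break
        config = (rng.uniform(0.0, bounds[0]),
                  rng.uniform(0.0, bounds[1]),
                  rng.uniform(-math.pi, math.pi))
        if is_collision_free(config, obstacles):
            nodes.append(config)
    return nodes


obstacles = [((5.0, 5.0), 1.5), ((2.0, 8.0), 1.0)]  # (center, radius), hypothetical
roadmap_nodes = sample_free_configurations(50, bounds=(10.0, 10.0), obstacles=obstacles)
print(len(roadmap_nodes))
```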

The nodes are then connected by adding edges between them. Edges are created based on a proximity criterion: nodes that are close to each other are connected, provided the motion between them is itself collision-free, and these connections represent feasible paths between configurations.
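
A sketch of the connection step is given here, assuming 2D nodes, straight-line motion between them, and circular obstacles. The node positions, connection radius, and obstacle are made-up values chosen so that some candidate edges are rejected.

```python
import math


def segment_hits_circle(p, q, center, radius):
    """True if the straight segment p-q comes within radius of center."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((center[0] - p[0]) * dx + (center[1] - p[1]) * dy) / length_sq))
    return math.hypot(p[0] + t * dx - center[0], p[1] + t * dy - center[1]) <= radius


def connect_nodes(nodes, obstacles, connection_radius):
    """Link node pairs that are close enough and joined by a collision-free segment."""
    edges = set()
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if math.dist(nodes[i], nodes[j]) > connection_radius:
                continue  # fails the proximity criterion
            if any(segment_hits_circle(nodes[i], nodes[j], c, r) for c, r in obstacles):
                continue  # the straight-line motion would hit an obstacle
            edges.add((i, j))
    return edges


nodes = [(1.0, 1.0), (3.0, 1.0), (5.0, 1.0), (3.0, 4.0), (5.0, 4.0)]
obstacles = [((4.0, 2.5), 0.7)]  # (center, radius), hypothetical
edges = connect_nodes(nodes, obstacles, connection_radius=3.7)
print(sorted(edges))
```

Here the two diagonal links that would pass through the obstacle are dropped, and the pairs farther apart than the connection radius are never considered.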

This process results in a roadmap graph in which nodes represent collision-free configurations and edges represent feasible paths between them. PRMs thus produce a connectivity structure that covers the configuration space. The second phase, the query phase, now comes into the picture. Whenever a planning request is received (e.g., finding a path from a start configuration to a goal configuration), PRMs use graph search algorithms to find a path within the roadmap.

Dijkstra's algorithm and A* are common search algorithms used with PRMs. The search algorithm explores the roadmap to find a path from the start node (initial configuration) to the goal node (desired configuration). The discovered path in the graph represents a collision-free trajectory that the robot can follow to move from the start to the goal while avoiding obstacles.
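
The query phase can be sketched with Dijkstra's algorithm over a small hand-written roadmap. The node names and edge lengths below are hypothetical; in a real PRM they would come from the construction phase.

```python
import heapq


def shortest_path(roadmap, start, goal):
    """Dijkstra's algorithm over a roadmap given as {node: {neighbour: edge_length}}."""
    queue = [(0.0, start)]
    distance = {start: 0.0}
    previous = {}
    while queue:
        dist_so_far, node = heapq.heappop(queue)
        if node == goal:
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return path[::-1], dist_so_far
        if dist_so_far > distance[node]:
            continue  # stale queue entry
        for neighbour, length in roadmap[node].items():
            candidate = dist_so_far + length
            if candidate < distance.get(neighbour, float("inf")):
                distance[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    return None, float("inf")


# Hypothetical roadmap: keys are node names, values map neighbours to edge lengths.
roadmap = {
    "start": {"a": 2.0, "b": 5.0},
    "a": {"start": 2.0, "b": 1.0, "goal": 6.0},
    "b": {"start": 5.0, "a": 1.0, "goal": 2.0},
    "goal": {"a": 6.0, "b": 2.0},
}
route, route_length = shortest_path(roadmap, "start", "goal")
print(route, route_length)
```

Because the roadmap is built once and queried many times, the same graph can serve repeated planning requests with different start and goal nodes.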

Notice that this algorithm never builds a grid map from the real-world map generated by the perception system of an AV. The technique can also be combined with computer vision to detect and track dynamic objects that act as obstacles, which can then be added to the generated graph for better planning.

Topics that can be further explored and research papers on path planning

Citations:

Lutkevich, B. (2023, January 23). What are self-driving cars and how do they work?. Enterprise AI. https://www.techtarget.com/searchenterpriseai/definition/driverless-car
Rana, K., Gupta, G., Vaidya, P., & Khari, M. (2023, April 11). The perception systems used in fully automated vehicles: A comparative analysis — multimedia tools and applications. SpringerLink. https://link.springer.com/article/10.1007/s11042-023-15090-w
Tjokro, M. (2022, May 2). How perception stack works in Autonomous Driving Systems. Medium. https://medium.com/self-driving-cars/a-perception-framework-in-autonomous-driving-systems-3cdc0b59a3e6
Paden, B., Cap, M., Yong, S. Z., Yershov, D., & Frazzoli, E. (2016, April 25). A survey of motion planning and control techniques for self-driving urban vehicles. arXiv.org. https://arxiv.org/abs/1604.07446
Van, N. D., Sualeh, M., Kim, D., & Kim, G.-W. (2020, May 20). A hierarchical control system for autonomous driving towards urban challenges. MDPI. https://www.mdpi.com/2076-3417/10/10/3543
Cloud1900. (2023, September 29). DARPA Grand Challenge. Wikipedia. https://en.wikipedia.org/wiki/DARPA_Grand_Challenge
OAbot. (2023, August 12). Indy Autonomous Challenge. Wikipedia. https://en.wikipedia.org/wiki/Indy_Autonomous_Challenge
Polimove wins the Autonomous Challenge at CES®, making history as the first head-to-head Autonomous Racecar Competition Champion. Indy Autonomous Challenge — Official Website. (n.d.). https://www.indyautonomouschallenge.com/polimove-wins-the-autonomous-challenge-at-ces
Naughton, K. (2019, December 5). Waymo's autonomous taxi service tops 100,000 rides. Bloomberg.com. https://www.bloomberg.com/news/articles/2019-12-05/waymo-s-autonomous-taxi-service-tops-100-000-rides#xj4y7vzkg
Kirkland, G. (2019, December 17). 5 obstacles autonomous cars need to face before they hit the Road. Innovation & Tech Today. https://innotechtoday.com/autonomous-cars/
Dwivedi, P. (2017, August 9). Planning the path for a self-driving car on a highway. Medium. https://towardsdatascience.com/planning-the-path-for-a-self-driving-car-on-a-highway-7134fddd8707
Navigation. Go to Honda. (n.d.). https://usa.honda-ri.com/-/motion-planning-and-interactive-decision-making
6 new perception systems for AI self-driving cars. Nanalyze. (2021, October 12). https://www.nanalyze.com/2019/01/perception-systems-ai-self-driving-cars/
Sensor fusion. Sensor Fusion — an overview | ScienceDirect Topics. (n.d.). https://www.sciencedirect.com/topics/engineering/sensor-fusion
Alkiek, K. (2018, October 2). Traffic light recognition — A visual guide. Medium. https://medium.com/@kenan.r.alkiek/https-medium-com-kenan-r-alkiek-traffic-light-recognition-505d6ab913b1
Image classification and object detection. Ambolt. (2023, June 9). https://ambolt.io/en/image-classification-and-object-detection/
Grel, T. (2021, April 14). Region of interest pooling explained. deepsense.ai. https://deepsense.ai/region-of-interest-pooling-explained/
Sellat, Q., Bisoy, S. K. B. K., & Priyadarshini, R. (2022, January 14). Semantic segmentation for self-driving cars using Deep Learning: A Survey. Cognitive Big Data Intelligence with a Metaheuristic Approach. https://www.sciencedirect.com/science/article/abs/pii/B9780323851176000029
Soetens, P. (2023, March 29). Visual slam and cartographer. Intermodalics. https://www.intermodalics.eu/expertise-1/visual-slam-vslam
Chen, J., Chen, W., Li, J., Wei, X., Tan, W., Shen, Z.-J. M., & Li, H. (2022, December 1). Path planning considering time-varying and uncertain movement speed in multi-robot automatic warehouses: Problem formulation and algorithm. arXiv.org. https://arxiv.org/abs/2212.00594v1
Huang, Y., Li, M., & Zhao, T. (2023, April 19). A multi-robot coverage path planning algorithm based on improved DARP algorithm. arXiv.org. https://arxiv.org/abs/2304.09741
Citation bot. (2023, July 22). Rapidly exploring random tree. Wikipedia. https://en.wikipedia.org/wiki/Rapidly_exploring_random_tree
Hw4 — probabilistic roadmaps. cs548 Robot Motion and Control. (n.d.). https://www.cs.bilkent.edu.tr/~culha/cs548/hw4/
