3D Perception: a LiDAR-Camera Pipeline — Part 6: Projecting 3D Intelligence into a 2D Map as a…

This article is part of a hands-on series building a 3D Perception Pipeline with ROS 2 Jazzy. In modern robotics, seeing isn't enough — robots need spatial intelligence to locate objects in a 3D world.

The Project:_ We are building a unified system that fuses 3D LiDAR and ZED2 stereo camera data using YOLO for detection and Open3D for point cloud processing. Everything is managed via Pixi, making the environment reproducible on different systems._

The Series:_ These modular lessons collectively form a full end-to-end project. You can follow the step-by-step roadmap or jump into specific sensor logic as needed._

The Roadmap

3D Perception: a LiDAR-Camera Pipeline — Part 1: Setting Up Your Development Environment This article is part of a hands-on series building a 3D Perception Pipeline with ROS 2 Jazzy. In modern robotics…

3D Perception: a LiDAR-Camera Pipeline - Part 2: The ROS 2 Graph & Perception Data Flow 3D Perception: a LiDAR-Camera Pipeline - Part 2: The ROS 2 Graph & Perception Data Flow This article is part of a…

3D Perception: a LiDAR-Camera Pipeline - Part 3: 3D Point Cloud Processing & Clustering This article is part of a hands-on series building a 3D Perception Pipeline with ROS 2 Jazzy. In modern robotics…

3D Perception: a LiDAR-Camera Pipeline - Part 4: 2D Camera Detections with YOLO This article is part of a hands-on series building a 3D Perception Pipeline with ROS 2 Jazzy. In modern robotics…

3D Perception: a LiDAR-Camera Pipeline - Part 5: Merging 2D Intelligence and 3D Depth with… 3D Perception: a LiDAR-Camera Pipeline - Part 5: Merging 2D Intelligence and 3D Depth with Frustum Fusion This article…

In Part 5, we successfully fused LiDAR depth with YOLO semantic labels. However, visualizing those results in a 3D point cloud can be overwhelming for both humans and downstream robotic systems. In this lesson, we build a node that "flattens" the 3D world into a top-down grid. This allows us to overlay our AI-detected "Smart Clusters" onto a clean, geometric map where distances and object sizes are immediately intuitive. This BEV representation is a standard view in autonomous driving stacks, used for path planning and situational awareness.

Learning Objectives

By the end of this final lesson, you will be able to:

Define BEV Geometry: Configure world-to-pixel parameters, including range and resolution (e.g., 5cm per pixel).
Master Coordinate Mapping: Implement the math required to translate LiDAR (x, y) coordinates into OpenCV (row, col) image indices.
Render Semantic Overlays: Use OpenCV drawing functions to project 3D bounding boxes and AI class labels onto the top-down map.
Implement High-Efficiency Pixel Coloring: Use NumPy advanced indexing to render thousands of LiDAR points onto an image canvas with minimal CPU overhead.

We will now look at the code section by section.

Part 1: Imports and Dependencies

We need tools for image processing and ROS 2 communication.

import rclpy
from rclpy.node import Node
from sensor_msgs_py import point_cloud2
from sensor_msgs.msg import PointCloud2, Image
from vision_msgs.msg import Detection3DArray
from cv_bridge import CvBridge
import numpy as np
import cv2

import rclpy
from rclpy.node import Node
from sensor_msgs_py import point_cloud2
from sensor_msgs.msg import PointCloud2, Image
from vision_msgs.msg import Detection3DArray
from cv_bridge import CvBridge
import numpy as np
import cv2

CvBridge: This is the critical link that converts ROS Image messages into OpenCV-compatible arrays.
cv2: The industry-standard library for drawing shapes and text on images.
sensor_msgs_py: Contains the helper functions to parse raw binary LiDAR data.

Part 2: Initialization and BEV Geometry

The BEV is essentially a grid. We must define how many meters each pixel represents.

class BEVVisualizerNode(Node):
    def __init__(self):
        super().__init__('bev_visualizer_node')

        # BEV Parameters
        self.declare_parameter('range_x', 10.0) # +/- 10m
        self.declare_parameter('range_y', 10.0) # +/- 10m
        self.declare_parameter('resolution', 0.05) # 5cm/pixel

        self.range_x = self.get_parameter('range_x').value
        self.range_y = self.get_parameter('range_y').value
        self.resolution = self.get_parameter('resolution').value

        self.width = int(2 * self.range_y / self.resolution)
        self.height = int(2 * self.range_x / self.resolution)

class BEVVisualizerNode(Node):
    def __init__(self):
        super().__init__('bev_visualizer_node')

        # BEV Parameters
        self.declare_parameter('range_x', 10.0) # +/- 10m
        self.declare_parameter('range_y', 10.0) # +/- 10m
        self.declare_parameter('resolution', 0.05) # 5cm/pixel

        self.range_x = self.get_parameter('range_x').value
        self.range_y = self.get_parameter('range_y').value
        self.resolution = self.get_parameter('resolution').value

        self.width = int(2 * self.range_y / self.resolution)
        self.height = int(2 * self.range_x / self.resolution)

Resolution: At 0.05, every pixel represents a 5x5 cm square in the real world.
Width/Height: We calculate the image size based on the range and resolution. A ±10 m range at 5 cm resolution creates a 400x400 pixel image.

We define these parameters to create a smaller image that covers a limited area, primarily to ensure real-time computational efficiency and visual clarity. Processing a massive 150m range at high resolution would require enormous memory buffers and slow down the LiDAR callback, causing significant latency in the perception pipeline. By restricting the view to a ±10 m patch at a 5 cm resolution, we focus the robot's "attention" on the most critical immediate obstacles while maintaining a high enough pixel density to recognize distinct object shapes without lagging the ROS 2 executor.

Part 3: Subscribers and Synchronization

We listen to the raw LiDAR for the background and the fused detections for the labels.

        self.bridge = CvBridge()

        # Subscriptions
        self.lidar_sub = self.create_subscription(PointCloud2, '/rslidar_points', self.lidar_callback, 10)
        self.fused_sub = self.create_subscription(Detection3DArray, '/detection/fused_3d', self.fused_callback, 10)

        self.latest_fused = None

        # Publisher
        self.bev_pub = self.create_publisher(Image, '/visualization/bev_image', 10)

        self.get_logger().info(f'BEV Visualizer Node started. Image size: {self.width}x{self.height}')

    def fused_callback(self, msg):
        self.latest_fused = msg

        self.bridge = CvBridge()

        # Subscriptions
        self.lidar_sub = self.create_subscription(PointCloud2, '/rslidar_points', self.lidar_callback, 10)
        self.fused_sub = self.create_subscription(Detection3DArray, '/detection/fused_3d', self.fused_callback, 10)

        self.latest_fused = None

        # Publisher
        self.bev_pub = self.create_publisher(Image, '/visualization/bev_image', 10)

        self.get_logger().info(f'BEV Visualizer Node started. Image size: {self.width}x{self.height}')

    def fused_callback(self, msg):
        self.latest_fused = msg

Subscription: /rslidar_points (PointCloud2) – Receives the raw 3D data used to draw the grey "map" background.
Subscription: /detection/fused_3d (Detection3DArray) – Receives the "intelligent" clusters containing both 3D positions and AI-generated labels from the fusion node.
Publisher: /visualization/bev_image (Image) – Outputs the final image, ready to be viewed in RViz or a web dashboard.

Unlike the fusion node, we don't strictly synchronize these. We store the latest_fused detections and draw them whenever a new LiDAR frame arrives.

Part 4: Mapping LiDAR to Pixels

The math here translates real-world coordinates into image indices.

    def lidar_callback(self, msg):
        # 1. Convert PointCloud2 to numpy
        points = self.pointcloud2_to_numpy(msg)
        if points.size == 0:
            return

        # 2. Filter points by range
        mask = (points[:, 0] > -self.range_x) & (points[:, 0] < self.range_x) & \
               (points[:, 1] > -self.range_y) & (points[:, 1] < self.range_y)
        points = points[mask]

        # 3. Project to pixels
        # OpenCV image coordinates: (row=y_pixel, col=x_pixel)
        # In BEV: x_lidar maps to row, y_lidar maps to col
        # Flip x to align with front-up
        x_pix = ((self.range_y - points[:, 1]) / self.resolution).astype(np.int32)
        y_pix = ((self.range_x - points[:, 0]) / self.resolution).astype(np.int32)

    def lidar_callback(self, msg):
        # 1. Convert PointCloud2 to numpy
        points = self.pointcloud2_to_numpy(msg)
        if points.size == 0:
            return

        # 2. Filter points by range
        mask = (points[:, 0] > -self.range_x) & (points[:, 0] < self.range_x) & \
               (points[:, 1] > -self.range_y) & (points[:, 1] < self.range_y)
        points = points[mask]

        # 3. Project to pixels
        # OpenCV image coordinates: (row=y_pixel, col=x_pixel)
        # In BEV: x_lidar maps to row, y_lidar maps to col
        # Flip x to align with front-up
        x_pix = ((self.range_y - points[:, 1]) / self.resolution).astype(np.int32)
        y_pix = ((self.range_x - points[:, 0]) / self.resolution).astype(np.int32)

After converting our point cloud to a Numpy array, we crop it to discard any data that falls outside our defined Bird's Eye View area. Using NumPy's boolean indexing, we create a "mask" that checks every point against our boundary parameters:

X-Axis Filtering: It keeps only points where the forward/backward distance is within ±range_x (e.g., within 10 meters of the robot).
Y-Axis Filtering: It keeps only points where the left/right distance is within ±range_y.

To transform these filtered 3D LiDAR points into a 2D image coordinate, we apply a simple Linear Orthographic Projection. Unlike the Pinhole Perspective used in Lesson 5, this top-down mapping ignores the Z (height) value for position and focuses purely on the XY ground plane.

To convert a point (x, y) in meters to a pixel coordinate (u, v). This requires two steps: Translation (shifting the origin to the center of the image) and Scaling (converting meters to pixels using our resolution).

The formulas implemented in the code are:

The Origin Shift: In LiDAR, (0,0,0) is the sensor center. In OpenCV images, (0,0) is the top-left corner. Subtracting the LiDAR coordinates from the maximum range (range_y — y) effectively shifts our "camera" so that the robot sits at the center of the frame.
Orientation: In the ROS 2 X-forward convention, positive X points ahead of the robot. By mapping X_lidar to the v (vertical/row) axis of the image and Y_lidar to the u (horizontal/column) axis, we ensure the BEV image appears with the correct orientation.

Part 5: Drawing the BEV Map

We create a blank canvas and "drop" the points onto it.

        # 4. Draw BEV
        # Create black image
        bev_img = np.zeros((self.height, self.width, 3), dtype=np.uint8)

        # Draw LiDAR points as grey pixels
        valid_mask = (x_pix >= 0) & (x_pix < self.width) & (y_pix >= 0) & (y_pix < self.height)
        bev_img[y_pix[valid_mask], x_pix[valid_mask]] = [200, 200, 200]

        # 4. Draw BEV
        # Create black image
        bev_img = np.zeros((self.height, self.width, 3), dtype=np.uint8)

        # Draw LiDAR points as grey pixels
        valid_mask = (x_pix >= 0) & (x_pix < self.width) & (y_pix >= 0) & (y_pix < self.height)
        bev_img[y_pix[valid_mask], x_pix[valid_mask]] = [200, 200, 200]

This uses NumPy's advanced indexing to color thousands of pixels simultaneously, making the visualization very lightweight.

Part 6: Drawing Semantic Detections

Finally, we overlay the 3D boxes from our Fusion node and publish the BEV image.

        # 5. Draw Fused Detections
        if self.latest_fused:
            for det in self.latest_fused.detections:
                cx = det.bbox.center.position.x
                cy = det.bbox.center.position.y
                sx = det.bbox.size.x
                sy = det.bbox.size.y

                # Convert center to pixels
                px = int((self.range_y - cy) / self.resolution)
                py = int((self.range_x - cx) / self.resolution)

                # Convert size to pixels
                pw = int(sy / self.resolution)
                ph = int(sx / self.resolution)

                # Draw rectangle
                cv2.rectangle(bev_img, (px - pw//2, py - ph//2), (px + pw//2, py + ph//2), (0, 0, 255), 2)

                # Draw label
                if det.results:
                    cls_id = det.results[1].hypothesis.class_id
                    cv2.putText(bev_img, f"ID:{cls_id}", (px - pw//2, py - ph//2 - 5),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)

        # 6. Publish BEV Image
        bev_msg = self.bridge.cv2_to_imgmsg(bev_img, encoding='bgr8')
        bev_msg.header = msg.header
        self.bev_pub.publish(bev_msg)

        # 5. Draw Fused Detections
        if self.latest_fused:
            for det in self.latest_fused.detections:
                cx = det.bbox.center.position.x
                cy = det.bbox.center.position.y
                sx = det.bbox.size.x
                sy = det.bbox.size.y

                # Convert center to pixels
                px = int((self.range_y - cy) / self.resolution)
                py = int((self.range_x - cx) / self.resolution)

                # Convert size to pixels
                pw = int(sy / self.resolution)
                ph = int(sx / self.resolution)

                # Draw rectangle
                cv2.rectangle(bev_img, (px - pw//2, py - ph//2), (px + pw//2, py + ph//2), (0, 0, 255), 2)

                # Draw label
                if det.results:
                    cls_id = det.results[1].hypothesis.class_id
                    cv2.putText(bev_img, f"ID:{cls_id}", (px - pw//2, py - ph//2 - 5),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)

        # 6. Publish BEV Image
        bev_msg = self.bridge.cv2_to_imgmsg(bev_img, encoding='bgr8')
        bev_msg.header = msg.header
        self.bev_pub.publish(bev_msg)

Once the background "map" of LiDAR points is rendered, we overlay the high-level intelligence gathered from the Fusion node. This step transforms a simple occupancy grid into a semantic map by drawing bounding boxes that represent the physical footprint of detected objects, such as cars or pedestrians.

The process requires a two-stage conversion: Positioning and Scaling.

Positioning: We take the 3D centroid (x, y) of the fused detection and map it to a single pixel (px, py) using the same orthographic projection formulas used for the raw points.
Scaling: We take the physical dimensions of the bounding box — the width (sx) and length (sy) in meters — and divide them by our resolution (e.g., 0.05m/pixel) to determine the rectangle's size in pixels.

Finally, we use OpenCV's cv2.rectangle and cv2.putText functions to stamp the visual identity onto the image. This merged data is what allows autonomous systems to predict behavior—knowing that a "Car" cluster is likely to move forward while a "Traffic Sign" cluster will remain static.

Part 7: The Helpter Function

Next, we have a very important helper function.

    def pointcloud2_to_numpy(self, msg):
        # 1. Create a generator for the points
        gen = point_cloud2.read_points(msg, field_names=("x", "y", "z"), skip_nans=True)

        # 2. Define the data type for the generator (3 floats for x, y, z)
        # We use a structured dtype to match the generator's output
        dtype = [('x', np.float32), ('y', np.float32), ('z', np.float32)]

        # 3. Use fromiter to pull data directly into NumPy without creating a Python list
        structured_array = np.fromiter(gen, dtype=dtype)

        # 4. Convert the structured array to a standard Nx3 float32 array
        # .view transforms the memory layout without copying the data
        if structured_array.size > 0:
            points = structured_array.view(np.float32).reshape(-1, 3)

            # 5. Final safety check for invalid numbers
            points = points[np.isfinite(points).all(axis=1)]
            return points

        return np.zeros((0, 3), dtype=np.float32)

    def pointcloud2_to_numpy(self, msg):
        # 1. Create a generator for the points
        gen = point_cloud2.read_points(msg, field_names=("x", "y", "z"), skip_nans=True)

        # 2. Define the data type for the generator (3 floats for x, y, z)
        # We use a structured dtype to match the generator's output
        dtype = [('x', np.float32), ('y', np.float32), ('z', np.float32)]

        # 3. Use fromiter to pull data directly into NumPy without creating a Python list
        structured_array = np.fromiter(gen, dtype=dtype)

        # 4. Convert the structured array to a standard Nx3 float32 array
        # .view transforms the memory layout without copying the data
        if structured_array.size > 0:
            points = structured_array.view(np.float32).reshape(-1, 3)

            # 5. Final safety check for invalid numbers
            points = points[np.isfinite(points).all(axis=1)]
            return points

        return np.zeros((0, 3), dtype=np.float32)

This high-performance translator streams raw binary LiDAR data into a structured NumPy array while stripping out invalid coordinates. It ensures the data is in a clean N x 3 format ready for real-time geometric processing.

Part 8: The Entry Point (main)

Finally, the standard "boilerplate" that starts the node.

def main(args=None):
    rclpy.init(args=args)
    node = BEVVisualizerNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

def main(args=None):
    rclpy.init(args=args)
    node = BEVVisualizerNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

rclpy.spin(node) is a blocking call that keeps the script alive. It waits for new LiDAR data to arrive on the subscription and immediately triggers the listener_callback to start the processing cycle again.

Over the past five articles, we have constructed a complete, modular 3D perception pipeline from the ground up. We began by processing raw LiDAR point clouds, using DBSCAN clustering to group thousands of individual points into distinct physical objects. We then integrated a YOLO-based 2D vision stack to provide semantic intelligence, followed by the development of a Frustum Fusion node that mathematically bridged the gap between these two sensors. By projecting 3D clusters onto the 2D image plane, we successfully "handshaked" spatial depth with AI-driven classification, creating "Smart Clusters" that know both where they are and what they are.

This final step of Bird's Eye View (BEV) creation is the critical transition from raw perception to actionable intelligence. By flattening the environment into a top-down geometric map, we provide a unified representation that is essential for downstream autonomy tasks like path planning and obstacle avoidance. In a BEV space, a planner can easily calculate safe trajectories, determine time-to-collision, and manage complex maneuvers because the scale and distance of every "Smart Cluster" are preserved in a clean, grid-based format. You have moved from handling raw bytes to generating a high-level environmental map — the core requirement for any self-driving system.

I hope you found this article informative. If you have any feedback or technical questions, please feel free to leave a comment below.

🚀 Robotics Mentorship: If you are navigating the complexities of modern robotics or need a structured roadmap to master new frameworks and algorithms, I provide 1-on-1 personalized guidance.

💼 Project & Technical Consultation: Building a prototype or an industrial robotics stack? I can help you solve specific edge cases, optimize your system architecture, or review your codebase to ensure robust performance.

Connect with me: Find me on LinkedIn or X. Let's work together: https://s1s2.ai/academy/

Contents