3D object detection from visual information is a difficult task for low-cost autonomous driving systems.

A prototype of an unmanned car. Precise object detection is crucial for this type of vehicle. Image credit: BP63Vincent via Wikimedia, CC-BY-SA-3.0
Most current methods build their detection pipelines purely from 2D computations, without considering the 3D scene structure or the sensor configuration. A recent paper proposes a smoother transition between 2D observations and 3D predictions.
Proposed solution for object detection in 3D space
The researchers propose to link 2D observations and 3D predictions via geometric back-projection, retrieving image features as needed. In each layer of computation, the 3D predictions are projected onto all available camera frames, so that information from every view is fused.
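To make the back-projection step concrete, here is a minimal PyTorch sketch of the idea: a 3D reference point per object query is projected into every camera image using that camera's projection matrix, and 2D features are bilinearly sampled at the resulting pixel locations. The function name `sample_features` and all tensor shapes are illustrative assumptions, not the authors' actual code.

```python
import torch
import torch.nn.functional as F

def sample_features(ref_points, feat_maps, proj_mats):
    """Project 3D reference points into each camera and sample 2D features.

    ref_points: (Q, 3)        one 3D reference point per object query
    feat_maps:  (C, F, H, W)  one feature map per camera
    proj_mats:  (C, 3, 4)     world-to-image projection matrix per camera
    returns:    (C, Q, F)     sampled feature per camera and query
    """
    Q = ref_points.shape[0]
    # Homogeneous coordinates (Q, 4), projected into all cameras: (C, Q, 3)
    pts_h = torch.cat([ref_points, ref_points.new_ones(Q, 1)], dim=-1)
    pix = torch.einsum('cij,qj->cqi', proj_mats, pts_h)
    # Perspective divide; clamping keeps points behind a camera from
    # dividing by ~0 (a real implementation would also mask them out)
    uv = pix[..., :2] / pix[..., 2:3].clamp(min=1e-5)
    # Normalize pixel coordinates to [-1, 1], as grid_sample expects
    H, W = feat_maps.shape[-2:]
    grid = uv / uv.new_tensor([W - 1, H - 1]) * 2.0 - 1.0
    # grid shape (C, 1, Q, 2) -> output (C, F, 1, Q)
    sampled = F.grid_sample(feat_maps, grid.unsqueeze(1), align_corners=True)
    return sampled.squeeze(2).permute(0, 2, 1)
```

In the paper, this sampling happens in every layer: each query's reference point is refined iteratively, and the features gathered from all cameras are fused (for example, averaged, with invalid projections masked) before updating the query.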
The suggested architecture performs neither point cloud reconstruction nor explicit depth prediction from images, which makes it robust to errors in depth estimation. In regions where cameras overlap, the proposed method outperforms alternatives by a substantial margin.
We introduce a framework for multi-camera 3D object detection. In contrast to existing works, which estimate 3D bounding boxes directly from monocular images or use depth prediction networks to generate input for 3D object detection from 2D information, our method manipulates predictions directly in 3D space.
Our architecture extracts 2D features from multiple camera images and then uses a sparse set of 3D object queries to index into these 2D features, linking 3D positions to multi-view images using camera transformation matrices. Finally, our model makes a bounding box prediction per object query, using a set-to-set loss to measure the discrepancy between the ground-truth and the prediction.
This top-down approach outperforms its bottom-up counterpart in which object bounding box prediction follows per-pixel depth estimation, since it does not suffer from the compounding error introduced by a depth prediction model. Moreover, our method does not require post-processing such as non-maximum suppression, dramatically improving inference speed. We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
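The set-to-set loss mentioned in the abstract follows the DETR recipe: predictions and ground-truth boxes are matched one-to-one with the Hungarian algorithm, so duplicate detections are penalized during training rather than suppressed afterwards. Below is a simplified sketch that keeps only an L1 box term; the function name `set_to_set_l1` is hypothetical, and the full loss also includes a classification term in both the matching cost and the final objective.

```python
import torch
from scipy.optimize import linear_sum_assignment

def set_to_set_l1(pred_boxes, gt_boxes):
    """Hungarian-matched L1 loss between predicted and ground-truth boxes.

    pred_boxes: (P, D) box parameters for P object queries
    gt_boxes:   (G, D) ground-truth boxes, with G <= P
    """
    # Pairwise L1 cost between every prediction and every ground truth
    cost = torch.cdist(pred_boxes, gt_boxes, p=1)  # (P, G)
    # Optimal one-to-one assignment minimizing the total cost
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    # Regression loss over matched pairs only; unmatched predictions
    # would be pushed toward a "no object" class by the full loss
    return torch.abs(pred_boxes[pred_idx] - gt_boxes[gt_idx]).mean()
```

Because each ground-truth box is claimed by exactly one query, the model learns to avoid duplicate predictions, which is why inference can skip non-maximum suppression entirely.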
Research paper: Wang, Y., Guizilini, V., Zhang, T., Wang, Y., Zhao, H., and Solomon, J., “DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries”, 2021. Link: https://arxiv.org/abs/2110.06922