Apr 24, 2019

Independent researchers have already proven Musk to be 100% right about LIDAR.

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving https://arxiv.org/abs/1812.07179

3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies --- a gap that is commonly attributed to poor image-based depth estimation. However, in this paper we argue that data representation (rather than its quality) accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations --- essentially mimicking LiDAR signal. With this representation we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance --- raising the detection accuracy of objects within 30m range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission our algorithm holds the highest entry on the KITTI 3D object detection leaderboard for stereo image based approaches.
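To make the "pseudo-LiDAR representation" concrete: as I understand it, the conversion is essentially a back-projection of every depth pixel into 3D through the pinhole camera model. Below is a minimal sketch of that step, not the paper's actual code. The function names and the intrinsics arguments (`fx`, `fy`, `cx`, `cy`, `baseline`) are my own placeholders, and I am assuming a plain pinhole camera with no lens distortion.

```python
import numpy as np

def disparity_to_depth(disparity, fx, baseline):
    """Stereo relation depth = fx * baseline / disparity (hypothetical helper).

    Pixels with non-positive disparity are marked invalid with depth 0.
    """
    depth = np.zeros_like(disparity, dtype=float)
    valid = disparity > 0
    depth[valid] = fx * baseline / disparity[valid]
    return depth

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) point cloud.

    Assumed camera frame: x to the right, y down, z forward.
    """
    h, w = depth.shape
    # u is the column index, v is the row index of each pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels without a valid (positive) depth.
    return points[points[:, 2] > 0]
```

The resulting array has the same shape as a LiDAR sweep without the intensity channel, which is why an existing LiDAR-based detector can be pointed at it.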

Karpathy was also pointing to:

Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras https://arxiv.org/abs/1904.04998

We present a novel method for simultaneous learning of depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as supervision signal. Similarly to prior work, our method learns by applying differentiable warping to frames and comparing the result to adjacent ones, but it provides several improvements: We address occlusions geometrically and differentiably, directly using the depth maps as predicted during training. We introduce randomized layer normalization, a novel powerful regularizer, and we account for object motion relative to the scene. To the best of our knowledge, our work is the first to learn the camera intrinsic parameters, including lens distortion, from video in an unsupervised manner, thereby allowing us to extract accurate depth and motion from arbitrary videos of unknown origin at scale. We evaluate our results on the Cityscapes, KITTI and EuRoC datasets, establishing new state of the art on depth prediction and odometry, and demonstrate qualitatively that depth prediction can be learned from a collection of YouTube videos.
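The "warping" in that abstract rests on a simple piece of geometry: given a predicted depth map, camera intrinsics and the camera motion between two frames, you can compute where each pixel should land in the neighboring frame, then compare the image sampled there against the real frame. Here is a rough sketch of the geometric part only, under my own assumptions: `reproject`, `K`, `R` and `t` are placeholder names, `R` and `t` are taken to map points from the source camera frame into the neighboring one, and lens distortion, occlusion handling and object motion are all ignored. Plain NumPy is also not differentiable, so an actual training setup would do this inside an autodiff framework.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Return an (H, W, 2) array of (u, v) pixel coordinates in the
    neighboring frame for every pixel of the source frame.

    depth: (H, W) depth of the source frame
    K:     (3, 3) pinhole intrinsics matrix
    R, t:  rotation (3, 3) and translation (3,) from source to neighbor camera
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, one column per pixel: shape (3, H*W).
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    # Back-project to 3D points in the source camera frame.
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Move the points into the neighboring camera frame.
    cam_nb = R @ cam + np.asarray(t, dtype=float).reshape(3, 1)
    # Project back to pixels with a perspective divide.
    proj = K @ cam_nb
    uv = proj[:2] / proj[2:3]
    return uv.T.reshape(h, w, 2)
```

With an identity rotation and zero translation every pixel maps onto itself. The consistency supervision comes from the mismatch between the frame warped this way and the real neighboring frame.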

Tesla also has the advantage of having radar, so it can do supervised learning of depth estimates for moving objects from video.

I was also skeptical of Tesla's self-driving capability, because they had some stupid ideas, like ignoring radar data based on GPS tagging from fleet behavior. But the lack of LIDAR will not be the issue. Not even close.

It seems like they are focusing on building a proper pipeline for training neural networks. The question is whether neural networks as a technology can handle self-driving. Reasoning along the lines of "the human brain can do it, so artificial neural networks can do it" is wrong. Besides the name, natural and artificial neural networks share only very rough, low-level conceptual ideas. Moreover, our ANN architectures are probably missing most of what brains do at a high level. So I think this is still an open question: can ANNs do it? If not, then nobody will have full self-driving capability widely deployed any time soon, because even though neural networks are not perfect, everything else is super brittle in comparison.

LIDAR and hi-res maps are technologies that give a working short-term solution, but they are a sort of local minimum. LIDAR is super expensive and has already been shown to be unnecessary, while hi-res maps are super brittle and too capital-intensive to be widely deployed.

But if neural networks are sufficient, Tesla will leave everyone in the dust. There will be literally no competition. They do not need to learn electric car manufacturing in order to deploy their technology widely; they are doing that already. They will not need to backtrack on LIDAR and hi-res map solutions. Also, their decision to put self-driving hardware in every car means they have access to a stupidly large amount of real-world data from all kinds of environments around the world.

BTW, extrapolating from what Tesla was doing before Karpathy joined is probably misguided. I was afraid that he would get lost in a big corporation, but he seems to be doing a great job there. In my opinion, the technology built under his supervision will be significantly better than what Tesla was doing before. But Karpathy is not a magician.

So Tesla has the biggest potential, and the big question is: are neural networks sufficient for self-driving?