Researchers pave the road to true 3D AI

Scientific Paper: CSIOR: Circle-Surface Intersection Ordered Resampling

Authors: Claudio Tortorici, Mohamed Kamel Riahi, Stefano Berretti, Naoufel Werghi

Researchers at the Technology Innovation Institute (TII), a leading global research center in the United Arab Emirates, have developed a new algorithm for representing 3D textures that improves filtering, analytics, and AI. This could lead to AI applications that are faster and more efficient.

“We aim to create the building blocks that would kick off an explosion of neural networks for building 3D AI algorithms,” said TII senior researcher Claudio Tortorici, who led this research.

Early implementations of the new algorithm helped extract faces from 3D scenes, generate 2D maps representing 3D data, and combine distinct kinds of 3D information into 2D pictures. Tortorici believes that future research will allow artificial intelligence to process 3D data natively. He observed that most computer vision conferences include the term “signal processing” in their title, which refers to one-dimensional data. However, images comprise 2D data, and we live in a 3D world.

Interest in convolutional neural networks for processing image data has exploded in recent years, but these networks are not practical for 3D data. Tortorici said: “These kinds of algorithms have not traditionally worked well on 3D data.”

Although companies process 3D imagery today using AI algorithms, this often involves computational gymnastics to make things work. This new preprocessing technique, called CSIOR, could open all kinds of opportunities.

The limits of LiDAR

The mainstream media has often debated why Tesla has shunned the use of LiDAR in training its AI. But as it turns out, processing several 2D images can be more efficient in some ways than trying to craft better algorithms for 3D processing. The new CSIOR algorithm could shift the balance of efficiency and performance towards writing better algorithms for natively processing LiDAR data.

Tortorici said: “I believe we should move to 3D representations. It is not something we can do nowadays with self-driving vehicles because we don’t have algorithms that are mature enough.”

3D capture techniques like LiDAR record depth information about a scene as a point cloud that indicates the relative depth of points. But point clouds confound efforts to identify, for instance, a dog as a separate object from the owner walking it.

So, researchers developed mesh manifolds to transform this raw data into connected meshes, making it easier to see how the points connect on the surface of objects. However, existing mesh manifolds struggle with challenges around consistency or accuracy. As a result, AI developers often convert a representation of the world into a 2D picture, which loses some of the information in the raw data.

Researchers have developed some algorithms to directly process 3D data, such as PointNet, which is the 3D equivalent of the AlexNet algorithms that galvanised modern deep learning research. However, PointNet has significantly worse performance than existing 2D algorithms.

Cropping faces

The traditional process of copying a face out of a 3D image involves two separate steps. First, the algorithm crops the area with the face from a projection of the scene, and then a second algorithm resamples the curves in the image. CSIOR allows a face cropping algorithm to capture both the image and its shape in a single step. It takes advantage of the way the algorithm processes the scene as an expanding circle. This could improve algorithms for face recognition that consider the depth of facial features to improve accuracy.
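To make the expanding-circle idea concrete, here is a minimal sketch of cropping and ordering points in one pass. It is not the CSIOR algorithm itself: it assumes straight-line (Euclidean) distance from a seed point, such as a nose tip, in place of the circle-surface intersections computed on the mesh, and the function name and parameters are hypothetical.

```python
import math

def crop_by_expanding_circle(points, seed, max_radius, n_rings):
    """Group points within max_radius of seed into concentric rings.

    Assumption: Euclidean distance stands in for the circle-surface
    intersection that would be computed on the mesh itself.
    """
    ring_width = max_radius / n_rings
    rings = [[] for _ in range(n_rings)]
    for p in points:
        d = math.dist(p, seed)
        if d > max_radius:
            continue  # outside the crop region
        # Points exactly on the outer boundary go into the last ring.
        idx = min(int(d / ring_width), n_rings - 1)
        rings[idx].append(p)
    # Within each ring, order points by angle around the seed (x-y plane).
    for ring in rings:
        ring.sort(key=lambda p: math.atan2(p[1] - seed[1], p[0] - seed[0]))
    return rings

points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 0)]
rings = crop_by_expanding_circle(points, seed=(0, 0, 0), max_radius=3, n_rings=3)
```

The far point at (5, 5, 0) is dropped by the crop, while the remaining points come back already grouped by ring, so cropping and ordering happen in the same loop.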

Example of regularisation and extraction of a human face from 3D raw data

Extracting grids from meshes

The team also created an application to extract polar and image-like grids directly from the mesh structure. They found that they could also improve the development of 3D neural network algorithms that worked on the 2D image grids. Tortorici said this is an intermediate step, and they would prefer to work directly on 3D meshes. However, this step also helps demonstrate one incremental approach to a smoother overall process.
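A polar grid of this kind can be illustrated with a short sketch. The code below only lays out sample positions on concentric circles in a flat plane and evaluates a depth function at each one; the names polar_grid, sample_depth, and depth_fn are hypothetical, and depth_fn stands in for the lookup on the mesh surface that the real method performs.

```python
import math

def polar_grid(center, max_radius, n_radii, n_angles):
    """Return n_radii rows of n_angles (x, y) sample positions around center."""
    cx, cy = center
    grid = []
    for i in range(1, n_radii + 1):
        r = max_radius * i / n_radii
        row = []
        for j in range(n_angles):
            theta = 2.0 * math.pi * j / n_angles
            row.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
        grid.append(row)
    return grid

def sample_depth(grid, depth_fn):
    """Evaluate depth_fn at every grid position, giving an image-like 2D array."""
    return [[depth_fn(x, y) for (x, y) in row] for row in grid]

grid = polar_grid(center=(0.0, 0.0), max_radius=2.0, n_radii=2, n_angles=4)
# Hypothetical surface: a bowl whose depth grows with distance from the center.
depth_image = sample_depth(grid, lambda x, y: x * x + y * y)
```

Each row of depth_image corresponds to one circle and each column to one angle, which is the regular layout that 2D image algorithms expect.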

Process of generating a regular 2D grid over a 3D mesh manifold in (a) and (b). In (c) the deployment of such grids on 3D surfaces.

Representing new kinds of 3D features in 2D

Another proof of concept explored different approaches to transforming and representing 3D features onto a 2D grid. Rather than just describing the depth, these images could represent features such as maximum curvature, changes in depth, and mean curvature. A series of these kinds of images could be combined to improve 3D processing algorithms.
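As a rough illustration of how such feature images can be derived, the sketch below computes mean and maximum curvature maps from a regular depth grid using the standard differential-geometry formulas for a height field z(x, y), with finite differences for the derivatives. It assumes the depth has already been resampled onto a regular grid with spacing h; the function name is hypothetical, and the paper's own feature computation may differ.

```python
import math

def curvature_maps(z, h=1.0):
    """Return (mean, kmax) curvature maps for a depth grid z with spacing h.

    Border cells are left at 0.0 because central differences need neighbours.
    The sign of the curvature depends on the chosen normal direction.
    """
    rows, cols = len(z), len(z[0])
    mean = [[0.0] * cols for _ in range(rows)]
    kmax = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            zx = (z[i][j + 1] - z[i][j - 1]) / (2 * h)
            zy = (z[i + 1][j] - z[i - 1][j]) / (2 * h)
            zxx = (z[i][j + 1] - 2 * z[i][j] + z[i][j - 1]) / h**2
            zyy = (z[i + 1][j] - 2 * z[i][j] + z[i - 1][j]) / h**2
            zxy = (z[i + 1][j + 1] - z[i + 1][j - 1]
                   - z[i - 1][j + 1] + z[i - 1][j - 1]) / (4 * h**2)
            g = 1 + zx * zx + zy * zy
            H = ((1 + zy * zy) * zxx - 2 * zx * zy * zxy
                 + (1 + zx * zx) * zyy) / (2 * g**1.5)   # mean curvature
            K = (zxx * zyy - zxy * zxy) / g**2           # Gaussian curvature
            mean[i][j] = H
            # Largest principal curvature; clamp guards against rounding.
            kmax[i][j] = H + math.sqrt(max(H * H - K, 0.0))
    return mean, kmax

# Hypothetical test surface: the paraboloid z = (x^2 + y^2) / 2 on a 5x5 grid.
h = 0.1
z = [[(((j - 2) * h) ** 2 + ((i - 2) * h) ** 2) / 2 for j in range(5)]
     for i in range(5)]
mean_map, kmax_map = curvature_maps(z, h)
```

At the centre of this paraboloid both curvature values are 1, which gives a quick sanity check. The depth grid and the two curvature maps can then be stacked as separate channels of one image.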

This is an example of a series of images generated by transforming raw 3D data into 2D overlays that indicate underlying properties of the object, such as maximum curvature, local depth, and mean curvature. This can pass the 3D information into AI algorithms for more sophisticated 3D algorithms.

The road to true 3D AI

One limitation is that CSIOR currently only works on “open meshes” of 3D shapes. This is fine for representing a scene from a single point of view when you cannot see the back. In this case, the part you cannot see is the “open” part. However, the technique fails to work for “closed meshes,” such as capturing the entire surface of an apple as viewed from all sides.
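The distinction between open and closed meshes can be checked with a common test that is not taken from the paper: in a well-formed triangle mesh, an edge that belongs to only one triangle lies on a boundary. The sketch below assumes triangles are given as tuples of vertex indices, and the function names are hypothetical.

```python
from collections import Counter

def boundary_edges(triangles):
    """Return the edges that belong to exactly one triangle."""
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with sorted endpoints so direction is ignored.
            counts[(min(u, v), max(u, v))] += 1
    return [edge for edge, n in counts.items() if n == 1]

def is_open_mesh(triangles):
    """A mesh is open if it has at least one boundary edge."""
    return len(boundary_edges(triangles)) > 0

single_triangle = [(0, 1, 2)]                               # open: all edges exposed
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]  # closed surface
```

Here is_open_mesh(single_triangle) is true, while the tetrahedron, like the apple seen from all sides, has no boundary edges and counts as closed.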

CSIOR shows immense potential. These early projects demonstrate several intermediate approaches for processing 3D data using existing 2D AI algorithms. Down the road, Tortorici expects this could inspire more efficient algorithms for processing 3D data. “We want to create neural networks that work directly on 3D data,” he said.