Tencent Unveils InstantMesh, an AI Model Capable of Turning Static Images Into 3D Renders

Tencent has released a new artificial intelligence (AI) model, dubbed InstantMesh, that can generate 3D objects from a single static photo. The new AI model is an upgrade over the company's older Instant3D framework and uses a combination of a multiview diffusion model and a sparse-view reconstruction model based on the large reconstruction model (LRM) architecture. Tencent has also made the InstantMesh model open source and has put up a preview app for enthusiasts to test its capabilities and generate and export 3D renders.

The company published a pre-print version of its research paper on arXiv. Notably, arXiv does not conduct peer reviews, so the paper's claims are yet to be independently assessed. However, the company has already made the AI model available as open source on Hugging Face, so developers can test it for themselves. For enthusiasts, a demo app is available as well, where they can upload a photo and watch it turn into a 3D render. We, at Gadgets 360, tested out the platform and found that the renders were created in under 10 seconds, as the company claimed. However, the quality of the renders felt quite low. An X (formerly known as Twitter) user posted a video of the AI model in use, and you can see the results below.

Coming to the technology behind the AI model, the company uses two different architectures: a multiview diffusion model and an LRM-based sparse-view reconstruction model. The former takes the image as input and generates consistent views of the object from angles that are not visible in the photo, and the LRM then reconstructs those views into a 3D object that can be orbited and experienced in a 3D environment.
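For readers curious how the two stages fit together, the placeholder sketch below outlines the overall flow in Python. The function names and signatures are hypothetical stand-ins for illustration only, and are not the actual API of the open-source InstantMesh code.

```python
# Conceptual sketch of the two-stage pipeline described above.
# The functions are hypothetical placeholders, not InstantMesh's real API.

from typing import List
from PIL import Image


def generate_multiview_images(photo: Image.Image, num_views: int = 6) -> List[Image.Image]:
    """Stage 1 (placeholder): a multiview diffusion model synthesises a set of
    consistent views of the object from angles the input photo does not show."""
    raise NotImplementedError("stand-in for the multiview diffusion model")


def reconstruct_mesh(views: List[Image.Image]):
    """Stage 2 (placeholder): a sparse-view large reconstruction model (LRM)
    lifts the generated views into a textured 3D mesh."""
    raise NotImplementedError("stand-in for the sparse-view LRM")


def image_to_3d(path: str):
    """Orchestration: a single photo goes in, a 3D mesh comes out."""
    photo = Image.open(path)
    views = generate_multiview_images(photo)  # novel views of unseen angles
    return reconstruct_mesh(views)            # mesh that can be viewed in 3D
```

The pipeline itself is simple; the heavy lifting happens inside the two model calls, which is where the multiview diffusion model and the LRM described above come in.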

According to Tencent, InstantMesh solves the Janus problem in the world of 3D rendering. The Janus problem is a phenomenon in 3D generation where, because the model has to "imagine" and generate the unseen sides of the reference object, it ends up producing multiple canonical views of the object (such as several front faces) instead of a cohesive 3D object. The company addresses the issue by using a novel view generator fine-tuned from Stable Diffusion.

The research paper also shared benchmark scores against existing models, including Stability AI's recently launched Stable Video 3D (SV3D). Based on the scores, InstantMesh performed better than SV3D on Google Scanned Objects (GSO) and OmniObject3D (Omni3D) orbit views. SV3D fared better on a couple of metrics in the Omni3D benchmark, which corresponded to the faithfulness of the output to the reference image, but Tencent said this was a deliberate trade-off. "We argue that the perceptual quality is more important than faithfulness, as the 'true novel views' should be unknown and have multiple possibilities given a single image as reference," the company explained.


