
3D Stereoscopic Video — Fake it or make it?


While the Apple Vision Pro is likely to have a significant impact on the niche area of stereoscopic video production, stereo production itself will probably remain difficult for a while. If you're a traditional production or post-production person experienced with regular 2D, what would you do if a client asked you to make something in stereo?

If you're not excited about 3D stereoscopic video, it's probably because you haven't tried an Apple Vision Pro yet. That new dimension really does bring something novel: a sense of presence, of being there, and for some kinds of productions it's absolutely worth your time to explore. We've had 3D before, but it's never looked this good. For a long time, camera resolution far outstripped what the delivery formats could show, but now we're finally at the point where we need cameras to catch up.

Some clients are going to want to push this particular envelope, but if you aren't sure you can handle the demands of a full 3D shoot, and you can't bring in someone who can, don't risk messing it up; shooting 3D is still hard. Even checking you got the shot in the field is hard, because you'll need to monitor with a headset or a specialised monitor. And you can't just bodge two cameras together: getting two cine lenses as close together as two human eyes is difficult or impossible without some specialised gear. If the cameras are too far apart, the 3D effect is exaggerated, but the whole scene can look like a miniature. Here's a guide if you want to pursue it.
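
To get a feel for why the distance between the cameras matters so much, here's a rough back-of-the-envelope sketch in Python. It assumes an idealised parallel rig and a pinhole lens model, and the function name and numbers are purely illustrative, but it shows how widening the interaxial distance scales the on-screen disparity, and with it the apparent size of the scene.

```python
def stereo_disparity_px(baseline_mm: float, focal_length_mm: float,
                        subject_distance_mm: float,
                        sensor_width_mm: float, image_width_px: int) -> float:
    """Horizontal disparity of a point, for an idealised parallel stereo rig.

    On the sensor, disparity = focal_length * baseline / subject_distance;
    the result is then converted from millimetres to pixels.
    """
    disparity_mm = focal_length_mm * baseline_mm / subject_distance_mm
    return disparity_mm * image_width_px / sensor_width_mm


# A roughly eye-like 63 mm baseline vs. a wide 130 mm rig, subject at 3 m,
# 35 mm lens on a 36 mm-wide sensor recording 4096 px across (all illustrative):
human_like = stereo_disparity_px(63, 35, 3000, 36, 4096)   # ~84 px
wide_rig   = stereo_disparity_px(130, 35, 3000, 36, 4096)  # ~173 px

# Doubling the baseline roughly doubles the disparity, which the viewer's brain
# reads as a scene half the size seen from half the distance: the classic
# "hyperstereo" miniature look.
print(f"{human_like:.0f} px vs {wide_rig:.0f} px")
```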

Before we look at how to fake it, let’s look back at how it’s been done.

Stereoscopic shooting over time

There was a push for 3D back in the 1950s, and since the films were shot in black and white, it was relatively easy to use red/blue anaglyph glasses for the 3D effect. Dial M for Murder and It Came From Outer Space are examples of classic 3D movies you can buy today in Apple’s TV app. Of course, without computers, they shot 3D for real.

Just a few 3D films above a lake

When 3D movies kicked off again more seriously around the 2009 launch of Avatar (and 3D TVs became a thing for 6 years or so), there were many more films finished for both 2D and 3D, some of which were shot natively in 3D and some converted in post. Some native 3D productions also explored higher frame rates, including The Hobbit trilogy, Billy Lynn's Long Halftime Walk, and Avatar 1 and 2. But not everyone wants to view cinema at higher frame rates. Notably, Avatar: The Way of Water is the only one of those films actually watchable at home with a high frame rate, only in action sequences, and only through Disney+ on Apple Vision Pro. It's worth a look.

If you’d like to explore the history of 3D film, take a look through this list to see all the 3D options and then, if you’re curious, consult this list to see which ones were shot “for real”.

Faking 3D has gotten a whole lot better

Still, you shouldn’t get too attached to “real” vs “fake”. As you may know already, filmmaking today is not as simple as just pointing a camera and pressing record; hybrid approaches can produce excellent results. Some VFX-heavy movies might be able to deliver excellent 3D while using a single camera on set. Here’s a terrific article to make you an instant expert on the subject.

Gravity is a hybrid; the live action parts were shot in 2D, but as so much of the movie is actually 3D animation, most of it was rendered from two separate angles. Many Disney and Marvel films are available in 3D, and a Disney+ subscription is something I'd recommend to most Apple Vision Pro owners. Note that international distribution of 3D films is messy: in the US, Edge of Tomorrow and Ready Player One are in 3D, but here in Australia, neither is. And 3D isn't always consistent within a film series: Dune is available in 3D, but Dune 2 frustratingly is not, which is a real shame considering how good the 3D conversion for Dune is.

Some 3D fans have taken matters into their own hands, using conversion tools to create 3D versions of their own content, and even feature films they’ve somehow acquired. The original 2D image becomes one of the two “eyes”, and the other eye is created by analyzing the first image, figuring out where each part of the image should sit in 3D space, and then offsetting each element appropriately to create a 3D image.
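
To make that concrete, here's a minimal Python/NumPy sketch of the basic idea: the original frame stays as the left eye, and a second view is produced by shifting each pixel horizontally according to a depth map. The function name and the disparity budget are made up for illustration, and real conversion tools add far more sophisticated depth estimation and inpainting; the point is simply to show where the "gaps" come from.

```python
import numpy as np

def synthesize_right_eye(left_rgb: np.ndarray, depth: np.ndarray,
                         max_disparity_px: int = 24) -> np.ndarray:
    """Produce a crude right-eye view from a 2D frame and a depth map.

    left_rgb : (H, W, 3) uint8 frame, reused as-is for the left eye.
    depth    : (H, W) floats, 0.0 = nearest, 1.0 = farthest.
    """
    h, w, _ = left_rgb.shape
    right = np.zeros_like(left_rgb)
    filled = np.zeros((h, w), dtype=bool)

    # Near pixels get the largest horizontal offset, distant pixels almost none.
    disparity = np.rint(max_disparity_px * (1.0 - depth)).astype(np.int32)

    # Paint far pixels first so that nearer pixels overwrite them (occlusion).
    for d in range(int(disparity.max()) + 1):
        ys, xs = np.where(disparity == d)
        if xs.size == 0:
            continue
        # In the right-eye view, nearer objects shift left relative to the left eye.
        new_xs = np.clip(xs - d, 0, w - 1)
        right[ys, new_xs] = left_rgb[ys, xs]
        filled[ys, new_xs] = True

    # Pixels never written are disocclusions: the gaps a conversion tool has to
    # invent content for. Copying the original is a naive stand-in for inpainting.
    right[~filled] = left_rgb[~filled]
    return right
```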

These conversion tools used to be rare and limited, creating an effect where flat, depthless 2D characters are positioned in 3D space. I’ve also seen some questionable converted videos on 3D video sharing app SpatialStation. But, at least some of the time, AI can do a good job at both generating an accurate depth map from a 2D image, and filling in the gaps left by characters being moved slightly between the two images.

A real shot, and the depth map created from it by Depthify — it’s a bit fuzzy

On the Apple Vision Pro, the upcoming visionOS 2 includes an automatic 2D-to-3D photo conversion tool, and it’s remarkable how much this fakery can add to the emotional impact of an important photograph. Video is a trickier problem, and output can exhibit some flickering, but since deflickering and temporal smoothing are not new problems, I’d expect this to be solvable. 
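
One simple and entirely generic way to tackle that flicker is to smooth the per-frame depth maps over time before they're used to build the second eye. The sketch below is just an exponential moving average; it isn't how Apple or any particular tool does it, and the function name is mine, but it illustrates that the temporal side of the problem is well-trodden ground.

```python
import numpy as np
from typing import Iterable, Iterator

def smooth_depth_sequence(depth_frames: Iterable[np.ndarray],
                          alpha: float = 0.2) -> Iterator[np.ndarray]:
    """Temporally smooth per-frame depth maps with an exponential moving average.

    alpha : weight of the current frame. Lower values reduce flicker more, at the
            cost of lagging behind fast depth changes (e.g. a subject walking
            toward the camera).
    """
    smoothed = None
    for depth in depth_frames:
        if smoothed is None:
            smoothed = depth.astype(np.float64)
        else:
            smoothed = alpha * depth.astype(np.float64) + (1.0 - alpha) * smoothed
        yield smoothed
```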

As an experiment, I shot several comparison shots with different subjects, capturing the same content with a 3D and a 2D camera, then converting the 2D footage to 3D and comparing them in my Apple Vision Pro. The 3D footage is from my iPhone 15 Pro Max (using the built-in Camera app) and the 2D footage is from a Lumix GH6, recording in 6K.

The iPhone’s depth is great, but native Spatial video quality is not really up to professional standards, especially when viewed through a headset. This may change with the iPhone 16 Pro and Pro Max, due in just a few weeks, but we’ll have to wait and see. Today, unsurprisingly, the 6K GH6 footage is more detailed, so is it worth shooting with a nicer camera and adding depth in post?

Software conversion options

Many apps can perform this conversion; the two I put through their paces here are Spatial Media Toolkit and Owl3D.

Assessing the results

Spatial Media Toolkit is fast, but the artifacts it introduces around the edges of moving objects are pretty distracting — and in a hand-held shot, there are a lot of moving objects. These flaws are not simply because the depth map was imperfect (though it was) but because the newly generated areas aren’t correct. Even if you only plan to shoot on a tripod, you’ll still see odd halos around people’s heads, and the faster objects move, the worse the results will be. Your content will determine how objectionable these artifacts are, but I’m not sure that the output is reliable or tweakable enough for professional use.

Pedestrians walking past at a normal speed were surrounded by warping and distortion in Spatial Media Toolkit, and this is pretty distracting in motion

On the plus side, this app is about as fast as a slow video filter (taking about 3x a clip's duration to process) and produced 6K Spatial output at a usable data rate (85Mbps from my 200Mbps 6K input). Before you process the whole clip, you'll see a preview (in parallel, cross-eye or "wiggle" formats on the Mac, or in true 3D on the Apple Vision Pro) and can scrub through the clip, which is useful for a quick assessment. Unfortunately, batch processing isn't possible, and there's no file name mapping between input and output, which makes processing a whole shoot's worth of clips pretty tedious. If you're only planning on converting a finished timeline, it's not really a problem.
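
If you'd rather inspect output outside the app, the parallel and cross-eye layouts are trivial to assemble yourself once you have the two eyes as separate frames. A tiny sketch (the helper names are mine, not the app's):

```python
import numpy as np

def parallel_preview(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    # Side-by-side "parallel" layout: left eye on the left, right eye on the right.
    return np.hstack([left, right])

def cross_eye_preview(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    # Cross-eye layout swaps the eyes, so crossing your eyes fuses the image.
    return np.hstack([right, left])
```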

Owl3D’s output has fewer artifacts than Spatial Media Toolkit, and it also gives you many more options to tweak, including a variety of depth creation and temporal smoothing options alongside ProRes support. The higher-precision models come at a speed cost, though the results are indeed cleaner, reducing artifacts to a point where a casual viewer may not notice them.

Though 1080p output was pretty speedy, 6K output took much, much longer: an hour to process the same monster 28-second 6K clip. (The developers have let me know that speed is expected to increase a lot with optimisations coming soon.) However, since batch processing is included, running your jobs overnight is an option, and again, if you're only planning to export a finished timeline, processing speed is not such a big deal.

Batch processing in Owl3D is certainly welcome

Both Spatial Media Toolkit and Owl3D can add depth and immersion to a scene, and while Spatial Media Toolkit has a clear advantage on speed and native Vision Pro support, Owl3D wins on quality. However, neither of them can produce perfect results in a complex scene with fine 3D details, and the artifacts — even with slow processing — can sometimes ruin the illusion.

Are the artifacts too much?

If the contents of your frame are relatively simple and clearly defined, you’ll see fewer flaws, but if you film a garden or park, the depth of each tree or bush is likely to be a softly defined blob; each leaf will not have its own distinct position in space. And foliage or not, parts of some objects will be misinterpreted, because the AI can’t segment every object perfectly. In one clip, I saw a car’s mirror placed far behind the car it was attached to, at the same depth layer as cars across the road.

To be fair, these apps are very much still under development, and I would expect speeds and output quality to increase in the future. We are off to a good start, and there are a few more options available if you have a grunty PC — the dearth of legitimate 3D movies on the Quest has meant that users of that platform have had to be more creative. If you want to live on the bleeding edge, here’s at least one more solution that’s command-line only on Mac, but is potentially faster if you have a decent Nvidia GPU.

What if you don't want to wait for processing at all? Live, on-demand local 2D-to-3D conversion could be yours (if you have a beefy PC and a Quest) with this solution from Steam.

This space is still ripe with experimental solutions, and the story is far from over.

Conclusion

Your approach to 3D will depend on whether you're planning on delivering exclusively to 3D platforms, or to both 2D and 3D viewers. Shooting natively in 3D is currently far more difficult, so I can't blame anyone considering post conversion as an option for hybrid delivery. Processing times can be high, so you'll have to find a balance between quality and speed that you're happy with. Make sure to review output in a headset before you make that call, though.

Time will tell if any of the 2D conversion tools can deliver reliable, consistent results that are good enough to use on a regular basis. On one hand, the shooting experience is far simpler with conversion, and you’re not going to compromise the film for 2D viewers at all. But you can still experience artifacts, and it’s perhaps a big ask for AI to do a job which many professionals spent a lot of time doing until just recently. As ever, your mileage may vary. But I suspect that for some jobs, shot well, 3D conversion could soon be a viable option that fulfills client expectations with minimal extra effort. Consider it.
