Stereoscopic 3D video has been tried before, and each time, it’s failed to catch on. But this time, now that the Apple Vision Pro has arrived, could finally be different. The technology is better than ever, formats and software are maturing, and there’s not just one path forward, but many. Some of these paths even cross over nicely with existing workflows, so you don’t necessarily have to take a huge risk to get started.

Right now, distribution is still messy. With no official YouTube or Vimeo app (both are expected soonish), you’ll need to sideload with iCloud or AirDrop, or use third-party apps to view existing 3D or immersive content. Workflows are messy too. Future versions of Final Cut Pro and DaVinci Resolve, due later this year, should make creation easier for a casual audience, but you can get started now if you’re curious. Here, we’ll take a look at the new options we have today, and what we should be able to do a few months from now. But first…
Some quick definitions
What does Spatial mean?
Simply put, Spatial video is non-immersive 3D stereoscopic video, shot with a regular, non-fisheye lens. The easiest ways to shoot Spatial are with an iPhone 15 Pro or Pro Max (in 16:9), or with the Apple Vision Pro itself (1:1). Files captured with one of these devices will usually be presented with a fuzzy border around the edges, and if you move your head from side to side, you’ll see a little more at the edges of the frame. It’s also possible to view a Spatial video in an immersive mode, zooming it up to take more of your view and expanding the fuzzy border out a long way.
However, it’s also possible to make 3D videos that technically aren’t Spatial video. For example, if you don’t include all the correct metadata (FOV, baseline, disparity), your videos won’t be shown with a fuzzy edge, for better or worse. Also, 3D feature films are designated simply as 3D, not Spatial video, and their edges aren’t fuzzy. So Spatial means 3D, narrower than 180°, with metadata, and usually fuzzy edges. When streaming platforms start officially delivering 3D videos later this year (Vimeo is on board, and YouTube have an app on their roadmap), we’ll see how they choose to present the edges of the frame.
Right. What does Immersive mean?
Immersive means 3D stereoscopic footage shot in 180° format. Enveloping the viewer, it’s a wholly different experience when compared to regular footage, and it requires a different approach in terms of shooting, processing and editing. Apple have produced a number of videos in this style.
Let’s dig into the new Spatial world with a simple option you may not have considered.
IMAX-sized regular 2D video
One of the surprising things about viewing video in the Apple Vision Pro is how much more it feels like a cinema experience than a TV experience. By placing yourself in a virtual theater, on a beach, or on a mountaintop, you can quite successfully fool your brain into thinking the screen is massive. It’s not something you can be told; you have to try it to get it. The IMAX app is a great place to start.
Crucially, because there are no edges to the display, you can now make videos in any aspect ratio you wish. If your camera supports open gate (like some Panasonic, Blackmagic and Canon cameras), you can capture images a lot taller than normal. If you were to present these images at the same height as a normal 16:9 or 2:1 frame, they’d be much smaller, but… what if you keep the width the same and expand the image vertically, as happens in an IMAX theater? Suddenly you’re creating a much more immersive experience without needing fisheye lenses or changing the way you make videos.
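To put rough numbers on that, here’s a quick back-of-the-envelope comparison (the 3840px width is just an example; any width works the same way):

```python
# How much taller is a 4:3 open-gate frame than 16:9 at the same width?
width = 3840                    # example frame width in pixels
height_16_9 = width * 9 / 16    # 2160px
height_4_3 = width * 3 / 4      # 2880px
print(f"16:9 is {height_16_9:.0f}px tall; 4:3 is {height_4_3:.0f}px tall")
print(f"That's {height_4_3 / height_16_9 - 1:.0%} more height at the same width")
```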
Presenting videos in such a way is easier said than done, but it’s not an insurmountable challenge. Square-ish videos can already be recorded by the Apple Vision Pro itself, so playing them back isn’t the problem, but as the Files app sets a height limit for video playback, you’ll need a third-party app to make them huge.
If you’d like to make video like this, the workflow is dead simple: shoot at the highest resolution your camera supports, build a timeline to match, and export at that same resolution, standards be damned. My Panasonic GH6 is very happy shooting 5760×4320px at up to 30fps, and a 7mm rectilinear lens lets you capture a very wide, highly detailed view. Worth exploring, supported by every NLE, and simple.
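If a playback app chokes on your camera’s native codec, you can re-encode outside your NLE and keep the odd resolution intact. A minimal sketch driving ffmpeg from Python; the filenames and quality setting are placeholders:

```python
import subprocess

# Transcode open-gate footage to HEVC at its native, non-standard resolution.
subprocess.run([
    "ffmpeg",
    "-i", "input.mov",   # placeholder: your open-gate master
    "-c:v", "libx265",   # HEVC encode; resolution is left untouched
    "-crf", "18",        # visually near-lossless; adjust to taste
    "-tag:v", "hvc1",    # the fourcc Apple players expect for HEVC
    "-c:a", "copy",      # leave audio untouched
    "output.mp4",
], check=True)
```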
Creating 3D stereoscopic motion graphics
If you’re handy with After Effects, Motion or Blender, you’ll be able to make synthetic stereoscopic 3D with very little hassle; it’s just a quick jump beyond regular 3D-presented-as-2D in those apps. There are a few possible workflows, including over-under, but we’ll keep it simple by using the Full Side-by-Side format. This should work with any NLE that allows custom resolutions.
In theory, it’s easy — render two cameras from two separate left- and right-eye perspectives, then combine them side-by-side in a double-wide video file. But in practice, as you might have guessed, there are a few hoops to jump through.
How to start? First, create a 3D scene with some simple elements placed in 3D space. Create a camera, making sure it’s a two-node camera if you’re in After Effects, and dolly forward in Z over time. Be sure to keep content at a reasonable distance from the camera to avoid convergence issues that can make people queasy.
- If you’re working in Motion, create a camera, call it “Camera L”, duplicate it, then rename the duplicate “Camera R”. Select Camera R, then right-click the Position property and choose Add Parameter Behavior > Link. Set the X offset to 60 in the behavior controls as a quick-and-dirty way to get started. In the real world, your eyes are about 63mm apart, but in pixels… who knows? (After Effects uses 3%, and 60px is close to 1920×0.03, so it’s in the ballpark for experimentation; see the quick check after this list.)
- If you’re working in After Effects, there’s more substantial support. Create a two-node camera, then right-click it and choose Camera > Create Stereo 3D Rig. Three new comps will appear, and you’ll see new controls for Camera Separation and Convergence. Nice! Adobe’s help pages have a lot more information on stereoscopy.
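As a quick check on those ballpark separation numbers:

```python
# Sanity check on the quick-and-dirty Motion offset above
comp_width_px = 1920
ae_separation = 0.03                      # After Effects uses 3% of comp width
offset_px = comp_width_px * ae_separation
print(f"3% of {comp_width_px}px = {offset_px:.1f}px")   # 57.6px, so 60px is close
```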
When you render your video, render twice, once with each camera, to produce two separate files. To create a full side-by-side video, position the two files next to one another, left and right, in a double-wide timeline in the NLE of your choice. Here, I’ve created a 7680×2160 timeline in FCP, and it works fine, but any edits must now be made to both eyes at once.
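If you’d rather combine the two renders outside your NLE, ffmpeg’s hstack filter does the same job. A minimal sketch driven from Python, with placeholder filenames:

```python
import subprocess

# Combine separate left- and right-eye renders into one full side-by-side file.
# "left.mov"/"right.mov"/"sbs.mov" are placeholder names.
subprocess.run([
    "ffmpeg",
    "-i", "left.mov",
    "-i", "right.mov",
    "-filter_complex", "[0:v][1:v]hstack=inputs=2[v]",  # left eye on the left
    "-map", "[v]",
    "-c:v", "libx265",   # HEVC encode
    "-crf", "18",        # visually near-lossless
    "-tag:v", "hvc1",    # the fourcc Apple players expect
    "sbs.mov",
], check=True)
```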
If you’re working in Blender, stereoscopy is easy if you’re working with real-world units, but a little harder if you’ve been working at a larger or smaller scale. In the Output properties, tick “Stereoscopy”, then in the output file settings change the Views Format to Stereo 3D and the Stereo Mode to Side-by-Side. Now, your renders will double in width, with left and right eyes beside one another.
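If you prefer to script it, the same settings are exposed through Blender’s Python API. A minimal sketch to run in Blender’s scripting console (the 63mm interocular distance matches human eyes; the 2m convergence distance is an arbitrary starting point):

```python
import bpy

scene = bpy.context.scene

# Enable stereoscopy (the "Stereoscopy" checkbox in the Output properties)
scene.render.use_multiview = True
scene.render.views_format = 'STEREO_3D'

# Render both eyes into a single side-by-side frame
scene.render.image_settings.views_format = 'STEREO_3D'
scene.render.image_settings.stereo_3d_format.display_mode = 'SIDEBYSIDE'

# Per-camera stereo settings, in real-world units (meters)
cam = scene.camera.data
cam.stereo.interocular_distance = 0.063   # ~63mm, like human eyes
cam.stereo.convergence_mode = 'OFFAXIS'   # Blender's default, generally recommended
cam.stereo.convergence_distance = 2.0     # plane where the two eyes align
```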
Converting old 3D formats to Spatial
Whichever app you used, you should now have a full side-by-side video file, and many playback apps on Apple Vision Pro and Meta Quest can handle these files directly. However, SBS is not a great solution. Adding “_SBS” or “_FSBS” to a filename doesn’t really count as “metadata”, and worse, doubling the amount of video data doubles your file size requirements. Finally, because SBS just shows two images together, you’ll see both of them on a device that can’t show 3D.
So Apple, in search of a better way to distribute Spatial video, chose to co-opt the little-used-but-existing MV-HEVC standard to store the extra video frame in a 2D-compatible way. In theory, one eye can be encoded in terms of how it differs from the other, so the format needn’t cost twice the bitrate, though encoders won’t always take advantage of this. This new standard format does promise to make things easier, but right now, you’ll need some extra tools to convert between the existing SBS formats and Spatial video.
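One wrinkle before the tool list: depending on the tool, you may need two separate per-eye files rather than one double-wide SBS file. If so, ffmpeg’s crop filter can split one apart; a minimal sketch with placeholder filenames:

```python
import subprocess

# Split a full side-by-side file into separate left- and right-eye files.
# "sbs.mov" and the output names are placeholders.
subprocess.run([
    "ffmpeg",
    "-i", "sbs.mov",
    "-filter_complex",
    # Duplicate the video stream, then crop each copy to one half
    "[0:v]split=2[a][b];"
    "[a]crop=iw/2:ih:0:0[left];"
    "[b]crop=iw/2:ih:iw/2:0[right]",
    "-map", "[left]",  "-c:v", "libx265", "-crf", "18", "left.mov",
    "-map", "[right]", "-c:v", "libx265", "-crf", "18", "right.mov",
], check=True)
```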
There are free and cheap tools available, depending on your patience and technical knowledge; here are a few to get you started:
- SpatialEncoder is a cheap Mac App Store app that converts SBS to MV-HEVC
- Spatial Video Converter is another app from the same developer that converts from MV-HEVC to SBS format
- spatial-media-kit-tool is a command line app that converts in both directions for free
- Spatial Metadata GUI is a free app based on the command line tool listed above that encodes to MV-HEVC
- Compressor, Apple’s pro encoding solution, can read Spatial MV-HEVC videos and export one eye at a time — set up some presets to make life easier
If you’d like to experiment today, these tools will get you started, but we will have more options soon. Apple have presented sample code designed for the upcoming macOS Sequoia to solve this problem, and of course when the functionality is built into FCP and Resolve we’ll have a much easier time.
But… can you just shoot in 3D? Well yes, yes you can.
Real world non-immersive 3D video
If you have an iPhone 15 Pro or Pro Max, you can enable spatial video support in Settings, under Camera, then Formats. In the Camera app, turn the phone sideways, press record, and you’re away. A whole lot of complexity disappears and It Just Works. Now of course, it’s not Log video, and it’s not perfect. The two lenses are quite close together, so the stereo separation isn’t very strong unless you’re quite close to your subject. Also, because the ultrawide 0.5x lens has to be cropped quite strongly to match the standard 1x lens, the maximum Spatial resolution in the Camera app is 1080p.
Note that you can use third-party apps to record 4K Spatial Video, and I’ve used them quite a bit, but there are issues. If you’re not on a tripod, stabilization is necessary to avoid making viewers sick, but that’s not possible in 4K. Worse, the quality of that ultrawide image is quite poor when compared to the main lens, and focus isn’t even guaranteed to match between the two lenses. Metadata isn’t always present either, so third-party footage won’t necessarily be shown with the fuzzy edges viewers expect.
Still, this was a worthwhile experiment to see if 3D added to the experience — and it did give a sense of presence that 2D video lacks. Non-immersive footage is less likely to make viewers feel nauseous than immersive footage, but you do still need to avoid shakes. If you’re going to move, use a gimbal, and lock its direction to keep the shot as consistent as possible. If you can hold a gimbal while riding a bike, you can record smooth, watchable, moving 3D footage.
Hopefully, as September’s new iPhone 16 Pro is rumored to include a 4K ultrawide sensor, 4K Spatial recording will be enabled, with stabilization, in the default Camera app. Right now, the 15 Pro makes this possible, but not great. So are there other options?
You could shoot with two independent cameras, but the challenges are non-trivial: frame-level sync, matching position, matching all the settings all the time, matching lenses, and then of course workflow. You can eliminate most of these issues by using a stereo lens on a single camera, but very few such lenses have ever been made. Even if you find one, you’ll halve your sensor resolution, so a camera with an 8K sensor is required to record in 4K stereo.
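On the sync problem specifically: if both cameras record scratch audio, you can estimate the offset between them by cross-correlating the two audio tracks. A minimal sketch, assuming you’ve already extracted mono WAV files at a common sample rate (filenames are placeholders):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

# Estimate the time offset between two cameras from their scratch audio
rate_a, a = wavfile.read("cam_a.wav")
rate_b, b = wavfile.read("cam_b.wav")
assert rate_a == rate_b, "resample first so both tracks share one sample rate"

# Correlate over the first minute to keep things fast
n = min(len(a), len(b), rate_a * 60)
a = a[:n].astype(np.float64)
b = b[:n].astype(np.float64)

corr = correlate(a, b, mode="full")
lag = int(np.argmax(corr)) - (n - 1)    # positive lag: camera A started rolling first
offset_s = lag / rate_a
print(f"Offset: {offset_s:.4f}s ({offset_s * 30:.2f} frames at 30fps)")
```

Slide one clip by that offset in your timeline and the two eyes should line up to within a frame; a clap or slate at the head of each take makes the correlation peak much more reliable.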
If you’re looking to experiment with 4K 3D and you don’t want to use a phone, there are some cheaper consumer-level options. You could try the Acer SpatialLabs Eyes Stereo Camera, which has been shown at trade shows but not yet released, or the Kandao QooCam EGO, which has been out for a little while.
Unfortunately there’s not much else you can step up to without spending several thousand more dollars; Canon have a spatial (non-immersive) RF-S 7.8mm lens coming for their EOS R7, but it won’t be out until later this year.
Why the dearth of options? Because 3D camera development has focused mostly on immersive 180° content. Let’s take a wider look.
Immersive 3D video
Using a pair of ultra-wide 180° lenses to record your image is an intriguing halfway house between regular filmmaking and 360° filmmaking. You can hide a crew, but it still requires a different approach to storytelling, which is why most 180° productions, like 360° productions before them, have been documentary-style, and often focused on experiencing an amazing environment. Apple’s own Immersive videos largely take this route, though the Alicia Keys rehearsal session pushed a few more boundaries.
Depending on where you sit in post-production, Immersive may or may not be for you. It’s not very exciting for editors, because you can’t cut nearly as often as you can in a regular production. Directors, producers, set designers, and cinematographers can embrace a challenge, though. Think of it as a theater experience, presenting an environment for the viewer to look around. You can cut now and again, you can tell a story, but take it slow.
Although the Immersive experience is transformatively different when compared to big 2D video, picture quality alone isn’t necessarily better, because you’re spreading your available pixels out over a large area. You’ll need the absolute best sensors and the fattest distribution pipeline to deliver a great image, and pixel peepers will see flaws, at least until Blackmagic’s URSA Cine Immersive camera comes out this year with a staggering 8K per eye. That option will be priced as a rental for most of us, but if you want to get started now, you can grab Canon’s 180° RF 5.2mm lens today for US$2000, the slightly narrower 144° RF-S 3.9mm option for US$1100, or Kandao’s VR Cam for US$4000. Some great content is being made (check out Explore POV’s fantastic virtual travel app, full of immersive content), but it’s not easy yet. Workflow complexities mean you’ll probably need some extra subscriptions from Canon and/or Mistika too.
Should you dive in?
Conclusion
We’re on the cusp of change, and it’s tricky to recommend spending a lot on tech when the next few months could bring significant changes. It’s also worth remembering that the number of viewers who have Apple Vision Pro is still small, and will remain so for a few years yet. Still, today’s and next month’s iPhones are a great way to get comfortable with 3D non-immersive video, and there are moderately priced 180° options if you want to explore immersive filmmaking more seriously. (For another cheap immersive option, Canon have shown a 360°/180° folding 8K hybrid that will hopefully see a release this year.)
The easiest way to start is to grab your phone and run some filming experiments to see what works. If you want to experiment with editing, convert your videos to SBS, edit in wide timelines, and then convert back to spatial. If you’re into motion graphics, render some extra cameras, combine, and see how they look. Everything you do in spatial can be delivered to 2D platforms too, so you can still reach a wider audience. (If you really don’t want to compromise on 2D quality, mount an iPhone on top of your regular camera and dual-record, just to see what works.)
In a couple of months, workflows should be quite a bit smoother, hopefully with live previews of 3D footage in the Apple Vision Pro itself. We’ll have more options for recording, editing and titles, we’ll need fewer workarounds, and the audience for spatial content will have grown just a little more. If you’re a filmmaker with an interest in pushing the boundaries, there hasn’t been a better time to explore; just be ready for more change soon.