Day 1: TR-X
The retreat began Monday with what’s called TR-X (Tech Retreat Extra). It’s a less formal structure with a lot of panels and general discussion around current trends. What was really interesting here were the themes that kept showing up across the discussions: the dizzying speed of AI/ML encroachment into creative endeavors, the persistence of WFH post-Covid, cloud everything, the challenge of navigating an overly complex global supply chain, the vocational challenge of keeping job skills in line with the pace of technology advancement and replacement turnover, and the struggle to recruit talent in a streaming market desperate for more content.
The conversation traveled in many directions, but here are a few random pieces that struck me as particularly interesting:
Atul Phadnis of Vitrina AI is pursuing the lofty goal of using AI to map the global supply chain of the film and television industries. His company’s goal is to create a system that dynamically updates supply chain availability in real time. Perhaps an overambitious goal, but he had some interesting facts and stats in presenting the problem he’s trying to address. To his point he quoted a Netflix executive as saying, “A supply chain directory is stale the day after it is launched.” Some interesting takeaways from his talk: the film industry has a $255 billion supply chain; the emergence of globally sourcing companies like Disney, Netflix, and other streamers has severely disrupted traditional, local supply chain relationships; the result is a supply chain that’s almost impossible to predict, producing chaos in an industry of hard project deadlines. He also emphasized the emergence of localization as a massive new sector in the supply chain. Where localization was previously relegated to basic subtitles and poor overdubs, the advent of AI techniques and the demand for content means a major increase in business for those repurposing content for other language markets.
Paul Debevec discussed the emerging AI technology in filmmaking. He focused on the fact that current AI tools offer little creative control; the ability to “art direct” the results of AI effectively will be pivotal to its advancement in the film industry. He also made the point that data scientists trained in machine learning are going to be pivotal members of the film industry community going forward. Finally, he pointed out that existing technologies like the Light Stage are not made obsolete by AI; they provide the data needed to train the machine learning models.
Tuesday Supersession: All Things Avatar
Tuesday’s sessions were entirely devoted to the production of Avatar: The Way of Water, with lengthy discussions by key members of the production team, including a rare shop-talk interview with producer Jon Landau. To date the film has generated over $2.2 billion in revenue.
While few films will approach the “full VFX” production of Avatar, so many aspects of the groundbreaking production are already filtering down into standard filmmaking practice. As such, these sessions were an extremely valuable window into the practical workflows involved in bringing the tentpole to screen. Let’s take a walk through some of the salient information shared in these sessions.
The first session was a panel interview with key members of the production and postproduction team: Robin Charters (3D CamSys engineer), Russell Carpenter, ASC (director of photography), Tashi Trieu (colorist), and Simon Marsh (camera specialist).
Russell Carpenter
The session began with DP Russell Carpenter describing some of the Herculean efforts expended in principal photography. While this seems a little odd for a movie that’s almost entirely CG, it turns out that there was actually a lot involved in creating a camera system and pipeline that could integrate the live action sets and live action actors with the performance capture.
Out of the gate Russell was asked by the moderator to address the issue of the production timeline: pre-production began in 2013 and the film was released in December 2022. He quoted James Cameron as saying something to the effect of, “We don’t know how to make it, but we’ll know how we made it after we made it.” Russell explained that the film was made like a layer cake, with the second layer being built before the first layer was even done. In fact, the entire movie was essentially in a state of flux throughout production. As a basic timeline, in 2013 the script was being developed simultaneously with the production designers’ work developing the imagery for the scenes and characters. At that point it was decided that the entire arc of the story needed four movies, not two.
A year after concept design was completed, the virtual sets were at a place where Jim Cameron could enter the volume with his virtual camera. This was—in Russell Carpenter’s words—“perfect for Jim as a control freak.” Jim would call attention to specific features of the sets and their positioning, then designers would custom-build the locations based on his feedback.
By 2017 the actors came in for performance capture. Since the camera work occurred separately (thanks to the virtual camera system) the entire focus was on performance. When it came to principal photography, Russell needed to match visuals with existing virtual production footage generated from the performance capture sessions (stitched together from multiple takes), and the virtual camera work created by James Cameron. In some cases a technodolly was used to perfectly line up camera moves with the virtual production. A depth camera was used to composite the live action footage with the previously generated virtual camera work in real time.
A large part of the challenge in cinematography was the development of a camera system that could handle stereoscopic shooting at high resolution while being light enough to be maneuverable by a single operator. This is where Simon Marsh comes into the picture.
Simon Marsh and the Sony Rialto
Simon is the product manager for the Venice and Rialto Sony camera ecosystem. In October of 2017 Marsh met with the Lightstorm team (James Cameron’s production company). At the meeting Lightstorm stated that they loved the Venice camera system but needed 48 frames per second and wanted to separate the sensor block from the body by at least 2 meters. This was essential to get the lightweight maneuverability that Jim needed in a stereoscopic camera rig.
Russell Carpenter was extremely impressed with the color depth of the camera system. In fact, they were unable to find a field monitor capable of displaying the full dynamic range of the output. What Russell found most impressive was the performance at ISO 2500. In fact, he decided to shoot the entire picture at ISO 2500. This allowed the use of what he called surgical lights, more akin to rock and roll stage lighting than conventional feature production lighting. At the high ISO, what would have normally required 100 foot-lamberts only required 20. It enabled him to perfectly match the lighting of the live action character Spider with his virtual co-stars, right down to the dappling of tree foliage.
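As a quick sanity check on those numbers (my own back-of-the-envelope arithmetic, assuming a conventional baseline of roughly ISO 500, which wasn’t stated in the session): a 5x gain in sensitivity is about 2.3 stops, which lines up neatly with 100 foot-lamberts dropping to 20.

```python
import math

# Back-of-the-envelope check. My assumption: a conventional baseline of ISO 500.
# Only the 100 fL -> 20 fL figures and ISO 2500 come from the talk.
baseline_iso = 500
shooting_iso = 2500

sensitivity_gain = shooting_iso / baseline_iso     # 5x more sensitive
stops_gained = math.log2(sensitivity_gain)         # ~2.3 stops

light_before = 100                                 # foot-lamberts (from the talk)
light_after = light_before / sensitivity_gain      # 20 foot-lamberts

print(f"{sensitivity_gain:.0f}x sensitivity ({stops_gained:.1f} stops): "
      f"{light_before} fL -> {light_after:.0f} fL")
```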
The Spider Challenges
As the only main non-CG character, scenes with the character Spider were the most significant for live action cinematography. Initially he was motion captured with his costars. Two years later he re-enacted his part to mesh with the final motion performances of the others (those performances being the composite of multiple takes carefully edited together).
What turned out to be a significant issue was the fact that Spider actor Jack Champion had grown around a foot from the time of the initial capture to the time of his live action performance. This obviously caused issues with things like eyeline, but also meant the live action crew had to be extremely accurate when lining their work up with the motion capture content. The speed of Jack’s growth also necessitated that all of film three and some of film four in the Avatar series were captured at the same time.
Another challenge in production was his face mask. In the final production the glass was added digitally; the curvature of the wrap-around design of the mask meant that practical glass or plastic would have reflected all the stage lighting, crew, and rigging. The result would have been a massive VFX paint job, hence the decision to add the glass and scene reflections in post.
A tank for all water conditions
A massive all-purpose water tank was built at Manhattan Beach to accommodate all the filming done underwater, including significant amounts of underwater motion capture. The heated tank was several stories high, held close to 900,000 gallons of water, and could create both 10 knot currents and 2 meter waves. Large modified earth movers were used to generate the waves, while a powerful pump generated the currents. The current was necessary to create accurate and plausible motion capture as the actors swam against it.
The water had to be prepared each day to ensure that it was clear enough to shoot through. Ultimately the size of the tank meant that it could be adapted to just about any scene.
One interesting note with respect to virtual production techniques: LED panels were used to play realistic fire as reflections for the end battle scene. However, it was impossible to get the LED panels low enough relative to the water surface without creating a dark band on the border of the water. (LED electronics and water don’t tend to work well together for long and can lead to electrocution of crew and cast. Not good for insurance rates…) The solution was to reflect the LED light against mirrors placed into the water. Great reflections, no electricals in the water.
Tashi Trieu
Next up to the mic was colorist Tashi Trieu, talking about the color workflow to and through Blackmagic Design’s DaVinci Resolve. In fact, what seemed to be a common theme throughout the retreat was the heavy use of Blackmagic Design throughout the industry. It truly seems that here in 2023, Blackmagic gear has become an essential first-tier staple in the industry. Both Avatar and Lord of the Rings: The Rings of Power featured Blackmagic Design hardware at the infrastructure level and Blackmagic software in their color pipelines.
Takeaways from Tashi’s talk include the attention to detail given to acquisition quality. The effects of different beam splitters and polarizers on the color of the image were evaluated in the stereo rigs to ensure minimal artifacts between the left and right eyes. (Getting two cameras close enough together to mimic the interpupillary distance of the average human eye requires a beam splitter.) Robin Charters later elucidated that new mirrors were actually developed by and sourced from an aerospace company.
An important concern to James Cameron in the grading process was that the water “held its volume.” A lot of time was spent working on the saturation and increasing gamma so that elements didn’t feel like they were floating in space, but instead were suspended in substantive ocean water.
Throughout the grading process in New Zealand Tashi graded directly on a 38-foot Dolby Vision laser projector in 3D. In fact, 95% of the grading was done in 3D, with perhaps 5% being spent grading the 2D version. Tashi noted that despite anticipating fatigue from the lengthy stereoscopic viewing sessions, he experienced none that he could discern. (The audience only sees the movie in 3D in a single viewing session; Tashi as a colorist was viewing it for several hours each day.) He attributes this largely to Jim Cameron’s belief that when cutting 3D, the subject of interest should almost always be placed at screen plane, so that the viewer’s eyes focus and converge at the same distance. In typical 3D viewing, the subject of interest is often placed in front of or behind the screen, forcing the eyes to converge at a distance different from the screen they’re focused on. This is something the eyes almost never do in natural viewing and is a common source of eye strain.
Tashi also posited that the dynamic range of the Dolby projector may have alleviated eye strain.
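To make the screen-plane argument concrete, here’s a tiny geometric sketch (mine, not anything shown in the session; the 63 mm interocular distance is an assumed average): with zero on-screen disparity the eyes converge exactly at the screen, which is also where they focus, while positive or negative disparity pushes convergence behind or in front of the screen even though focus stays on the screen.

```python
def convergence_distance(screen_distance_m, disparity_m, eye_separation_m=0.063):
    """Distance at which the eyes converge for a point drawn with a given
    on-screen horizontal disparity (positive = behind the screen, negative =
    in front). Simple similar-triangles model; 63 mm interocular is assumed."""
    return (eye_separation_m * screen_distance_m) / (eye_separation_m - disparity_m)

screen = 10.0  # meters to the cinema screen, which is where the eyes focus
for disparity in (0.0, 0.02, -0.02):  # meters of on-screen disparity
    print(f"disparity {disparity:+.3f} m -> converge at "
          f"{convergence_distance(screen, disparity):.1f} m (focus stays at {screen:.0f} m)")
```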
As to the shot delivery, there were only 3 shots in the film that weren’t visual effects, and every shot in the movie was a preprocessed, debayered EXR before arriving at the grade.
One interesting side-note: Tashi worked on remastering Avatar 1 at the same time he was grading Avatar 2. He was able (admittedly with a little help from Blackmagic Design) to upgrade the Avatar 1 grading sessions to Resolve 18 from their original version 13 years prior. Thirteen years is an eternity in project file structure, so that in itself is a testament to the Resolve engineers’ efforts toward backwards compatibility.
And a final lesson Tashi learned working with James Cameron: If there’s something that takes 10 seconds, you’d better figure out how to shave 9 seconds off that time before showing it to Jim.
Robin Charters
Robin’s official title on the project was “3D Systems Engineer Architect,” but it’s pretty clear from his talk and the other speakers’ deference toward him that he was involved at a deep level in general pipeline implementation. One of the mandates for this film was to buy everything, rent nothing. And if it couldn’t be bought, build it in-house. This might sound extravagant given the price of high-end cameras, but due to the duration of production, rental fees would have exceeded purchasing significantly over the course of the project.
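The buy-versus-rent logic is simple break-even arithmetic. The figures below are purely hypothetical (no prices were shared in the session), but they illustrate why a multi-year production flips the usual rental calculation.

```python
# Illustrative break-even sketch -- every figure here is a hypothetical assumption,
# not a number from the talk.
purchase_price = 42_000        # assumed cost to buy one camera body, USD
weekly_rental = 2_500          # assumed weekly rental rate, USD
production_weeks = 3 * 52      # a multi-year shoot

breakeven_weeks = purchase_price / weekly_rental
total_rental_cost = weekly_rental * production_weeks

print(f"Renting is cheaper only for the first {breakeven_weeks:.0f} weeks")
print(f"Over {production_weeks} weeks: ${total_rental_cost:,} in rental vs ${purchase_price:,} to buy")
```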
The team created their own assembly space—the “Coral Circle Skunkworks”—down the road from the Manhattan Beach studio, complete with everything from CNC mills to custom SoC development boards. They designed and built three custom stereoscopic underwater rigs: a “traditional” rig, a rig built around a 15mm Nikonos wet lens that allowed for a wet mirror, and a nano camera rig using Sony sensors in a Dream Chip housing. The design requirements were that the camera rigs had to be handheld and come in under 30 lbs (in addition to shooting at 47.952 fps and being capable of a dynamic range that would satisfy the final Dolby Vision master). In all, the manufacturing process took a team of about 8 people 9 months.
One interesting anecdote with respect to attention to detail: Robin and his team tested several monitors before getting them in front of Jim. Of paramount importance was the signal latency of the monitors. Since depth compositing of the virtual elements over the live action cinematography needed to happen in real time, anything that added latency had to be avoided. This was so religiously followed that rather than inverting an image digitally (which requires a frame buffer and thus adds latency), the crew would actually turn monitors upside down.
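For a sense of scale (my arithmetic, not a figure quoted in the session): at the production’s 48 fps, every buffered frame adds roughly 21 ms before the image reaches the eye, on top of whatever latency the rest of the real-time compositing chain already carries.

```python
# One frame of buffering at the production frame rate -- my own arithmetic.
frame_rate = 48.0
frame_period_ms = 1000.0 / frame_rate   # ~20.8 ms per buffered frame

for buffered_frames in (1, 2, 3):
    print(f"{buffered_frames} buffered frame(s) -> +{buffered_frames * frame_period_ms:.1f} ms of latency")
```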
Modified Blackmagic Teranex units were used as the backbone of the signal chain, and Robin noted the great collaboration between Sony and Blackmagic, with Sony even providing a complimentary Venice to Blackmagic’s team in Melbourne to help them work with the data pipeline.
In all, 17 Venice camera bodies were used, along with two additional sensors, the spares allowing them to repurpose a body without having to remove sensors from the rigs. 46 Teranex format converters made their way into the signal chain. USB depth cameras were used, with the USB3 camera signals delivered over fiber. Ubiquiti UniFi networking equipment was used for PoE networking.
One unexpected element that became crucial was the quality of the cabling to ensure power and data got where they needed to go. Improving the cables and connections turned out to be the best way to improve the camera systems, with cables needing to sustain a 600W power draw over runs of 500 feet.
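To get a feel for why long power runs are hard, here’s a rough voltage-drop sketch. The 600W draw and 500-foot length come from the talk; the supply voltage and conductor resistance are my assumptions, purely to illustrate the scale of the problem.

```python
# Rough voltage-drop illustration. The 600 W and 500 ft figures are from the talk;
# the supply voltage and conductor resistance are assumptions.
supply_voltage = 48.0        # assumed DC supply, volts
power_draw = 600.0           # watts (from the talk)
run_length_ft = 500.0        # one-way cable length (from the talk)
ohms_per_1000ft = 1.0        # assumed copper conductor, roughly 10 AWG

current = power_draw / supply_voltage                            # ~12.5 A at the load
loop_resistance = 2 * run_length_ft / 1000.0 * ohms_per_1000ft   # out and back: ~1 ohm
voltage_drop = current * loop_resistance

print(f"Current: {current:.1f} A, loop resistance: {loop_resistance:.2f} ohm")
print(f"Voltage drop: {voltage_drop:.1f} V ({100 * voltage_drop / supply_voltage:.0f}% of supply)")
```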
Lastly, Robin’s team had a hard time sourcing 3D preview monitors. The stereoscopic home theatre craze had run its course and it was impossible to find any newly-manufactured stereo monitors. Finally they discovered that a medical division of Sony still made a 3D monitor for surgery that was available for purchase.
Jon Landau
A highlight of the sessions was a video call-in from legendary producer Jon Landau. Jon started by discussing the philosophy behind stereoscopic 3D at 48 frames per second. 3D was chosen for its sense of immersion. It’s interesting to note that while Jon appreciates VR technology, he has no interest in making a film for VR. His argument is that in VR the viewer directs their own attention, whereas in filmmaking the audience is fed the point of view of the narrative. Hence stereoscopic 3D works for added immersion, without losing the ability to control and direct the audience’s attention.
Jon then outlined 4 key areas where Avatar 2 has pushed filmmaking technology:
- Underwater performance capture
- Water simulation advancement
- Facial capture and performance
- Real-time depth compositing system
In particular he feels that the depth compositing has the broadest application for filmmaking in general. While not every film is going to require underwater capture or facial performance capture, the ability to composite CG elements in real-time over live action cinematography is useful for any production that integrates visual effects.
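To make the idea concrete, here’s a minimal sketch of per-pixel depth compositing (nothing like the production system, just the core principle): given a live-action frame with a depth map and a CG render with its own depth, each pixel shows whichever source is nearer to camera.

```python
import numpy as np

def depth_composite(live_rgb, live_depth, cg_rgb, cg_depth):
    """Minimal per-pixel depth composite: show whichever source is nearer.
    live_rgb/cg_rgb are HxWx3 images; live_depth/cg_depth are HxW depth maps
    in the same units. A real system also handles soft edges, holes in the
    depth data, and latency -- this is just the core z-test."""
    live_wins = live_depth < cg_depth                  # True where live action is nearer
    return np.where(live_wins[..., None], live_rgb, cg_rgb)

# Tiny synthetic example: a CG element sits in front of the live plate on the right half.
h, w = 4, 8
live_rgb = np.full((h, w, 3), 0.2)                     # flat gray live-action plate
cg_rgb = np.full((h, w, 3), 0.8)                       # bright CG element
live_depth = np.full((h, w), 3.0)                      # live subject 3 m from camera
cg_depth = np.full((h, w), 5.0)
cg_depth[:, w // 2:] = 2.0                             # CG comes nearer on the right half

out = depth_composite(live_rgb, live_depth, cg_rgb, cg_depth)
print(out[0, :, 0])   # left half shows live (0.2), right half shows CG (0.8)
```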
In a post-mortem of Avatar 1, depth compositing and facial capture were two of the technologies that emerged as important to improve upon. Working with Weta, the team developed a radically more sophisticated facial capture system. A deep learning algorithm was trained on a library of custom takes of actors delivering lines and emoting, and was then used to drive not just the skin deformation of the virtual characters, but the underlying muscle movements as well.
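Weta’s system wasn’t described at an implementation level, but the general shape of the idea (learning a mapping from captured facial data to underlying muscle activations rather than directly to skin shapes) might look something like the toy sketch below. Everything here, from the feature and muscle dimensions to the network and training data, is hypothetical illustration, not their pipeline.

```python
import torch
import torch.nn as nn

# Toy illustration only: map tracked facial features (e.g. flattened 2D landmark
# positions from a head-rig camera) to per-muscle activation values that a facial
# rig would then turn into skin deformation. All dimensions, data, and the network
# itself are hypothetical -- the production system was not described at this level.
NUM_FEATURES = 2 * 120     # e.g. 120 tracked 2D facial points
NUM_MUSCLES = 60           # e.g. one activation value per modeled facial muscle

model = nn.Sequential(
    nn.Linear(NUM_FEATURES, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, NUM_MUSCLES), nn.Sigmoid(),   # keep activations in [0, 1]
)

# Synthetic stand-ins for captured takes and their ground-truth muscle solves.
features = torch.randn(32, NUM_FEATURES)
target_activations = torch.rand(32, NUM_MUSCLES)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(200):                # tiny training loop on the synthetic data
    optimizer.zero_grad()
    loss = loss_fn(model(features), target_activations)
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.4f}")
```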
Obviously water was a huge component of the second movie, and the production team went to extreme lengths to achieve realism. That began with training the actors and crew in “breath hold” diving. Since not even the crew could be “on air” (the bubbles would have affected the capture), the cast and crew all had to learn to stay down for lengthy periods on breath alone. They all became certified divers and even went to Hawaii on a night dive where they encountered giant manta rays—a very “Pandoran” experience.
Weta worked tirelessly in research and development on water simulation. The trickiest effects were when actors breached the water surface, working out how hair, skin, and clothing should transition from full immersion to above water. For many of the water effects they didn’t “crack the code” on realism until the very end of the film production. It often wasn’t until looking at final renders that the creative team could identify shots that “didn’t feel right,” then determine what was lacking or not performing correctly.
Landau also discussed editorial a little, but since we have comprehensive coverage of the editorial process in our interview with Jim Cameron, we’ll eschew the details here.
Cloud computing became a major part of production toward the end. As Weta attempted to render final shots, they actually maxed out the New Zealand power grid in the region where the studio was located. As a result they had to turn their attention to the cloud, ultimately using 3.3 billion core-hours of rendering on Amazon Web Services.
Finally, Jon drew attention to the production’s efforts toward sustainability: everything from solar panels to craft services. Cast and crew were served vegan meals and provided with permanent water bottles, along with refilling stations. In general the efforts are commendable, but it does raise the question: just how much energy was blown on those 3.3 billion core-hours of cloud rendering, not to mention the energy consumed by Weta locally…
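To put a very rough number on that question (my arithmetic; only the 3.3 billion core-hours figure comes from the talk, and the watts-per-core values, meant to include datacenter overhead, are assumptions): the cloud rendering alone plausibly lands in the tens of gigawatt-hours.

```python
# Back-of-the-envelope only. The 3.3 billion core-hours figure is from the talk;
# the watts-per-core values (including datacenter overhead) are assumptions.
core_hours = 3.3e9

for watts_per_core in (5, 10, 15):
    gwh = core_hours * watts_per_core / 1e9     # watt-hours -> gigawatt-hours
    print(f"{watts_per_core:>2} W per core -> roughly {gwh:.0f} GWh of electricity")
```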
Delivering Avatar
The day rounded out with a discussion of what it took to actually deliver Avatar: The Way of Water. It turns out to be a staggering amount of work. There were 1,065 different builds of the movie. Color and contrast were evaluated for different screen brightnesses and aspect ratios, to ensure that every local cinema had an optimal viewing experience. By the time all of the various subtitles were added, there were 6,338 total deliverables. Dubs were produced in 28 languages, and subtitles in 51. It was a 15-reel movie with a total runtime of 3 hours, 12 minutes, and every permutation and combination of subtitles, dubs, aspect ratio, and output level had to be QC’d.
If that weren’t enough, the creatives in New Zealand manually placed each subtitle in screen X, Y, and Z space for the best possible experience, based on where the screen action was. They even subtly animated the depth of the subtitles to optimize for audience comfort. I left with a new, deep appreciation for titling, dubbing, and the distribution supply chain in general.
That’s it for the first two days’ coverage of the HPA Tech Retreat. Stay tuned for day three…