(Page 1 of 1 pages for this article )
Tuesday, August 25, 2009
Native Red Render speeds in new Color 1.5
Mike Curtis | 08/25
Good news=native support & controls, bad news=sloooowww
Red .R3D render speeds in Color - sloooowww
Good news/bad news -
Good news - the new Apple Color version 1.5 (part of the new Final Cut Studio) reads R3D files natively - you can even start a new project and pull in a clip! You have controls for directly accessing the 12 bit RAW Redcode controls in the Primary Color room - this is great! It also does a full resolution, high quality debayer, which is the highest quality way to do this - better than the half res solution that Final Cut Pro utilizes. For the money, this is definitely the easiest, highest quality, most controllable solution for grading your Red footage. All good news.
Bad news - for now, it is SLOOOOOOOOW - on a maximally pimped system*, it was rendering about 1.43 fps - so about 17 times slower than realtime. And this was only with a simple lift/gamma/gain adjustment, no color adjustments, no secondaries, no filters, no rescale, etc., all of which can make the process take even longer. Ouch! Here’s to hoping that Apple will support the Red Rocket card to accelerate debayering and scaling of Red .R3D files directly within Color in a future version.
Read on for details.
Quality nurd that I am, I like this approach. But with no network render solution offered, only the indies with waaaaaaay more time than money can afford to sit around with their station tied up rendering. For example, based on that clip test I did, rendering the following would take:
30 second commercial (all cuts no dissolves): about 8 1/2 minutes (client survivable - get coffee or talk/distract)
your 10 minute short film: about two hours and 50 minutes - go take a long lunch and/or see a movie
90 minute feature: start it on Friday afternoon, or be ready to take the next day off, because it’ll take about 25 hours to render
Realistically, you render bits of it as you go, so it isn’t likely that you’d really wait and do it all at once, but I’m just sayin’.
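For anyone who wants to redo the estimates above, here is the napkin math as a small Python sketch. It assumes the roughly 17x slowdown from the clip test applies evenly to any running time, which it probably won't in practice; the function name and constant are made up for illustration.

```python
# Napkin math: estimated render time = running time x slowdown factor.
# SLOWDOWN is the ~17x-slower-than-realtime figure from the clip test;
# treating it as constant for every project is an assumption.
SLOWDOWN = 17


def render_minutes(running_minutes, slowdown=SLOWDOWN):
    """Estimated render time in minutes for a piece of the given length."""
    return running_minutes * slowdown


for label, minutes in [("30 second commercial", 0.5),
                       ("10 minute short film", 10),
                       ("90 minute feature", 90)]:
    est = render_minutes(minutes)
    print(f"{label}: about {est:.1f} minutes ({est / 60:.1f} hours)")
```

Swap in your own slowdown factor if your system renders faster or slower than mine did.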
* OK, definition of terms
Test system:
8 core Nehalem Mac Pro with the ATI 4870 card and 12GB RAM running OS X 10.5.7, running Color 1.5
Test clip: Red 4K 16:9 Redcode 36 from Build 20, 3 minutes, 14 seconds, 9 frames long, took about 53 minutes to render with Lift/gamma/gain adjustments made, rendering out to ProRes4444 (Apple’s new high quality codec, here used for full range RGB 4:4:4) for a 1920x1080 file, Color set to 12 bit pixel format (as opposed to 8/10/16/32 bit other options - I just left it where I’d last used it)
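A rough way to turn those clip numbers into a render speed, as a Python sketch. The frame rate is an assumption (the clip's rate isn't stated here, so 24 is a guess), and the 53 minutes is only approximate, so don't expect it to match the 1.43 fps figure exactly. The function names are just for illustration.

```python
# Rough render speed from the test clip numbers.
# FRAME_RATE is an assumption: the clip's frame rate is not stated, 24 is a guess.
FRAME_RATE = 24


def clip_frames(minutes, seconds, frames, fps=FRAME_RATE):
    """Total frame count for a clip length given as minutes, seconds, frames."""
    return (minutes * 60 + seconds) * fps + frames


def render_fps(total_frames, render_minutes):
    """Average frames rendered per second of wall-clock time."""
    return total_frames / (render_minutes * 60)


total = clip_frames(3, 14, 9)   # 3 minutes, 14 seconds, 9 frames
speed = render_fps(total, 53)   # "about 53 minutes" to render
print(f"{total} frames at {speed:.2f} fps, "
      f"{FRAME_RATE / speed:.1f}x slower than realtime")
```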
Also interesting to note that CPU utilization was only about 250% (of a possible 1600%) - so clearly there is room for substantial optimization - doing the dumbest possible, most unlikely it’ll work this way napkin math, that seems to offer a theoretical improvement potential of about sixfold, which would get us to about 9 frames a second rendering (again, probably math doesn’t work this way, I’m just napkin sketching here) which would be, theoretically, about 1/3 realtime in that theoretical scenario. Disclaimered enough?
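The same napkin sketch in Python, for the curious. It assumes render speed scales linearly with CPU utilization, which it almost certainly does not, and it assumes a 24 fps clip (the frame rate isn't stated here). Names are hypothetical.

```python
# Napkin math only: assumes render speed scales linearly with CPU
# utilization, which is almost certainly not how it really works.
MEASURED_FPS = 1.43    # render speed from the clip test
CPU_USED = 250         # percent, as observed during the render
CPU_AVAILABLE = 1600   # percent, with every core fully busy
REALTIME_FPS = 24      # assumption: the clip's frame rate is not stated


def theoretical_fps(measured_fps, used, available):
    """Scale the measured speed by the unused CPU headroom."""
    return measured_fps * (available / used)


best = theoretical_fps(MEASURED_FPS, CPU_USED, CPU_AVAILABLE)
print(f"headroom: {CPU_AVAILABLE / CPU_USED:.1f}x")
print(f"theoretical speed: {best:.1f} fps, "
      f"about {best / REALTIME_FPS:.2f} of realtime")
```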
That’s all for today.
Oh, other tidbit I’ll expand on later - preliminary testing on Red Rocket with their latest shipping build 661, looks like ProRes renders about twice as fast as DNxHD, and 720p renders out about twice as fast as 1080p. Hmm.
UPDATE Charles in the comments points out that the “color” portion of Color is done on GPU, which doesn’t show up on CPU activity. AFAIK, it works something like this:
-source R3D is read in, debayered at full 4K res on CPU, based on settings in Red tab in Primary In color room
-scaled to 1080p (somewhere in here) on CPU, MAYBE on GPU
-coloring adjustments (like my lift/gamma/gain) done in Color are executed on the GPU - the RGB image and instructions are sent over the bus to the GPU, changes made, and the results sent back across the system bus
-CPU then compresses to codec of choice to disc
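Written out as data, the steps above look something like the Python sketch below. The stage names and device labels are my own shorthand for the list, not anything taken from Color itself, and "cpu?" marks the step where I'm not sure which side does the work. It only illustrates why a CPU meter misses part of the picture.

```python
# The per-frame steps from the list, written out as data.
# Stage names and device labels are shorthand for illustration only;
# "cpu?" marks the scaling step, which may actually happen on the GPU.
from typing import NamedTuple


class Stage(NamedTuple):
    name: str
    device: str


PIPELINE = [
    Stage("read R3D and debayer at full 4K", "cpu"),
    Stage("scale to 1080p", "cpu?"),
    Stage("apply color adjustments", "gpu"),
    Stage("compress to output codec and write to disk", "cpu"),
]


def visible_in_cpu_monitor(pipeline):
    """Stages whose load would show up in a CPU activity meter."""
    return [stage for stage in pipeline if stage.device.startswith("cpu")]


for stage in visible_in_cpu_monitor(PIPELINE):
    print(f"{stage.device:>4}  {stage.name}")
```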
There has been a lot of chatter on REDUser about transcodes/renders not fully utilizing all 8 cores on the Nehalems - the consensus seems to be that the advantage of multiple cores is in doing more work simultaneously (i.e., if the drives can keep up, you could be rendering multiple files simultaneously at the same speed).
Keep in mind Color also relies heavily on the GPU, and not the CPU for its work.
Posted by Charles Angus on 08/25 at 03:37 PM
Charles - thanks for posting and good points! I forgot about the GPU load - but the heavy lifting of debayering is done in CPU at this time AFAIK, so lots of sitting around waiting for that to finish, then the color manipulation is done on GPU and the results shipped back across the bus. If we broke down what is taking the longest per frame, debayering is the biggest chunk of it I’d guess.
OpenCL might offer a path to more functionality in the future.
Here’s a fun concept - what if Final Cut Studio (2009) was an interim release getting READY to do more OpenCL stuff in a future release? We can only hope….
-mike
Posted by Mike Curtis on 08/25 at 03:41 PM
Philip Hodgetts said the FCP team won’t be able to ship a Cocoa version (aka a version that could take advantage of Snow Leopard) of FCP until 2011-ish, so I wouldn’t hold my breath. I believe Color is also not Cocoa.
Posted by (email address hidden) on 08/25 at 05:25 PM
Mike, check out atMonitor: http://www.atpurpose.com/atMonitor/
It’ll show the GPU load as well as the CPU load, assuming the GPU makes that info available—the ATI HD 4870, alas, does not, which is rather annoying as all three of our MacPros use those cards.
Posted by Adam Wilt on 08/26 at 12:19 AM
Hi Mike,
Didn’t I read in the RED FCP white paper that Color is best suited for 4K 2:1 and not 16:9? Could be old information.
Posted by Graham Futerfas on 08/26 at 10:33 PM
Mike.
It’s not fast by any means, but you didn’t mention in the article if you turned off (or had on in the first place) external video. It’s kind of an old trick going back to the Final Touch days to turn off external video while rendering. This should speed up rendering quite a bit as the GPU is not responsible for also updating the external video image.
Just this morning I rendered a 22min RED project (ingested as RED QuickTimes) 4k 16:9 23.98 to ProRes 4x4 1920x1080; the entire process took 4:56 min (took a look at the render log: Render Queue > Render Log). The project had a pretty good mix of primaries and secondaries. That is much faster than your test, but I did have external video off.
Tonight I’m rendering a true 4k ProRes 4x4 version (why the client wants this who knows! Trust me, they’re not going to be projecting 4k anytime soon!) with no resize, and it’s still going, but it also appears to be going slightly faster than the previous render because of no resize
Posted by robbiecarman on 08/27 at 08:36 PM
oh I should also say I have almost the exact same machine but Fibre direct attached storage
Posted by robbiecarman on 08/27 at 08:46 PM
Ok, the no-resize render finished about 10 min ago and curiously it did take longer, 5:28 - ummmmm weird
Posted by robbiecarman on 08/27 at 08:48 PM
Huh - this Mac had a Rocket card installed in it, maybe that had an effect? I just tore down that box so can’t retest.
Posted by Mike Curtis on 08/27 at 08:52 PM
Not sure I’m getting the 1600% peak processor potential part. Two processors x four cores = 800% peak doesn’t it?
I’ve heard of going to 11, but your amp goes to 20!
On a quad core system, it took me about 40 hours to render the feature I’m wrapping up right now. Definitely pokey.
I was going to go buy a new ATI 4870 until I heard that changing the graphics card changes the grade. Choosing between starting over and just letting it cook for a few days wasn’t much of a hard choice, but the lack of batch rendering means if you have the movie chopped into reels, you have to be there at the end of each render to start the next reel cooking, something that always seems to happen in the wee hours of the morning for some reason.
Posted by samcrut on 09/22 at 11:01 PM
“Two processors x four cores = 800% peak doesn’t it?”
Each Nehalem CPU core includes “hyperthreading”, which allows it to run two processes at the same time, subject to certain limitations. The CPU monitoring in MenuMeters or atMonitor shows 16 “virtual cores” and these can indeed run at about 1600% when all are busy. I see this during modo renders.
Posted by Adam Wilt on 09/22 at 11:38 PM