
Wednesday, June 01, 2011


Adobe’s Speech Analysis is still chugging along in Premiere Pro CS5.5

Scott Simmons | 06/01

It’s still far from perfect, but as a time-saving tool it saved me quite a few keystrokes


When Adobe introduced CS4, it also introduced a feature that many thought would be the suite’s first real killer feature: Speech Transcription. I think the fantasy was this: gone were the days of paying a human to transcribe footage, since the machines could finally do it for us. That wasn’t the reality, as the results were often a garbled mess, and humans continued to do a better job. Adobe’s transcription accuracy has improved since CS4, and with CS5.5 it’s actually usable. I used it recently and I’m convinced it saved me a lot of typing.

On a recent job, the producer fell behind a bit on transcribing some of the talking-head interviews we were using in the cut. There were four subjects, each interviewed several times in different locations. We had only two full days, plus one prep day, to cut this five-minute piece, and he asked me if I could transcribe one of the four subjects. I was planning to spend the prep day, well … prepping for the edit (organizing, logging, syncing) and still needed to do just that, so I thought I’d give Adobe’s transcription another try and see how much time it could save.

Into Premiere Pro CS5.5

I was cutting this job in Final Cut Pro, so my first inclination was to open the audio files (this was a DSLR shoot with double-system sound) in Audition, Adobe’s audio editing tool that is new to the Mac, and transcribe them there. But Audition doesn’t have transcription capabilities (the old Soundbooth did), so I opened the WAV files in Premiere Pro CS5.5.

The actual act of transcribing is quite simple: load a clip into the Premiere Pro Source monitor, select the Metadata tab, and click the Analyze button under the Speech Analysis heading near the bottom of the panel.


Speech analysis is performed under the Source clip’s Metadata tab.

That action launches Adobe Media Encoder, where the actual file analysis takes place, but first you’re presented with a dialog that includes a quality option:


Choose the transcription quality. The higher the quality, the longer it takes.

There’s an option to select a different language if you have other languages installed. Those additional languages can be downloaded from Adobe’s website, apparently free of charge. But the most important choice for me was the Quality option. I chose the High (slower) option, as my interviews weren’t terribly long (the longest being 1:06) and these transcripts were going back to the producer, so I wanted them to be as accurate as possible.

The Analyze Content box is also where you attach a reference script if you have one (which goes a long way toward making the results more accurate) and enable face detection. Check the Identify Speakers box if you have different speakers in a single piece of footage; in my case I didn’t (apart from the off-camera interviewer, but I wasn’t worried about that).

Once you’re done, click OK. This launches Adobe Media Encoder and begins the transcription process in the background. When it’s complete, each file has brand-new metadata attached to it. The transcription itself is located at the bottom of the Metadata tab.

 


The video above shows Speech Analysis in action. Notice how each word is highlighted during playback. You can also mark In and Out points and perform edits from that window.

There are two big questions usually asked about this type of automated transcription: how long does it take, and (most importantly) how accurate is it?

The actual analysis time is, I think, quite acceptable. My longest interview clip was 1:06, and at the High (slower) setting it took around 1:09 to analyze. That’s just about realtime. But I also grabbed another random 14-minute clip, and that took just under four minutes, so I’m not sure exactly what the ratio of clip length to analysis time might be. You’re always going to trade speed for accuracy in situations like this, and I’d even be happy to see some type of super-accurate option, even if it took twice as long as the High setting.
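The two timings don’t share a constant ratio, and a little arithmetic shows one way that could happen. This Python sketch is an illustration, not a benchmark: it assumes both clips were analyzed at the same quality setting (the article only states the setting for the 1:06 clip) and treats “just under four minutes” as 4:00. A simple ratio gives about 1.05x realtime for the short clip but roughly 0.29x for the long one; a straight line through both points instead suggests a fixed per-clip overhead of around 54 seconds plus about 0.22 seconds of analysis per second of footage.

```python
# Back-of-the-envelope look at the two analysis timings reported above.
# Assumptions: both clips used the same quality setting (stated only
# for the 1:06 clip), and "just under four minutes" is taken as 4:00.

def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' duration string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# (clip length, analysis time) pairs from the article
samples = [("1:06", "1:09"), ("14:00", "4:00")]
points = [(to_seconds(c), to_seconds(a)) for c, a in samples]

# Simple ratio: analysis time divided by clip length
for (clip, _), (c, a) in zip(samples, points):
    print(f"{clip} clip: {a / c:.2f}x realtime")

# Straight line through both points: analysis = overhead + rate * length
(c1, a1), (c2, a2) = points
rate = (a2 - a1) / (c2 - c1)
overhead = a1 - rate * c1
print(f"~{rate:.2f} s of analysis per second of footage")
print(f"~{overhead:.0f} s of fixed overhead per clip")
```

With only two samples, any two-parameter line fits exactly, so the overhead figure is a hypothesis to check against more clips (Media Encoder startup is one possible source), not a measurement.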

The accuracy was far from perfect but appeared to be noticeably better than the couple of times I tried to use transcription in CS4. Accuracy depends a lot on the quality of the interview audio itself: noisy backgrounds, fast talkers and speakers who don’t enunciate will all reduce it. My interviews contained a producer asking questions off camera, and the transcription of the interviewer’s audio was, as expected, pretty much useless. The interview subjects were non-professional talent, but they were mic’d well, and the accuracy was good enough that I used it as a starting point for my transcriptions.


Above is an example of an uncorrected Speech Analysis transcript of a speaker with very good enunciation.


The same text after it has been corrected.

As mentioned above, once the footage has been transcribed, it appears as text in the Analysis Text window. One very handy thing that happens as you play the footage in PPro (and click back to the Metadata tab) is that each word is highlighted as it plays. At any point you can click a word in the transcription, hit play, and playback begins from the section that contains that word. When you hit play, the Source monitor jumps back to the video tab, so I found it best to tear off the Metadata tab so I could see the video and the text simultaneously. It’s an incredibly easy way to navigate around an interview and will impress any producer who has never seen it in action. The transcription is now attached to the file as metadata, so it will be there whenever you use the clip in PPro from that point forward.
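Highlighting words during playback and jumping to a clicked word both come down to a lookup over per-word start times; in the comments below, Philip Hodgetts notes that Adobe stores a millisecond time stamp for every word. This minimal Python sketch shows one way such a lookup could work, using a binary search. The `Word` type and the sample data are hypothetical, and this is not Premiere Pro’s actual implementation.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Word:
    text: str
    start_ms: int  # offset from the head of the file, in milliseconds

# Hypothetical analysis result: words sorted by start time.
transcript: List[Word] = [
    Word("We", 0), Word("only", 320), Word("had", 610),
    Word("two", 800), Word("days", 1050),
]
starts = [w.start_ms for w in transcript]

def word_at(playhead_ms: int) -> Optional[Word]:
    """Word to highlight: the last word whose start time is at or before the playhead."""
    i = bisect_right(starts, playhead_ms) - 1
    return transcript[i] if i >= 0 else None

def seek_to(word_index: int) -> int:
    """Clicking a word: return the playhead position to start playback from."""
    return transcript[word_index].start_ms

current = word_at(700)
print(current.text if current else None)
print(seek_to(3))
```

Because only start times are modeled here, a pause after a word keeps that word highlighted until the next one begins; storing an end time per word would handle silences.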

One point that Adobe makes about the accuracy of transcription is that it will be far more accurate if you attach a text-based script file before transcription. This makes sense, and if you’re working with talent reading from a teleprompter, a scripted series (write your script in Adobe Story and you’ve got an easy in!) or something similar, I can see where this would be possible. But far more of my talking-head jobs don’t have scripts than do, so I’m most interested in transcription from a raw file.

Next Up: The two big places where Speech Analysis could use some work.


Have you seen this tool, prEdit:
http://assistedediting.intelligentassistance.com/prEdit/workflow.html
Their web site says, “prEdit also supports transcripts coming from the Adobe speech analysis feature: either using the automatic transcription, with a script fed to Premiere Pro or Soundbooth, or using a transcript through an Adobe Story > Premiere Pro CS5 workflow, that preserves names and punctuation.”

In an email, Philip Hodgetts, president of Intelligent Assistance, said, “As long as there are timecode stamps in some format or another we can use it. We’ve tested against all available formats. Just note that with the ‘standard transcript’ we interpolate time values between the time stamps, so there is a little ‘slip’ in the edits. You can correct that in prEdit or trim in FCP after. The 3playmedia.com transcripts are frame accurate in the edits because they provide time stamps for each word, same as what we get from Adobe.”

So this sounds like there is more capability in the Adobe format than is being exposed in their GUI. Maybe they have an API that would give us frame accurate time codes for every word.

I love the idea of using prEdit to do the first cut of a doc in text form and have the video conformed to match automatically.

I think the next time I tackle a long doc I’m going to spring for prEdit.

Posted by Rob  on  06/02  at  10:54 AM


I don’t know if there’s something more embedded in the Adobe transcription or not. You certainly don’t get any timecode notes when copy/pasting out of Premiere Pro.

Philip does have a lot more knowledge on this due to prEdit (which I have seen and is linked to in the article) so maybe he’ll chime in.

Posted by Scott Simmons  on  06/02  at  11:52 AM


Unfortunately my writing has not been clear enough.  From Adobe we get a time stamp for every word in milliseconds from the head of the file, as an XMP metadata dump from the file itself. That’s if the media file has been processed through the Adobe world. For prEdit 1 - 1.4 that was the only way that prEdit worked.

With prEdit 1.5 we *additionally* support any industry standard transcript format that has timecode stamps per paragraph or so. That can be TXT, DOC or RTF and the transcript file is named to match its media file, and put in the same folder as the media. We parse that file and assign nominal time stamps for each word, but it’s not as accurate as working through Story and PPro. But it requires no additional processing or work.

With prEdit 1.5 we also *additionally* support 3playmedia.com’s JSON format, which also provides us with a time stamp per word. JSON files are, like transcripts, placed in the same location as the media, named to match the media file. These JSON files are as accurate as the Adobe workflow but with the simplicity of the transcript workflow.

There is no way to get timecode stamped transcript files out of PPro or Soundbooth and I’m sorry for the confusion. With our Transcriptize software you can take the Adobe transcription (manual or automatic) and import to FCP’s markers.

Philip

Posted by Philip Hodgetts  on  06/03  at  11:24 AM
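The “nominal” per-word time stamps Philip describes for paragraph-stamped transcripts can be illustrated with simple linear interpolation: spread each paragraph’s words evenly between its time stamp and the next one. The Python below is a hypothetical sketch of that idea, not prEdit’s actual algorithm. It also converts millisecond offsets (the form Adobe’s per-word stamps take) into non-drop-frame timecode, assuming an integer 24 fps frame rate; drop-frame rates such as 29.97 would need extra handling.

```python
from typing import List, Tuple

def interpolate_words(paragraphs: List[Tuple[int, str]], end_ms: int) -> List[Tuple[str, int]]:
    """Assign a nominal start time (ms) to every word.

    paragraphs: (start_ms, text) pairs sorted by time, one per
    time-stamped paragraph of a hypothetical transcript.
    end_ms: where the last paragraph ends (e.g. the media duration).
    """
    result: List[Tuple[str, int]] = []
    for i, (start, text) in enumerate(paragraphs):
        # A paragraph runs until the next paragraph's stamp (or end_ms).
        stop = paragraphs[i + 1][0] if i + 1 < len(paragraphs) else end_ms
        words = text.split()
        if not words:
            continue
        step = (stop - start) / len(words)  # even spacing: the source of "slip"
        for j, word in enumerate(words):
            result.append((word, round(start + j * step)))
    return result

def ms_to_timecode(ms: int, fps: int = 24) -> str:
    """Convert a millisecond offset to non-drop-frame HH:MM:SS:FF timecode."""
    total_frames = ms * fps // 1000  # frame containing this instant
    seconds_total, frames = divmod(total_frames, fps)
    hours, rem = divmod(seconds_total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

# Usage with made-up data: two paragraphs stamped at 0 s and 2 s.
for word, start in interpolate_words([(0, "gone was the day"), (2000, "of paying a human")], 4000):
    print(word, ms_to_timecode(start))
```

Even spacing is where the “slip” comes from: real speech isn’t evenly paced, so interpolated stamps drift from the true word boundaries, while per-word stamps (Adobe’s XMP data or 3Play’s JSON) avoid the problem.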


It seems obvious, but voice transcription programs will never be made human-proof.

First of all, the same pronunciation can apply to different words, and transcription software can only make a best guess which word is applicable.

Second, until human beings can learn to speak clearly, they’ll never be able to take full advantage of transcription software.

I work on a TV news program with closed-captioning that uses an automated transcription service. It’s hilarious to listen to the anchors, then see the mangled transcriptions of what they’re saying in the crawl below.

Posted by Jay Congdon  on  06/09  at  04:03 PM




It’s only a matter of time before voice transcription becomes close to foolproof. It will never be perfect, but the human transcriptions I’ve had done are also not perfect.

The people to watch in this space are Nuance, who have the best publicly available technology and with whom Apple are reportedly doing a licensing deal for (at least) iOS, and Google, whose (now finished) 411 experiment gave them billions of spoken samples for analysis, where the meaning is clear.

3PlayMedia.com also have an interesting approach: it combines computer analysis with human correction.

Posted by Philip Hodgetts  on  07/04  at  09:15 AM




Copyright © 2012, HD Expo, LLC a division of Diversified Business Communications. DBA Createasphere
