Have you seen this tool, prEdit:
http://assistedediting.intelligentassistance.com/prEdit/workflow.html
Their web site says, “prEdit also supports transcripts coming from the Adobe speech analysis feature: either using the automatic transcription, with a script fed to Premiere Pro or Soundbooth, or using a transcript through an Adobe Story > Premiere Pro CS5 workflow, that preserves names and punctuation.”
In an email, Philip Hodgetts, president of Intelligent Assistance, said, “As long as there are timecode stamps in some format or another we can use it. We’ve tested against all available formats. Just note that with the ‘standard transcript’ we interpolate time values between the time stamps, so there is a little ‘slip’ in the edits. You can correct that in prEdit or trim in FCP after. The 3playmedia.com transcripts are frame accurate in the edits because they provide time stamps for each word, same as what we get from Adobe.”
So this sounds like there is more capability in the Adobe format than is being exposed in their GUI. Maybe they have an API that would give us frame-accurate timecodes for every word.
I love the idea of using prEdit to do the first cut of a doc in text form and have the video conformed to match automatically.
I think the next time I tackle a long doc I’m going to spring for prEdit.
Posted by Rob on 06/02 at 10:54 AM
I don’t know if there’s something more embedded in the Adobe transcription or not. You certainly don’t get any timecode notes when copy/pasting out of Premiere Pro.
Philip does have a lot more knowledge on this due to prEdit (which I have seen and is linked to in the article) so maybe he’ll chime in.
Posted by Scott Simmons on 06/02 at 11:52 AM
Unfortunately my writing has not been clear enough. From Adobe we get a time stamp for every word, in milliseconds from the head of the file, as an XMP metadata dump from the file itself. That’s only if the media file has been processed through the Adobe tools. For prEdit 1 through 1.4, that was the only way prEdit worked.
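Because the Adobe data is a per-word offset in milliseconds from the head of the file, turning it into a frame position an editor can use is simple arithmetic. Here is a minimal Python sketch, assuming an integer non-drop-frame rate (30 fps by default) and a clip that starts at 00:00:00:00. The function name and sample words are hypothetical, and the actual XMP field layout isn't shown here:

```python
def ms_to_timecode(ms: int, fps: int = 30) -> str:
    """Convert a millisecond offset from the head of the file to HH:MM:SS:FF.

    Assumes an integer, non-drop-frame rate and a clip whose first frame
    is 00:00:00:00; both are illustrative assumptions.
    """
    total_frames = (ms * fps) // 1000  # frame that contains this instant
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


# Hypothetical (word, milliseconds-from-head) pairs, standing in for
# whatever the XMP speech metadata actually contains.
words = [("Hello", 1000), ("world", 1450)]
for word, ms in words:
    print(ms_to_timecode(ms), word)
# 00:00:01:00 Hello
# 00:00:01:13 world
```

Drop-frame NTSC timecode would need extra handling that this sketch leaves out.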
With prEdit 1.5 we *additionally* support any industry-standard transcript format that has timecode stamps roughly every paragraph. The file can be TXT, DOC or RTF, and it is named to match its media file and put in the same folder as the media. We parse that file and assign nominal time stamps to each word, which is not as accurate as working through Story and PPro. But it requires no additional processing or work.
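Philip doesn't say exactly how prEdit assigns those nominal per-word times, so what follows is only a sketch of the general idea. It assumes words are spread evenly by word count between one paragraph stamp and the next, with the media's end time closing the last paragraph. `Paragraph` and `interpolate_words` are hypothetical names. The gap between these evenly spaced guesses and where each word is actually spoken is the "slip" you would trim afterwards.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Paragraph:
    start: float  # seconds from head of media, taken from the transcript's stamp
    text: str


def interpolate_words(paragraphs: list[Paragraph], media_end: float) -> list[tuple[str, float]]:
    """Give every word a nominal start time by spacing the words of each
    paragraph evenly between its stamp and the next stamp (or media_end).
    Even spacing by word count is an assumption, not prEdit's documented method."""
    timed: list[tuple[str, float]] = []
    for i, para in enumerate(paragraphs):
        end = paragraphs[i + 1].start if i + 1 < len(paragraphs) else media_end
        words = para.text.split()
        if not words:
            continue
        step = (end - para.start) / len(words)  # seconds allotted per word
        for n, word in enumerate(words):
            timed.append((word, para.start + n * step))
    return timed


# Hypothetical transcript: two stamped paragraphs in a 16-second clip.
paras = [Paragraph(10.0, "We shot this in June"), Paragraph(14.0, "It rained")]
for word, t in interpolate_words(paras, media_end=16.0):
    print(f"{t:6.2f}s  {word}")
```

Weighting by character count or syllables would probably track real speech a little better, at the cost of a slightly more complicated parser.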
With prEdit 1.5 we also *additionally* support 3playmedia.com’s JSON format, which also provides us with a time stamp per word. JSON files, like transcripts, are placed in the same location as the media and named to match the media file. These JSON files are as accurate as the Adobe workflow but with the simplicity of the transcript workflow.
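The same-name, same-folder convention is easy to follow when organizing files for this workflow. Below is a minimal sketch of a lookup that finds a transcript sitting next to its media file under the same base name. The function name and the preference for JSON over the text formats are assumptions, not prEdit's documented behavior:

```python
from __future__ import annotations

from pathlib import Path

# Assumed preference order: per-word JSON first, then paragraph-stamped formats.
SIDECAR_EXTENSIONS = (".json", ".txt", ".doc", ".rtf")


def find_sidecar(media: Path) -> Path | None:
    """Return the transcript next to `media` with the same base name, if any."""
    for ext in SIDECAR_EXTENSIONS:
        candidate = media.with_suffix(ext)  # "Interview 01.mov" -> "Interview 01.json"
        if candidate.exists():
            return candidate
    return None
```

For example, `find_sidecar(Path("Interview 01.mov"))` would return `Interview 01.json` if that file exists in the same folder, fall back to a TXT, DOC or RTF of the same name, and return `None` otherwise.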
There is no way to get timecode-stamped transcript files out of PPro or Soundbooth, and I’m sorry for the confusion. With our Transcriptize software you can take the Adobe transcription (manual or automatic) and import it into FCP as markers.
Philip
Posted by Philip Hodgetts on 06/03 at 11:24 AM
It seems obvious, but voice transcription programs will never be made human-proof.
First of all, the same pronunciation can apply to different words, and transcription software can only make a best guess which word is applicable.
Second, until human beings can learn to speak clearly, they’ll never be able to take full advantage of transcription software.
I work on a TV news program with closed-captioning that uses an automated transcription service. It’s hilarious to listen to the anchors, then see the mangled transcriptions of what they’re saying in the crawl below.
Posted by Jay Congdon on 06/09 at 04:03 PM
It’s only a matter of time before voice transcription becomes close to foolproof. It will never be perfect, but the human transcriptions I’ve had done are also not perfect.
The people to watch in this space are Nuance, who have the best publicly available technology and with whom Apple is reportedly doing a licensing deal for (at least) iOS, and Google, whose (now finished) 411 experiment gave them billions of spoken samples for analysis where the meaning is clear.
3PlayMedia.com also have an interesting approach: it combines computer analysis with human correction.
Posted by Philip Hodgetts on 07/04 at 09:15 AM