Before we get into the nitty-gritty, let’s go over some of the basics of PhraseFind.

*Update April 12, 2011: Avid has given some feedback on the blog post, and I have posted their comments inline in bold. Thanks, Avid!

PhraseFind is a phonetic matching algorithm, developed by Nexidia. It’s been wildly successful, serving as the basis of AV3’s “Get” software, which is used in conjunction with Final Cut Pro. Avid has licensed the technology and has branded it “PhraseFind”. It is available for $495, or $1295 when bundled with ScriptSync. Additional language packs can be purchased for $149.

“There are some differences between GET and PhraseFind:

-GET is a separate app that you have to leave the NLE to get to; PhraseFind is built right into Media Composer and integrates with the Find tool, giving you even stronger search refinement
-Final Cut Pro can’t have any open dialogs for GET to work
-results have to be imported into the NLE; with PhraseFind you click and edit; it’s part of the editing process
-indexing is manual unless you are using the “all searchable media” option, which is burdensome; PhraseFind indexing is automatic

This all translates into a much faster workflow with PhraseFind.”

Avid also adds:

“Nexidia develops the indexing engine – Avid develops PhraseFind and ScriptSync, AV3 develops get
The ability to search this way was pioneered with ScriptSync, and now PhraseFind offers the searching built into the Find tool without the need to spend on the extra scripting abilities in ScriptSync.”

See Avid’s in-depth FAQ here: http://avid.custkb.com/avid/app/selfservice/search.jsp?DocId=389887&ssdFilterCommunity15=1408&ssdFilterCommunity10=369&ssdFilter_SearchKeyWord=&Hilite=#a10

Since this is the first incarnation of Avid integration, there are a few gotchas.

  1. PhraseFind is currently only supported with Avid Media Composer 5.5.1 (Mac or PC), Symphony 5.5.1 (Mac or PC) or NewsCutter 9.5.1 (PC). Earlier versions are not supported.
  2. The database PhraseFind creates cannot currently be shared. Thus, every station accessing the files must index them independently. From Avid.com: In each Project folder, you will see a new folder called SearchData which contains a file called _SearchDB_. This file is the database created by Find when it indexes the bins within your project. If you have PhraseFind installed, you will also see a folder called PhoneticData which contains a number of files with a .pat extension; these files contain the phonetic information for each of the audio clips within your project. One .pat file will be created per clip.
  3. Linking via an AMA file, or even importing, does NOT cause PhraseFind to index the file. The bin must be saved before PhraseFind begins indexing.
  4. PhraseFind indexing also does not appear to be multithreaded. During indexing, I routinely saw only one processor being (barely) taxed. “The benefit of this approach is that indexing doesn’t interfere with the performance of basic editing operations.”
  5. In my tests, any hit that a PhraseFind search returns with a probability above 60% is usually spot on. Accuracy drops sharply as audio fidelity and enunciation get worse.

    I used Barack Obama’s “A More Perfect Union” speech for accuracy testing. This offered an excellent testing opportunity, as I had not only the full speech but also a word-for-word transcript to compare against. I searched for the word “dreams”. According to the transcript, the word “dreams” was used a total of 5 times. PhraseFind returned 15 hits, all above 50%. The top 6 hits were ranked at 56% or better. They all turned out to be correct, including a hit for the word “dream” (singular).

    Next, I tried the word “education”. It was spoken 6 times, and PhraseFind found all 6 instances at a certainty of 85% or higher.

  6. Moving files from one bin to another, or renaming the bin, causes PhraseFind to momentarily rewrite its database. It’s not re-indexing the files; it’s simply rewriting its database.
  7. If a clip is deleted from the project, the associated PhraseFind entry is also deleted. “Taking media offline does NOT remove the index entry for that clip, so that offline media can still be searched. (It might not be online, but you know where it is.)”
  8. Approximately 13 minutes of audio = 1 MB for a PhraseFind .pat file.
  9. Get used to NOT hitting “RETURN” on the keyboard after bringing up the SEARCH box. That runs a text-only search. For a phonetic search, you need to click the PhraseFind button.
  10. Index time is impacted by AMA linking vs native DNxHD (MXF) files. This seems to vary wildly depending on the codec associated with the AMA file.

Examples:

Clip Length    DNxHD (MXF)    AMA .mov
7:32           :14            :21
37:07          :25            :48
1:33:51        :49            4:04
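
To put those numbers in perspective, here is a rough Python sketch that converts each row into a "times faster than real time" figure. It assumes the rows read as clip length, DNxHD index time and AMA .mov index time, which is my interpretation of the figures; the names `to_seconds`, `RUNS` and `speedups` are made up for this illustration and have nothing to do with Avid's software.

```python
def to_seconds(timecode: str) -> int:
    """Convert "M:SS" or "H:MM:SS" into whole seconds."""
    seconds = 0
    for part in timecode.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# (clip length, DNxHD index time, AMA .mov index time)
# Assumption: this is how the rows of the table above are meant to be read.
RUNS = [
    ("7:32", "0:14", "0:21"),
    ("37:07", "0:25", "0:48"),
    ("1:33:51", "0:49", "4:04"),
]


def speedups(runs):
    """Return (clip, DNxHD speed, AMA speed) as multiples of real time."""
    rows = []
    for clip, dnxhd, ama in runs:
        clip_s = to_seconds(clip)
        rows.append((clip, clip_s / to_seconds(dnxhd), clip_s / to_seconds(ama)))
    return rows


for clip, dnxhd_x, ama_x in speedups(RUNS):
    print(f"{clip:>8}  DNxHD {dnxhd_x:6.1f}x  AMA {ama_x:6.1f}x real time")
```

Three clips on one machine is a tiny sample, so treat the ratios as a ballpark only.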

All tests were performed with:

Mac Pro 2 x 2.66GHz 6 Core Intel Xeon (Westmere)
8GB RAM
32-bit kernel
OS 10.6.6
Media Composer 5.5.1
PhraseFind 1.0
3 Drive/ 6TB internal RAID 0 as Avid MediaFiles space
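
If you want to budget disk space for the phonetic index, the "roughly 13 minutes per 1 MB" figure from item 8 can be turned into a quick estimate. The sketch below is only that: it assumes the ratio holds across material (I only measured it on my own clips), treats 1 MB as 1024 x 1024 bytes, and uses hypothetical helper names (`estimate_pat_mb`, `indexed_minutes`). The only things taken from Avid's description are the PhoneticData folder name and the .pat extension.

```python
from pathlib import Path

MINUTES_PER_MB = 13.0       # rough observation from item 8, not a spec
BYTES_PER_MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


def estimate_pat_mb(audio_minutes: float) -> float:
    """Estimate total .pat size in MB for a given amount of audio."""
    return audio_minutes / MINUTES_PER_MB


def indexed_minutes(project_dir: str) -> float:
    """Estimate minutes of indexed audio from the .pat files in a project."""
    phonetic_dir = Path(project_dir) / "PhoneticData"
    if not phonetic_dir.is_dir():
        return 0.0  # no PhoneticData folder, so nothing has been indexed
    total_bytes = sum(p.stat().st_size for p in phonetic_dir.glob("*.pat"))
    return total_bytes / BYTES_PER_MB * MINUTES_PER_MB
```

For example, `estimate_pat_mb(130)` gives 10.0, i.e. about 10 MB of .pat files for a little over two hours of audio. Since the database can't be shared, remember that every workstation that indexes the project will need that space.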