Transcription in 2018

The range of transcription options has opened up considerably since I last considered the possibility of turning over some, but not all, transcription to software. It appears to be largely done in the cloud, with offerings from the following:

  • Transcribe appears to be simply an online version of the mechanical transcription machines I used to use: load the audio and then type. The “automagic” version lets you listen to the audio through a headset and dictate it to the site, which then transcribes it. That’s interesting.
  • f4transkript is another online service where you load your audio and then do the typing.

If you’re interested in these traditional forms of transcription, where you do the typing, I also suggest checking out the transcription options in Scrivener. It’s not a service: you just buy the software and use it, and a license is very inexpensive.

For those interested in letting an AI of some kind transcribe the audio for you (ah, the future!), there appears to be Descript. You either upload your files online or load them into an app installed on your local machine. If you take the latter course, it’s not quite clear whether the transcription happens entirely on your machine or whether the AI doing the heavy lifting lives in the cloud. The demos appear to work in real time, but the site suggests that you can load an audio file of any length and get a transcript back in less time than it takes to play it.

I’m going to see how much you can do with a free account and report back. This could be very, very useful. (And cool!)

You don’t have to do much, apparently, with GarageBand on a MacBook Air to get the fans to kick on. I was simply recording my daughter doing a bit of audio theater, and it wasn’t long before my Mac sounded like a regional jet. My PreSonus USB box came with Studio One software. Maybe I should check out the [tutorials][] sometime?


Museum of Endangered Sounds

[The Museum of Endangered Sounds][mes] is a great deal of fun, but only if you are of a certain age. (Hint: the sounds include modems pinging and dot-matrix printers buzzing, among other things.)


The Lives of Harry Lime

[The Lives of Harry Lime][hl] is available online. Produced by the BBC, I believe, the radio series has Orson Welles reprising the role of Harry Lime from the 1949 film of Graham Greene’s _The Third Man_. The film is a personal favorite of mine (along with _The Gray Fox_ and _Local Hero_), and I look forward to spending some time with the audio series: 52 half-hour episodes are available.

One of the things I want to do after the book is done is get together with some talented people and write some audio plays. I think it’s a genre that is overdue for a renaissance, and I think it would be fun to work as a group, perhaps writing alone but then coming together to read the texts live, record them, and release them on the web. Who am I thinking of? Josh Caffery, for a start, as one of the most talented writers I know, but also Reese Fuller, whom I know less well but who has to have chops. Perhaps Conni Castille would be interested. She is very talented, but I am not sure she is interested in writing outside of film. Kristi Guillory always amazes me, but again, she may be more a songwriter than a story writer. I’m sure there are others, but these are the people who come to mind in the moment.


The folks at Rogue Amoeba have a [nice write-up][1] on the design process for the UI and icon of their latest application, [Piezo][2]. I’m bookmarking it because I’m thinking about trying my hand at iPhone application development: I want an app for my iPhone that lets me record in the field. There are already apps that let you do this, but wouldn’t it be nice if the app also prompted you for some basic metadata, or made capturing metadata like GPS coordinates easy?


Getting Audio from the Kitchen Computer Back to the Stereo

I spent the weekend trying to discover what my options are for getting audio from the kitchen computer to the stereo in the living room. There are two ways to do this: wired and wireless. I already have a wireless solution, a Bluetooth receiver I bought for use in my study, but the sound was fairly unappealing and, worse, the connection seemed flaky. The wired solution would be ideal, because a Cat 5e cable already runs between the Mac mini in the kitchen and the living room equipment cabinet, for lack of a better phrase to describe where the television, the DVD player, the stereo, the NAS, and the AirPort Extreme (our router) all sit.

What I think I want is a device that can sit on the network and that iTunes will recognize as a legitimate receiver. That seems like the easiest way to do this. There are a variety of protocols for streaming audio over a network, but I don’t know any of them, and I wouldn’t know where to begin with a home-grown solution.
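For what it’s worth, a home-grown solution could start very small. The sketch below is a minimal illustration under stated assumptions, emphatically not AirPlay or any real streaming protocol: the sender reads raw PCM frames from a WAV file and pushes them over a stream socket after a small header describing the format, and the receiver reads the header and collects the frames. The 12-byte header layout and all function names are hypothetical, and the sketch ignores buffering, clock drift, and error recovery, which are the hard parts real protocols exist to handle.

```python
import io
import socket
import struct
import threading
import wave

# Hypothetical wire format, NOT AirPlay or any real protocol: a 12-byte header
# of three big-endian unsigned ints (sample rate, channels, bytes per sample),
# followed by raw PCM frames until the sender closes its side.
HEADER = struct.Struct("!III")


def send_wav(wav_file, sock, frames_per_chunk=1024):
    """Read PCM frames from a WAV file (path or file object) and push them down a connected socket."""
    with wave.open(wav_file, "rb") as wav:
        sock.sendall(HEADER.pack(wav.getframerate(), wav.getnchannels(), wav.getsampwidth()))
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            sock.sendall(chunk)
    sock.shutdown(socket.SHUT_WR)  # tell the receiver no more data is coming


def _recv_exact(sock, n):
    """Keep reading until exactly n bytes have arrived."""
    data = b""
    while len(data) < n:
        part = sock.recv(n - len(data))
        if not part:
            raise ConnectionError("stream ended before the header was complete")
        data += part
    return data


def receive_pcm(sock):
    """Return (sample_rate, channels, sample_width, pcm_bytes).

    A real receiver would hand the frames to a sound card instead of collecting them.
    """
    rate, channels, width = HEADER.unpack(_recv_exact(sock, HEADER.size))
    chunks = []
    while True:
        part = sock.recv(4096)
        if not part:
            break
        chunks.append(part)
    return rate, channels, width, b"".join(chunks)


def loopback_demo():
    """Send a tiny in-memory WAV through a local socket pair and read it back."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(44100)
        out.writeframes(bytes(4096))  # 1,024 stereo 16-bit frames of digital silence
    buf.seek(0)
    sender, receiver = socket.socketpair()
    with sender, receiver:
        thread = threading.Thread(target=send_wav, args=(buf, sender))
        thread.start()
        result = receive_pcm(receiver)
        thread.join()
    return result
```

Between two real machines, the kitchen side would open its socket with `socket.create_connection((host, port))` rather than `socket.socketpair()`, and the living-room side would need a listening socket and something to play the frames. Even this toy version suggests why an off-the-shelf receiver is attractive.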

Off the shelf it is, then, and an obvious place to start would be one of Apple’s devices, because we have already invested in that particular platform, or set of platforms.

Up to this point, we have occasionally used a beloved 12-inch PowerBook. It doesn’t need a wired network connection (just as well, since I think its Ethernet port is broken), but it does have to be woken up to work. It also seems like a fairly dumb use for a much more flexible machine, one we would rather have our daughter using for her homework.

We also have an AirPort Express we purchased when we lived in our old house so that we could extend our wireless network into the backyard and work while Lily played. (Our old house was built in 1956 and featured amazingly solid construction, with brick on the front and Masonite siding on the other three sides. I think our problem was an early form of foil wrap used under the siding. We had to place the AirPort Express on a window, the one we put in the kitchen, in order to get a signal out of the house.) We were delighted to discover that we did not need the Express in the new house, so we have reserved it mostly for traveling. The Express will sit on a network and act as an audio receiver, but I don’t think I want it sitting in the same cabinet as the Extreme. That seems to me to be asking for trouble. The advantages of the Express are that it is already paid for and that it offers analog audio out in the form of a stereo mini-jack.

A second alternative would be to invest in an Apple TV, which at $99 is neither an expensive nor a cheap solution. Unfortunately, the Apple TV’s audio output is optical, and our rather aged Sony receiver is both analog and mechanical: it’s RCA all the way. We could buy a box to convert digital to analog, but that’s another $25 or more. (When we win the lottery, however, we will look forward to upgrading our entire audio-video infrastructure, and none of this will pose a problem.)

But what about other devices that play well with AirPlay? Almost everything on [this list][list] is either a speaker or a receiver; the only devices meant to sit between the computer and an existing receiver and speakers are the Apple TV and the AirPort Express.


Identifying Ambient Noise

Stephen Tarzia has developed an iPhone app, BatPhone, that uses ambient sound to work out where you are. His website also has a demonstration of our own ability to identify ambient noise. [Check it out.](
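The general idea of telling places apart by their background sound can be sketched in a few lines: summarize a recording as an average spectrum (a "fingerprint"), then pick the stored fingerprint it is closest to. This is a simplified illustration under our own assumptions, not BatPhone’s actual algorithm; the frame length, the log averaging, the Euclidean distance, and the room labels are all hypothetical choices.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0 .. n/2 - 1 (fine for short frames; use an FFT for real audio)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def fingerprint(samples, frame_len=64):
    """Average log-magnitude spectrum over consecutive, non-overlapping frames."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("recording is shorter than one frame")
    spectra = [magnitude_spectrum(f) for f in frames]
    # The small offset avoids log10(0) for empty frequency bins.
    return [
        sum(math.log10(s[k] + 1e-9) for s in spectra) / len(spectra)
        for k in range(frame_len // 2)
    ]


def nearest_room(sample_fp, known):
    """Return the label of the stored fingerprint closest to sample_fp (Euclidean distance)."""
    return min(known, key=lambda room: math.dist(sample_fp, known[room]))
```

For example, if a "kitchen" fingerprint is built from a low hum and a "study" fingerprint from a higher-pitched whine, a quieter recording of the same hum still matches the kitchen, because the log spectrum keeps the shape of the sound rather than its loudness. Real recordings would call for much longer frames and an FFT such as `numpy.fft.rfft` in place of the naive DFT.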

Audio Digitization for the “Oral Histories of the American South” Project

The description of the audio digitization process given below is from 2007, but it remains a model of thorough description:

Cassettes are played on a Nakamichi MR-1 discrete-head professional cassette deck. Tape heads are cleaned before each side of the cassette is played and the azimuth (the angle between the tape heads and tape medium) is adjusted to create maximum contact between the playback head and the tape to ensure the widest frequency response. Playback equalization is set to 120 µs for IEC standard Type I cassettes, and 70 µs for Type II and Type IV cassettes.

XLR outputs of the Nakamichi transmit the balanced signal directly to the Apogee Rosetta 200, a 24-bit, 2-channel analog-to-digital and digital-to-analog converter. The signal is digitized at a sample rate of 96 kHz and 24-bit sample depth and travels to the computer via an XLR cable from the digital outputs to the Lynx One sound card’s AES/EBU audio port.

Recently, we added the Apogee Big Ben Master Digital Clock, a master word clock that virtually eliminates any possible jitter [abrupt and unwanted variation of one or more signal characteristics] that can cause high frequency distortions in the signal. This process creates audio files with excellent clarity and a very large quantity of information. A typical file, representing one side of one cassette, comprises around 1 GB of data.

Files are then stored in designated digital deep storage on the libraries’ archival servers, as they are too large to be stored on CD without converting the sample rate to 44.1 kHz and reducing the quality.

The signal from each source can be monitored separately through Genelec 8030A bi-amplified monitors routed through a Coleman Audio MS6A switcher with monitor controller. The switcher has balanced XLR inputs and outputs to preserve signal-to-noise ratio and features completely passive switching. Interviews are re-recorded using WaveLab, a non-linear digital audio software platform.

Each cassette side is recorded, assigned a number as a preservation master (PM), entered into a PM database including pertinent metadata, and saved as a single audio file into deep storage in a dedicated digital archive maintained by UNC. The interview audio file is then converted into a file for burning a CD listening copy for in-house library patron research. First the file is resampled to a 44.1 kHz sample rate at 24-bit depth for audio processing. The audio file is processed in Sound Forge version 8.0 with Waves X Restoration, VST, DirectX, and Sony audio plug-ins to improve the quality.

A typical file requires two processes: normalization to an average RMS (root mean square) level of -14 dB, applying dynamic compression to increase the volume, and noise reduction to remove as much background noise, tape hiss, and rumble as possible without affecting the source material. Some files require more specific equalization or a series of noise-reduction passes to achieve audio of suitable quality and volume for researchers. The file is then converted to 16-bit samples and burned to a CD listening copy on a professional-grade Mitsui gold audio CD at 4x speed with a Plextor DVDR PX-716A 1.09 drive using Sony CD Architect software, version 5.2. CDs are tested to confirm that audio is present. Finally, all individual audio files that make up a complete interview are arranged in order and converted to one single 256 kbps, 44.1 kHz, 16-bit, stereo MP3 audio file for the Documenting the American South, Oral Histories of the American South collection interface.

The above description accurately describes the current digitization process, a set of practices resulting from regular evaluation of current digitization standards and our abilities to meet and surpass them with the equipment and time we have available. When audio digitization of the interviews began on November 1, 2005, masters were recorded at 44.1 kHz and 16 bits. Soon, we hope to replace the Lynx One sound card with a FireWire card, removing another gain structure from the signal chain to create digital preservation masters with as little information lost, or added, as possible.
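A little arithmetic makes several of the figures in that description concrete. The Python sketch below is my own illustration, not part of the UNC workflow, and its function names are hypothetical. It converts the quoted 120 µs and 70 µs playback-EQ time constants to corner frequencies with f = 1/(2πτ); estimates the size of an uncompressed side, assuming a 30-minute side (a C60 cassette; a C90 side would be half again as large); finds the resampling ratio from 96 kHz to 44.1 kHz; and computes the static gain that would bring a signal to the quoted -14 dB RMS target, assuming that figure is relative to digital full scale (dBFS). The last step models only a fixed gain change, not the dynamic compression the description also mentions.

```python
import math
from fractions import Fraction


def corner_frequency_hz(time_constant_s):
    """Corner frequency of a first-order EQ time constant: f = 1 / (2 * pi * tau)."""
    return 1 / (2 * math.pi * time_constant_s)


def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of uncompressed PCM audio in bytes (file headers not included)."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds


def resample_ratio(source_hz, target_hz):
    """Reduced up/down ratio needed to convert between two sample rates."""
    return Fraction(target_hz, source_hz)


def rms_dbfs(samples):
    """RMS level in dB relative to full scale, for float samples in [-1.0, 1.0]."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)


def gain_to_target_db(samples, target_dbfs=-14.0):
    """Static gain (dB) that would move the RMS level to the target; no compression."""
    return target_dbfs - rms_dbfs(samples)


if __name__ == "__main__":
    # Playback EQ time constants quoted above: 120 us (Type I) and 70 us (Types II and IV).
    for tau_us in (120, 70):
        print(f"{tau_us} us -> {corner_frequency_hz(tau_us * 1e-6):.0f} Hz")
    side = 30 * 60  # assumed 30-minute cassette side, in seconds
    print(pcm_bytes(96_000, 24, 2, side))  # preservation master: 96 kHz / 24-bit / stereo
    print(pcm_bytes(44_100, 16, 2, side))  # CD quality: 44.1 kHz / 16-bit / stereo
    print(resample_ratio(96_000, 44_100))
    # A full-scale sine wave has an RMS of 1/sqrt(2), about -3 dBFS.
    sine = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
    print(round(gain_to_target_db(sine), 1))
```

Run as a script, it prints corner frequencies of about 1326 Hz and 2274 Hz, then 1036800000 bytes for the 96 kHz master, in line with the "around 1 GB" quoted above, and 317520000 bytes for the same side at CD quality. The ratio comes out as 147/320, meaning a rational resampler would interpolate by 147 and decimate by 320. A full-scale sine would need about -11.0 dB of gain to sit at -14 dBFS RMS.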

To Be Human Is To Vary

A recent trip through old podcasts brought me back to this great interview by [David Battino]( with Peter Drescher, a sound designer who has created some remarkable music that all of us have heard: he’s the guy who makes the default ringtones for various mobile phone manufacturers.

That sounds immediately boring and mechanical and, well, corporate, but he takes his job seriously, and he is well aware of all the labels we are so quick to apply. His Sisyphean task yields some interesting observations about what makes sound interesting to us, especially musical sounds. One of the things he reminds us is that the ready repetition of music that we are all now familiar with, and sometimes dependent on (that is, recorded music), is really [a rather recent phenomenon]( (The link is to a piece by Peter Drescher entitled “The Myth of Music Ownership.”)

Even within recorded music, however, the mind between our ears seeks variation. Check it out. It’s short and full of great examples, and the link is to an MP3: [Peter Drescher on Annoying Audio]( (I had an embedded QuickTime player, but I couldn’t stop it from preloading the audio.)

Equipment Guide for Podcasting

Over at HiveLogic, Dan Benjamin has a great [guide on Podcasting Equipment][hl], which has been updated for 2009. He distinguishes between four types of users: beginner, entry, mid-range, and prosumer. (Okay, that isn’t a very coherent typology, but the scenarios he provides for each are clear enough to be helpful to anyone curious.)