csplit < awk

I regularly need to split larger text files into smaller text files, or chunks, in order to do some kind of text analysis/mining. I know I could write a Python script that would do this, but that often involves a lot more scripting than I want, and I’m lazy, and there’s also this thing called csplit which should do the trick. I’ve just never mastered it. Until now.

Okay, so I want to split a text file I’ll call excession.txt (because I like me some Banks). Let’s start building the csplit line:

csplit -f excession excession.txt 'Culture 5' '{*}'

… Apparently I still haven’t mastered it. But this bit of awk worked right away:

awk '/Culture 5 - Excession/{filename=NR"excession"}; {print >filename}' excession.txt
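For what it's worth, here is a slightly more defensive sketch of the same idea, run against a tiny made-up stand-in for excession.txt so it can be tried anywhere. The sample text, the scratch directory, and the 0excession and csplit_excession names are my own assumptions for illustration. It also includes my best guess at what went wrong with the csplit line: csplit only treats an argument as a regular expression when it is wrapped in slashes, and the '{*}' repeat count and -z flag are GNU coreutils features, so this may not work with other csplit implementations.

```shell
# Work in a scratch directory with a tiny stand-in for excession.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'front matter' \
  'Culture 5 - Excession' 'chunk one' \
  'Culture 5 - Excession' 'chunk two' > sample.txt

# Same approach as the one-liner: start a new output file at each marker line.
# Two differences: text before the first marker lands in 0excession rather
# than an unset filename, and each finished file is closed before the next
# one is opened, so a long input cannot run out of file handles.
awk 'BEGIN { filename = "0excession" }
     /Culture 5 - Excession/ { close(filename); filename = NR "excession" }
     { print > filename }' sample.txt

# GNU csplit version (assumes GNU coreutils): the pattern sits inside slashes,
# '{*}' repeats it as often as it matches, -z skips empty pieces, -s is quiet.
csplit -s -z -f csplit_excession sample.txt '/Culture 5 - Excession/' '{*}'

ls
```

With this sample the awk pass should leave 0excession, 2excession, and 4excession (named after the line number where each chunk starts), and csplit should leave csplit_excession00 through csplit_excession02.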

For the record, I’m interested in working with the Culture novels of Iain M. Banks. I am converting MOBI files into EPUBs using Calibre, and then into plain text files. No, I cannot make these available to anyone, so please don’t ask.

The Culture series:

  1. Consider Phlebas (1987)
  2. The Player of Games (1988)
  3. Use of Weapons (1990)
  4. The State of the Art (1991)
  5. Excession (1996)
  6. Inversions (1998)
  7. Look to Windward (2000)
  8. Matter (2008)
  9. Surface Detail (2010)
  10. The Hydrogen Sonata (2012)
