Add Line Numbers to a Text File

I am working on building a list of character states for a possible phylogenetic analysis of this small corpus of legends I have. As I go through the legends and add character states, essentially plot elements (a lot like motifs), I am writing lines like this:

Spirit Controller: [0] absent [1] present
Spirit: [0] absent [1] pirate [2] bull [3] dog [4] wind

Each state/motif needs to be numbered (eventually) so that each text can be described using this list. As I work through the texts, however, I find I am regularly re-ordering the list. At first, I was re-ordering by hand, but I (eventually) realized that I could leave the numbering for later, once I had arrived at a satisfactory sequence.

There are at least three tools for numbering the lines in a text file: cat, nl, and awk, with nl built expressly for the purpose. First, nl:

nl -ba -s ': ' filename > filenamenumbered

Then cat:

cat -n file > file_new

And, finally, awk:

awk '{printf("%5d : %s\n", NR,$0)}' filename > filenamenumbered
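To compare the three side by side, here is a small sketch that runs each command on a throwaway file. The two sample lines and the temporary file are stand-ins, not part of the real list; the main difference to look for is the spacing and separator each tool puts after the number.

```shell
# Demo only: two sample lines standing in for the real list of character states.
sample=$(mktemp)
printf '%s\n' 'Spirit Controller: absent, present' \
              'Spirit: absent, pirate, bull' > "$sample"

# nl: -ba numbers every line (blank ones too), -s sets the text after the number.
nl -ba -s ': ' "$sample"

# cat: -n puts a right-aligned number and a tab in front of each line.
cat -n "$sample"

# awk: the width (%5d) and the separator are spelled out by hand.
awk '{printf("%5d : %s\n", NR, $0)}' "$sample"
```

With nl, adding `-w 1` should shrink the padding in front of the number if the default width is not wanted.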

csplit < awk

I regularly need to split larger text files into smaller text files, or chunks, in order to do some kind of text analysis/mining. I know I could write a Python script that would do this, but that often involves a lot more scripting than I want, and I’m lazy, and there’s also this thing called csplit which should do the trick. I’ve just never mastered it. Until now.

Okay, so I want to split a text file I’ll call excession.txt (because I like me some Banks). Let’s start building the csplit line:

csplit -f excession excession.txt 'Culture 5' '{*}'

… Apparently I still haven’t mastered it. But this bit of awk worked right away:

awk '/Culture 5 - Excession/{filename=NR"excession"}; {print >filename}' excession.txt
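One likely culprit in the csplit attempt: csplit expects a regular expression to be wrapped in slashes, and a bare 'Culture 5' is rejected as a pattern. The following is a sketch only, assuming GNU csplit (the `{*}` repeat and the `-z` flag are GNU extensions, so the csplit that ships with macOS may not accept them). The file sample.txt and its contents are stand-ins for the real text.

```shell
# Demo only: a tiny stand-in file; the real input would be excession.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Culture 5 - Excession' 'part one' \
              'Culture 5 - Excession' 'part two' > sample.txt

# The pattern goes between slashes.
# -f sets the prefix for the output file names.
# -z removes the empty piece that would precede a match on line 1.
# -s suppresses the byte counts csplit normally prints.
# '{*}' (GNU) repeats the pattern until the input runs out.
csplit -s -z -f excession sample.txt '/Culture 5/' '{*}'

ls excession*
```

Each output file should begin with the line that matched the pattern.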

For the record, I’m interested in working with the Culture novels of Iain M. Banks. I am converting MOBI files into EPUBs using Calibre, and then into plain text files. No, I cannot make these available to anyone, so please don’t ask.

The Culture series:

  1. Consider Phlebas (1987)
  2. The Player of Games (1988)
  3. Use of Weapons (1990)
  4. The State of the Art (1991)
  5. Excession (1996)
  6. Inversions (1998)
  7. Look to Windward (2000)
  8. Matter (2008)
  9. Surface Detail (2010)
  10. The Hydrogen Sonata (2012)

Concatenate Text Files with File Names

Last night I needed to compile a folder (directory) of text files into a single file with the file name as a header. This simple `bash` script did the work:

% for f in *.txt; do echo "# $f"; cat "$f"; done > ../legends.txt

The hash sign ahead of the filename reveals that I compiled the document as Markdown. I couldn’t quite figure out how to insert newlines into the script above, so I ended up using some regex to do that: finding each hash sign and inserting a newline before it, then finding each .txt and inserting a newline after it. And then, finally, removing the `.txt` extension altogether. From there, I converted the document to HTML that I could format more clearly.
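The newlines and the extension can probably be handled inside the loop itself, which would skip the regex pass. A sketch, assuming `printf` for the header and the shell's `${f%.txt}` expansion to drop the extension; the scratch folder and the two sample files are made up for the demo.

```shell
# Demo only: a scratch folder with two stand-in legend files.
workdir=$(mktemp -d)
mkdir "$workdir/legends"
printf 'First legend text.\n'  > "$workdir/legends/alpha.txt"
printf 'Second legend text.\n' > "$workdir/legends/beta.txt"
cd "$workdir/legends" || exit 1

for f in *.txt; do
    # Blank line, Markdown header without the .txt extension, blank line.
    printf '\n# %s\n\n' "${f%.txt}"
    cat "$f"
done > ../legends.txt

cat ../legends.txt
```

Unlike `echo`, `printf` interprets `\n` in its format string the same way across shells, which is why it is used for the header here.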

As seen here, I entered the script directly at the command line, but I suppose it could also be saved as a file:

#! /usr/bin/env bash

for f in *.txt; do
    echo "# $f"
    cat "$f"
done

I’m not sure how to direct the output into a file within a `bash` script. I usually just do that at the command line. (I know, I know: I need to learn bash scripting. I’ll get there.)
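For what it is worth, redirection works inside a script the same way it does at the prompt: a `>` placed after `done` captures everything the loop prints. A minimal sketch, where the function name `compile_texts` and its two arguments (source folder, output file) are hypothetical choices for illustration:

```shell
#!/usr/bin/env bash
# compile_texts is a hypothetical helper name.
# $1 = folder holding the .txt files, $2 = file to write.
compile_texts() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue   # nothing matched: skip the literal pattern
        echo "# ${f##*/}"         # header: the file name without the folder
        cat "$f"
    done > "$2"                   # the redirect after `done` captures the whole loop
}

# Run only when both arguments are supplied,
# e.g. (hypothetical script name): ./compile.sh . ../legends.txt
if [ "$#" -eq 2 ]; then
    compile_texts "$1" "$2"
fi
```

Writing the output to a different folder, or giving it a different extension, keeps it from being swept up by the `*.txt` pattern on a later run.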

I am finally making progress on learning to code. I don’t know that I am ready to do any complicated natural language processing in Python just yet, but the basics are finally making sense to me. A lot of my education comes in ten minutes I grab here or twenty minutes I grab there to learn about `while` loops or what a `method` is. I am keeping all my notes and my code in a sub-sub-subdirectory. It’s long to type. Thank goodness `bash` is so customizable:

alias Learn='cd Dropbox/personal/programming/learn'

Much easier.

I decided to use an initial capital as a way to distinguish my alias from a regular bash command. The alias uses a relative path, so it only works from the home (~) directory, which is usually where I begin, but if my work in the `jstor` directory begins to pick up, I can easily add a `~/` to the beginning of the path above to make it possible to use the command from wherever I am.
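The `~/` version would look something like the line below. Because the alias is in single quotes, the tilde is stored as typed and only expanded when the alias runs. To survive new terminal sessions, the line would typically go in a startup file such as `~/.bashrc`; that location is an assumption and depends on the setup.

```shell
# Same alias, anchored to the home directory so it works from anywhere.
alias Learn='cd ~/Dropbox/personal/programming/learn'
```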