Turning a Directory of Texts into a List of Strings

Almost everything I do in text analytics starts with a directory of texts that needs to be turned into a list of strings, with each text as its own item in the list. Here’s my Python boilerplate:

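import glob

# glob returns files in arbitrary order; sorted() keeps the list stable between runs
file_list = sorted(glob.glob('../texts/*.txt'))

mytexts = []
for filename in file_list:
    with open(filename, 'r', encoding='utf-8') as f:
        # Replace newlines with spaces so each text becomes one long line
        mytexts.append(f.read().replace('\n', ' '))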
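
If you reuse this often, it can help to wrap the same steps in a function that also hands back the file names, so you can tell which string came from which file. What follows is only a sketch under a few assumptions: the name load_texts and its pattern parameter are made up for illustration, and it uses pathlib instead of glob, which is a swap on my part rather than anything the boilerplate requires.

```python
from pathlib import Path


def load_texts(directory, pattern="*.txt"):
    """Return (names, texts) for the files in `directory` matching `pattern`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Sort the paths so names and texts come back in a stable order.
    paths = sorted(Path(directory).glob(pattern))
    names = [p.name for p in paths]
    # Replace newlines with spaces so each text is a single long line.
    texts = [p.read_text(encoding="utf-8").replace("\n", " ") for p in paths]
    return names, texts
```

Calling names, mytexts = load_texts('../texts') would then give two parallel lists, so names[3] is the file that produced mytexts[3].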

You can double-check your work by calling up any given text with mytexts[1], where the 1 can be any index you want. Remember that Python starts counting at 0, not 1, so a list of 12 texts, for example, is indexed 0 through 11.
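
A few quick checks along those lines, sketched with a small stand-in list so the snippet runs on its own (the three sample strings are invented; in practice you would run the same lines against your real mytexts):

```python
# Stand-in data for illustration only; use your own mytexts in practice.
mytexts = ["first text here", "second text here", "third text here"]

print(len(mytexts))      # how many texts were loaded
print(mytexts[0][:10])   # first 10 characters of the first text
print(mytexts[-1][:10])  # negative indices count back from the end
```

Slicing with [:10] keeps the preview short, which is handy when each text runs to thousands of words.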

And if you need to mush all those texts into a single string, join them with a space so the last word of one text doesn’t run into the first word of the next:

alltexts = ' '.join(mytexts)
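
The string you call join on is the separator that goes between the texts, and the choice matters for anything that counts words afterwards. A small sketch with two invented sample strings shows the difference:

```python
# Invented sample data, just to show how the separator behaves.
mytexts = ["the end", "once upon a time"]

fused = "".join(mytexts)    # no separator: adjacent words run together
spaced = " ".join(mytexts)  # one space between texts keeps words apart

print(fused)
print(spaced)
```

With the empty separator, "end" and "once" become the single token "endonce", which would quietly skew any word frequencies computed from the combined string.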
