Counting Control Words in a Text

As I was working on a toy corpus to understand the various facets of sklearn, I came across this very clear example of how to count specific words in a collection of texts:

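from sklearn.feature_extraction.text import CountVectorizer

# A fixed vocabulary restricts counting to these words, one column per word.
cv = CountVectorizer(vocabulary=['hot', 'cold', 'old'])
# Each row is one text; each column holds the count of one vocabulary word.
data = cv.fit_transform(['pease porridge hot', 'pease porridge cold', 'pease porridge in the pot', 'nine days old']).toarray()
print(data)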
[[1 0 0]
 [0 1 0]
 [0 0 0]
 [0 0 1]]

Please note that I’ve changed the original a bit to make it easier to deploy in a longer script.
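
For use in a longer script, the same idea can be wrapped in a small function. This is only a sketch: the function name count_control_words and its arguments are my own invention, not part of sklearn. It relies on the columns of the count matrix following the order of the vocabulary list, and it sums each column to get a total per word across all the texts.

```python
from sklearn.feature_extraction.text import CountVectorizer


def count_control_words(texts, control_words):
    """Return {word: total count across all texts} for each control word.

    Hypothetical helper for illustration; not part of sklearn.
    """
    # With a list vocabulary, column i corresponds to control_words[i].
    cv = CountVectorizer(vocabulary=control_words)
    counts = cv.fit_transform(texts).toarray()
    # Sum down the rows to get one total per control word.
    totals = counts.sum(axis=0)
    return {word: int(n) for word, n in zip(control_words, totals)}


texts = [
    'pease porridge hot',
    'pease porridge cold',
    'pease porridge in the pot',
    'nine days old',
]
totals = count_control_words(texts, ['hot', 'cold', 'old'])
print(totals)
```

Note that CountVectorizer lowercases the texts and ignores single-character tokens by default, so the control words should be lowercase and at least two characters long for this to count anything.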
