FirstLines : Word Frequency Analysis of Poetry First Lines
FirstLines is a Python 3.x project presented as a Jupyter Notebook. Links to the code and input files, as well as to the live Jupyter Notebook, can be found at the bottom of the post.
This project was inspired by the book "Writing the Life Poetic" by Sage Cohen. Her book is filled with interesting ideas for seeing things differently, and they have inspired creativity in my writing, both poetry and prose. One of her suggestions is to use other people's poem titles as jumping-off points for your own ideas. She has many other suggestions for creativity, and I recommend her book, Writing the Life Poetic.
This got me thinking about the process of writing. To me, a poem or story title often seems to be an afterthought. What I was really interested in was the first line: the first line of a poem, the first line of a book. Anyone who has sat in front of a blank page, with an assignment due or a project to be started, understands that this opening line is often the hardest part.
This led me to look at first lines: first lines of novels or, for this project, first lines of poems. I decided this would be a good subject for another Jupyter Notebook and a way to keep practicing my Python coding.
Program: FirstLines.py Python 3.6.2
Input: 6 poets, 225 poems randomly selected from each, for a total of 1,350 first lines.
Program Operation:
-- Read in 225 first lines from each poet.
-- Perform a word frequency analysis on the words used by each poet in their first lines.
Program Output: A CSV file for each poet showing the frequency analysis of words used in the first lines.
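The operation above boils down to normalizing the text, splitting it into words, and counting them. As a minimal sketch of that core idea, the following uses the standard library's collections.Counter (a swap-in; the notebook builds its frequency dictionary by hand) on two sample lines. The function name word_frequencies is hypothetical.

```python
import string
from collections import Counter

def word_frequencies(first_lines):
    """Count how often each lowercase word appears across a list of first lines."""
    counts = Counter()
    for line in first_lines:
        # lowercase, split on whitespace, then strip leading/trailing punctuation
        words = (w.strip(string.punctuation) for w in line.lower().split())
        # skip tokens that were nothing but punctuation
        counts.update(w for w in words if w)
    return counts

sample = ["The sea is calm tonight.", "Calm is the morn without a sound,"]
for word, count in word_frequencies(sample).most_common(3):
    print(count, word)
```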
Analysis: The focus of this project was really the Python programming and the Jupyter Notebook, but I also did some simple analysis.
Some options are available:
-- Common English words can be removed by adding them to the CommonWords list (with RemoveCommon set to True).
-- Frequency limit (optionalQty): report only words used at least this many times; 0 reports all words.
-- Output: display the results on the screen, write them to a file, or both.
################################################
# firstlines.py Python 3.6.2 #
# 1) read poetry first line files #
# 2) turn each first line input file into a sorted frequency dictionary #
# >>>> All code released as open source with no usage restrictions #
################################################
# used to strip punctuation marks
import string
### ------ options ------ ###
RemoveCommon = True
# subset of most common words
CommonWords = ['the', 'to', 'of', 'and', 'a']
# optional freq limit, set to 0 to output all
optionalQty = 0
# optional output
GenerateScreenOutput = False
GenerateFileOutput = True
For this exercise, the input filenames and poet names are hardcoded.
# input file names
f_rossetti = "225 First Lines from Christina Georgina Rossetti.txt"
f_dickinson = "225 First Lines from Emily Dickinson.txt"
f_longfellow = "225 First Lines from Henry Wadsworth Longfellow.txt"
f_emerson = "225 First Lines from Ralph Waldo Emerson.txt"
f_teasdale = "225 First Lines from Sara Teasdale.txt"
f_whitman = "225 First Lines from Walt Whitman.txt"
# strings for output display
rossetti = "Christina Georgina Rossetti"
dickinson = "Emily Dickinson"
longfellow = "Henry Wadsworth Longfellow"
emerson = "Ralph Waldo Emerson"
teasdale = "Sara Teasdale"
whitman = "Walt Whitman"
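Functions: the comments describe the actions.
# read a file, return the contents in a string
def readInputFile(fileName):
    # the with block closes the file automatically
    # (the original f.close without parentheses never actually closed it)
    with open(fileName, 'r') as f:
        textInputString = f.read()
    return textInputString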
# takes a string, splits it and turns it into a list, returning the list
def stringToList(inputString):
    inList = inputString.split()
    return inList
# takes a list of words and turns it into a frequency dictionary, returning the dictionary
def listToFreqDict(inputList):
    wordFreq = [inputList.count(p) for p in inputList]
    return dict(zip(inputList, wordFreq))
# sorts a frequency dictionary in descending order,
# returning a list of (count, word) tuples
def sortFreqDict(inputFreqDict):
    result = [(inputFreqDict[key], key) for key in inputFreqDict]
    result.sort()
    result.reverse()
    return result
# convert the input list into a sorted list of (count, word) tuples
def getSortedFreqDict(inLst):
    tmpFreqDict = listToFreqDict(inLst)
    tmpSortedFreqDict = sortFreqDict(tmpFreqDict)
    return tmpSortedFreqDict
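listToFreqDict calls inputList.count(p) once for every word, so it rescans the whole list once per word (quadratic in the number of words). That is harmless for 1,350 short lines, but here is a sketch of an equivalent using collections.Counter, which counts in a single pass. The name sorted_freq_pairs is hypothetical; it returns the same (count, word) tuples, highest count first, with ties in reverse alphabetical order, matching sort() followed by reverse().

```python
from collections import Counter

def sorted_freq_pairs(words):
    """Return (count, word) tuples sorted by count, highest first.

    Hypothetical drop-in for getSortedFreqDict: Counter counts every word
    in one pass instead of calling list.count() once per word.
    """
    counts = Counter(words)
    # sorting the (count, word) tuples in reverse gives the same order as
    # the original sort() followed by reverse()
    return sorted(((n, w) for w, n in counts.items()), reverse=True)

print(sorted_freq_pairs(["rose", "night", "rose", "sea", "night", "rose"]))
# [(3, 'rose'), (2, 'night'), (1, 'sea')]
```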
Functions: the comments describe the text cleanup actions.
# text processing is messy business, many times custom cleanup is required
# clean up the string and return a list
def CleanupInput(inStr):
    # change strings to lower case
    inStr = inStr.lower()
    # clean up some dashes and apostrophes
    inStr = inStr.replace("'", "")
    inStr = inStr.replace("-", "")
    # change string into a list
    words = inStr.split()
    # strip out punctuation
    tempList = [w.strip(string.punctuation) for w in words]
    # remove all the common words
    cleanList = []
    if RemoveCommon:
        for item in tempList:
            if item not in CommonWords:
                cleanList.append(item)
        return cleanList
    else:
        return tempList
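Two small gaps in CleanupInput: a token made only of punctuation (such as "...") strips down to an empty string that is then counted as a word, and string.punctuation covers only ASCII, so curly quotes and em dashes from copied text survive. A sketch of a variant that handles both, assuming the extra Unicode characters listed below are the ones worth stripping (cleanup_words and EXTRA_PUNCTUATION are hypothetical names):

```python
import string

# Hypothetical extra characters to strip: string.punctuation is ASCII-only,
# so curly quotes and em/en dashes from copied text would otherwise survive.
EXTRA_PUNCTUATION = "\u2018\u2019\u201c\u201d\u2014\u2013"
STRIP_CHARS = string.punctuation + EXTRA_PUNCTUATION

def cleanup_words(text, common_words=frozenset({"the", "to", "of", "and", "a"})):
    """Lowercase, split, strip punctuation, and drop empty and common words."""
    # remove straight and curly apostrophes and hyphens, as the original does
    text = text.lower().replace("'", "").replace("\u2019", "").replace("-", "")
    words = (w.strip(STRIP_CHARS) for w in text.split())
    # a token made only of punctuation (e.g. "...") strips to "", so skip it
    return [w for w in words if w and w not in common_words]

print(cleanup_words("The Sun's gone down -- and ... the wind\u2019s up!"))
# ['suns', 'gone', 'down', 'winds', 'up']
```

Using a frozenset for the common words also makes each membership test constant-time, which matters if the list grows to the 100 most common words.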
Functions: the comments describe the screen and file output code.
def generateScreenOutput(poetName, poetSortedDict):
    print(poetName)
    for item in poetSortedDict:
        tmpstr = str(item)
        # remove all python tuple characters
        tmpstr = tmpstr.replace("(", "")
        tmpstr = tmpstr.replace(")", "")
        tmpstr = tmpstr.replace("'", "")
        tmpstr = tmpstr.replace('"', "")
        # optional freq output
        tmplst = tmpstr.split(',')
        if int(tmplst[0]) >= optionalQty:
            print(tmpstr)
    print()
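generateScreenOutput converts each (count, word) tuple to a string, strips the parentheses and quotes, and then splits on the comma to recover the count. A sketch that unpacks the tuple directly instead gives the same "count, word" lines without any string parsing. The names format_freq_lines and print_freq_report, and the sample counts, are made up for illustration.

```python
def format_freq_lines(pairs, min_count=0):
    """Yield 'count, word' lines for (count, word) pairs at or above min_count."""
    for count, word in pairs:
        # unpack the tuple directly instead of parsing str((count, word))
        if count >= min_count:
            yield f"{count}, {word}"

def print_freq_report(poet_name, pairs, min_count=0):
    """Print the poet's name, the filtered frequency lines, then a blank line."""
    print(poet_name)
    for line in format_freq_lines(pairs, min_count):
        print(line)
    print()

print_freq_report("Sample Poet", [(3, "rose"), (1, "sea")], min_count=2)
```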
def generateFileOutput(poetName, poetSortedDict):
    filename = 'firstline_frequency_' + poetName + '.csv'
    # newline='' stops Python from turning '\r\n' into '\r\r\n' on Windows
    with open(filename, "w", newline='') as f:
        f.write(poetName + '\r\n')
        f.write('usage count' + ' , ' + 'word used' + '\r\n')
        for item in poetSortedDict:
            tmpstr = str(item)
            # remove all python tuple characters
            tmpstr = tmpstr.replace("(", "")
            tmpstr = tmpstr.replace(")", "")
            tmpstr = tmpstr.replace("'", "")
            tmpstr = tmpstr.replace('"', "")
            # optional freq output
            tmplst = tmpstr.split(',')
            if int(tmplst[0]) >= optionalQty:
                f.write(tmpstr + '\r\n')
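An alternative for the file output is the standard csv module: csv.writer writes each (count, word) tuple as a row directly, so no string parsing is needed, and it would quote a word correctly if one ever contained a comma. This is a sketch; write_freq_csv and its directory parameter are hypothetical, and UTF-8 output is an assumption.

```python
import csv
import os

def write_freq_csv(poet_name, pairs, min_count=0, directory="."):
    """Write (count, word) pairs to firstline_frequency_<poet_name>.csv."""
    filename = os.path.join(directory, "firstline_frequency_" + poet_name + ".csv")
    # newline="" lets csv.writer control line endings itself
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([poet_name])
        writer.writerow(["usage count", "word used"])
        for count, word in pairs:
            # same frequency filter as optionalQty in the notebook
            if count >= min_count:
                writer.writerow([count, word])
    return filename
```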
The main section of the code. Individual lists and dictionaries were hard-coded to make debugging easier. It would be easy to make all this code generic; that exercise is left to the reader.
## read the files
s_rossetti = readInputFile(f_rossetti)
s_dickinson = readInputFile(f_dickinson)
s_longfellow = readInputFile(f_longfellow)
s_emerson = readInputFile(f_emerson)
s_teasdale = readInputFile(f_teasdale)
s_whitman = readInputFile(f_whitman )
## clean up the input and turn it into a list
l_rossetti = CleanupInput(s_rossetti)
l_dickinson = CleanupInput(s_dickinson)
l_longfellow = CleanupInput(s_longfellow)
l_emerson = CleanupInput(s_emerson)
l_teasdale = CleanupInput(s_teasdale)
l_whitman = CleanupInput(s_whitman)
## get the sorted freq dictionary for each poet
sfd_rossetti = getSortedFreqDict(l_rossetti)
sfd_dickinson = getSortedFreqDict(l_dickinson)
sfd_longfellow = getSortedFreqDict(l_longfellow)
sfd_emerson = getSortedFreqDict(l_emerson)
sfd_teasdale = getSortedFreqDict(l_teasdale)
sfd_whitman = getSortedFreqDict(l_whitman )
Output results code.
## print out to the console
if GenerateScreenOutput:
    generateScreenOutput(rossetti, sfd_rossetti)
    generateScreenOutput(dickinson, sfd_dickinson)
    generateScreenOutput(longfellow, sfd_longfellow)
    generateScreenOutput(emerson, sfd_emerson)
    generateScreenOutput(teasdale, sfd_teasdale)
    generateScreenOutput(whitman, sfd_whitman)
# write out to a CSV file for each poet
if GenerateFileOutput:
    generateFileOutput(rossetti, sfd_rossetti)
    generateFileOutput(dickinson, sfd_dickinson)
    generateFileOutput(longfellow, sfd_longfellow)
    generateFileOutput(emerson, sfd_emerson)
    generateFileOutput(teasdale, sfd_teasdale)
    generateFileOutput(whitman, sfd_whitman)
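The main section repeats the same calls once per poet. One way to do the exercise left to the reader is to drive everything from a dictionary of poet names to filenames. This sketch is self-contained rather than reusing the notebook's functions: POETS, sorted_frequencies, and analyze are hypothetical names, only two poets are listed, and UTF-8 input is an assumption.

```python
import string
from collections import Counter

# Hypothetical mapping of display name -> input file; add one line per poet.
POETS = {
    "Emily Dickinson": "225 First Lines from Emily Dickinson.txt",
    "Walt Whitman": "225 First Lines from Walt Whitman.txt",
}
COMMON_WORDS = {"the", "to", "of", "and", "a"}

def sorted_frequencies(text):
    """Clean the text and return (count, word) pairs, highest count first."""
    text = text.lower().replace("'", "").replace("-", "")
    words = (w.strip(string.punctuation) for w in text.split())
    counts = Counter(w for w in words if w and w not in COMMON_WORDS)
    return sorted(((n, w) for w, n in counts.items()), reverse=True)

def analyze(poets):
    """Read each poet's file and return {poet name: sorted (count, word) pairs}."""
    results = {}
    for name, filename in poets.items():
        # assumes the input files are UTF-8 encoded
        with open(filename, encoding="utf-8") as f:
            results[name] = sorted_frequencies(f.read())
    return results
```

Calling analyze(POETS) from the notebook's directory returns each poet's (count, word) list in the same shape getSortedFreqDict produces, so the results can be passed to the screen or file output functions in a loop; adding a poet becomes a one-line change to POETS.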
If screen output is enabled, the results scroll on the screen; if file output is enabled, they are written to a CSV (comma-separated values) file for each poet that can be opened in a spreadsheet.
A live viewer for the notebook can be found on the Jupyter site. All code, files, and Jupyter notebooks are open source and can be used without permission.
Finally, I removed the 100 most common words, sorted the remaining words by usage, and present them here in stanza form as my homage to Sage Cohen for her book Writing the Life Poetic, which I continue to enjoy.
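A rough sketch of how combined counts could be turned into lines like these, under stated assumptions: per-poet Counter objects are merged, a stop-word set stands in for the 100 most common words (the actual list is not given here), and the remaining words are ordered by total count, with ties broken alphabetically as an arbitrary choice. The poem's lines vary in length, so its breaks were not made at a fixed width; the hypothetical stanza_lines function and its words_per_line parameter simply split at one.

```python
from collections import Counter

def stanza_lines(per_poet_counts, stop_words, words_per_line=6):
    """Merge per-poet word counts, drop stop words, and group into lines.

    Words are ordered by total count (highest first), then alphabetically.
    """
    total = Counter()
    for counts in per_poet_counts:
        # Counter.update with a mapping adds the counts together
        total.update(counts)
    ranked = sorted((w for w in total if w not in stop_words),
                    key=lambda w: (-total[w], w))
    lines = []
    for i in range(0, len(ranked), words_per_line):
        # capitalize the first word of each line, like the poem below
        lines.append(" ".join(ranked[i:i + words_per_line]).capitalize())
    return lines
```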
FirstLines
Are love was night am long
Oh heart never said life thought
Sea thy old had summer thou sun
Little still sweet thee came
Far has song spring were dead through wind
Beauty heard let upon where alone earth gone
Heaven man once why bird
Dream eyes many tell woman before days
Death each face god king last light morning
Rose saw shall should sleep these today
While world again beautiful down here into land
Mine sat soul stars too went within
Cannot cold dear did dreamed first flowers garden
Look much sing snow
Tale times trees word yet April
Autumn born bring call city die
Evening every gave grass green hand
Hope hour joy leaves live lost may
Moon music must roses set south spirit
Though town woods young youth air
Always died done fair great high lies lord
Meet men more pale poor proud red sky
Soft stand stood strong such those us whom afraid
Against ago child children dark dawn deep door early end
Fire flower forgotten free friend gray house
Lie maiden morn nature own peace
Perfect place pleasant poet river road
Storm them thing waves weary whose wild wine years