From my first class with Dr. David Holmes, I was fascinated by the branch of statistics in which he specializes: stylometry, the statistical analysis of literary style. When Dr. Holmes told the class about the projects he had taken on and how he had used stylometry to attribute authorship of contested works, I knew this was a topic I needed to learn more about. During the summer of my senior year, Dr. Holmes kindly let me join his project to determine the true authorship of the works published under the name of former Congressman David Crockett. The evidence weighs against Crockett having written these works, given his lack of formal education and historical accounts of ghostwriters.
We began by gathering more evidence about the potential ghostwriters and locating sources for their own books. From there, we compiled text files of all the books. Each chunk of about 4,000 words had to be carefully organized and edited so that the analysis would run smoothly. Now that we have collected all of those files and assembled the necessary information, we have begun to plan and run the analysis, which is the stage we are in now. The analysis will rely mainly on identifying non-contextual words (function words such as “or,” “and,” and “the”) and computing how frequently each author uses them. Because every writer tends to use these words at characteristic frequencies, we can compare those frequency profiles and compute the distances between them using several statistical measures.
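To make the comparison step concrete, here is a minimal Python sketch of one widely used distance measure for function-word frequencies, Burrows's Delta. The paragraph above does not say which measures the project uses, so Delta is only an illustrative choice; the short word list, the toy texts, and the function names are hypothetical. For simplicity, the z-scores are computed across the texts being compared rather than against a separate reference corpus, and the toy texts stand in for the roughly 4,000-word chunks.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Hypothetical, tiny list of function words; real studies use much longer lists.
FUNCTION_WORDS = ["the", "and", "or", "of", "to", "a", "in", "that"]


def relative_frequencies(text, words=FUNCTION_WORDS):
    """Return each function word's share of all word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    return [counts[w] / total for w in words]


def z_scores(profiles):
    """Standardize each word's frequency across all texts (column-wise z-scores)."""
    columns = list(zip(*profiles))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        # A word with zero spread across texts carries no information, so use 0.
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in profiles
    ]


def burrows_delta(z_a, z_b):
    """Burrows's Delta: mean absolute difference between two z-score vectors."""
    return mean(abs(a - b) for a, b in zip(z_a, z_b))


# Hypothetical toy texts standing in for real text chunks.
texts = {
    "disputed": "the river and the woods and the bear",
    "candidate_a": "the river and the hills and the deer",
    "candidate_b": "of a man in a town to a friend that day",
}

names = list(texts)
zs = dict(zip(names, z_scores([relative_frequencies(texts[n]) for n in names])))

for name in ("candidate_a", "candidate_b"):
    # A smaller Delta means a closer stylistic match to the disputed text.
    print(name, round(burrows_delta(zs["disputed"], zs[name]), 3))
```

A smaller Delta suggests a closer match in function-word habits. In practice, stylometrists usually compare results across several such measures rather than relying on a single distance.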
This project has shown me how closely statistics can be linked to other fields of study, and it has inspired me to continue this work in other languages. Dr. Holmes and I plan to conduct a stylometric analysis of a heavily disputed Russian work, which will allow me to combine my passion for languages with the skills I have developed while learning about the wonders of stylometry.