This repository contains the code and files used and created by the Duke University Data+ 2021 Rubenstein Library Card Catalog Team. Working with the digitized cards from the David M. Rubenstein Rare Book and Manuscript Library's physical card catalogs, our team explored the files to further the library's initiative of finding and describing historically marginalized voices in its collections. We have uploaded the cards to Duke's Internet Archive page so that librarians and patrons alike can view them easily.
We also created a structured dataset, named main_file_dataset.csv, sorted by the collection each cataloged item belongs to. Using natural language processing and some manual editing, we extracted important metadata such as author, location, and date written, and we added links in the dataset to the corresponding card on the Internet Archive site. This dataset will be uploaded to the Duke Research Data Repository so that anyone who wishes to dig deeper into the files can access it.
With the dataset we created, we analyzed what and who are represented in these cards. We explored the demographics of the authors and of the items cataloged, and we analyzed how the information in the cards relates to the history of Duke University. We mapped how frequently locations appear across the United States and across North Carolina counties, and we also visualized the other countries mentioned in the cards. This analysis has been compiled into a web app for easy access. The card files contain a wealth of information, and our Data+ project is just the tip of the iceberg. We hope that future research teams will continue to dissect the card files and gain further insights into Duke's history.
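As a starting point for extending the spatial analysis, the per-location tallies behind a frequency map can be computed directly from the CSV. The sketch below is a minimal illustration, not the code used for the web app: it assumes main_file_dataset.csv has a header row with a column named `location` (a hypothetical name; check the real header), and it counts exact, non-empty strings, so variant spellings of the same place would need to be normalized before mapping.

```python
import csv
from collections import Counter
from typing import Iterable


def location_frequencies(lines: Iterable[str], column: str = "location") -> Counter:
    """Tally how often each non-empty value in `column` appears.

    `lines` is any iterable of CSV text lines (an open file works).
    `column` is a hypothetical header name; change it to match the dataset.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        if value:  # incomplete location cells are skipped, not counted
            counts[value] += 1
    return counts
```

For example, passing an open file handle such as `open("main_file_dataset.csv", newline="", encoding="utf-8")` and calling `.most_common(10)` on the result would list the ten most frequent location strings.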
While we were able to create a fairly comprehensive dataset containing all of the digitized cards, we were limited by our OCR software and data cleaning techniques. We have manually gone through the dataset to correct OCR errors in the authors' names; however, many location and date cells are still incomplete, and some rows are completely blank because the OCR did not pick up the card's text. Our first recommendation, should another team continue this research, would be to manually correct the data we were unable to fix due to time constraints and then update the parts of our analysis that rely on that data.
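A team picking up this cleanup could begin by listing which rows need attention. The following is a minimal Python sketch under stated assumptions: `location` and `date` are hypothetical stand-ins for whatever the real header uses, and a "blank" row is taken to mean one whose cells are all empty. A row that is literally an empty line is skipped by `csv.DictReader` and will not be reported.

```python
import csv
from typing import Iterable, List, Sequence, Tuple


def find_rows_to_review(
    lines: Iterable[str],
    required: Sequence[str] = ("location", "date"),  # hypothetical column names
) -> Tuple[List[int], List[int]]:
    """Return (blank_rows, incomplete_rows) as source line numbers.

    Line numbers count the header as line 1.
    blank_rows: every cell in the row is empty.
    incomplete_rows: not blank, but at least one `required` column is empty.
    """
    blank: List[int] = []
    incomplete: List[int] = []
    reader = csv.DictReader(lines)
    for row in reader:
        # Look only at named columns; missing trailing cells come back as None.
        cells = [(row.get(name) or "").strip() for name in reader.fieldnames or []]
        if not any(cells):
            blank.append(reader.line_num)
        elif any(not (row.get(col) or "").strip() for col in required):
            incomplete.append(reader.line_num)
    return blank, incomplete
```

If blank rows in the real file still carry an Internet Archive link, they will be reported as incomplete rather than blank; checking a narrower set of columns would separate the two cases.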
One avenue of analysis that we were, unfortunately, unable to explore was sentiment analysis of how various groups are portrayed in the catalog (e.g., the southern gentleman, slaves, southern belles). We would recommend that future researchers analyze how these and other groups are represented and discussed in the catalog. Identifying outdated language in the cards would also be valuable.
We were able to explore Duke's history in relation to its presidents, buildings, and early names; other university topics worth investigating include its historical ties to Methodism, its relationship with UNC, and the history of minority students (POC, women, international students, etc.). Beyond the university, exploring major events such as the Civil War, slavery, and activism in North Carolina would also be worthwhile.