Data+ Rubenstein Library Card Catalog

This repository contains the code and files used and created by the Duke University Data+ 2021 Rubenstein Library Card Catalog Team. Working with the digitized cards from the David M. Rubenstein Rare Book and Manuscript Library's physical card catalogs, our team explored the files as a way to further the library's initiative of finding and describing historically marginalized voices in its collections. We have uploaded the cards to Duke's Internet Archive page so that librarians and patrons alike can view them easily.

We also created a structured dataset named main_file_dataset.csv, sorted by collection of items within the catalog. Using natural language processing and some manual editing, we extracted important metadata such as author, location, and date written, and added links in the dataset to the corresponding card on the Internet Archive site. This dataset will be uploaded to the Duke Research Data Repository to allow access for those who wish to dig deeper into the files.

With the dataset we created, we analyzed what and who is present in these cards. We explored the demographics of the authors and items cataloged, and analyzed how the information within the cards relates to the history of Duke University. We completed spatial frequency mapping at the level of U.S. states and of North Carolina counties, and visualized the other countries present in the cards. This analysis has been compiled into a web app for easy access. The files hold a wealth of information, and our Data+ project is just the tip of the iceberg. We hope that future research teams will continue to dissect the card files and gain further insights into Duke's history.
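
The counting step behind a spatial frequency map can be sketched roughly as follows. This is a minimal illustration, not the team's notebook code: it assumes each row already carries a cleaned place name in a column called "state", which is a hypothetical name, and the sample rows are made up.

```python
from collections import Counter


def state_frequencies(rows, column="state"):
    """Count how many rows mention each place, skipping empty cells.

    `column` is a hypothetical header; the real dataset may name it differently.
    """
    counts = Counter()
    for row in rows:
        # Treat missing keys, None, and whitespace-only cells as empty.
        place = (row.get(column) or "").strip()
        if place:
            counts[place] += 1
    return counts


# Made-up rows standing in for records read from the dataset.
rows = [
    {"state": "North Carolina"},
    {"state": "Virginia"},
    {"state": "North Carolina"},
    {"state": ""},
]

counts = state_frequencies(rows)
print(counts.most_common())
```

The resulting counts could then be joined to state or county boundaries in a mapping tool of choice; the same function would work for counties or countries by passing a different column name.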

Please read the Project Overview document for a comprehensive outline of the files.

Notes for Future Research

While we were able to create a fairly comprehensive dataset containing all of the digitized cards, we were limited by our OCR software and data cleaning techniques. We have manually gone through the dataset to correct OCR errors in the authors' names; however, there are still many incomplete location or date cells, as well as some completely blank rows that the OCR did not pick up. Our first recommendation, should another team continue this research, would be to manually correct the data that we were unable to correct due to time constraints, and to update the parts of our analysis that rely on that data.
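
As a starting point for that manual pass, a short script could list which rows are completely blank and which are missing specific fields. The sketch below is only an illustration under stated assumptions: the column names "author", "location", and "date" are hypothetical stand-ins for whatever headers main_file_dataset.csv actually uses, and the sample data is made up.

```python
import csv
import io

# Hypothetical column names; the real headers in main_file_dataset.csv may differ.
FIELDS_TO_CHECK = ("author", "location", "date")


def triage_rows(rows, fields=FIELDS_TO_CHECK):
    """Split rows into completely blank rows and rows missing checked fields.

    Returns (blank, incomplete); each entry is (row_number, missing_fields).
    Row numbers are 1-based and count data rows only (the header is excluded).
    """
    blank, incomplete = [], []
    for number, row in enumerate(rows, start=1):
        # Treat None and whitespace-only cells as empty.
        values = {key: str(value or "").strip() for key, value in row.items()}
        if not any(values.values()):
            blank.append((number, list(fields)))
            continue
        missing = [field for field in fields if not values.get(field)]
        if missing:
            incomplete.append((number, missing))
    return blank, incomplete


# Made-up sample standing in for the real file.
SAMPLE = """author,location,date
Jane Doe,"Durham, N.C.",1865
John Roe,,
,,
"""

blank, incomplete = triage_rows(csv.DictReader(io.StringIO(SAMPLE)))
print("blank rows:", blank)
print("incomplete rows:", incomplete)
```

To run this against the real file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle, and `FIELDS_TO_CHECK` adjusted to the actual headers.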

An avenue of analysis that we were, unfortunately, unable to explore was sentiment analysis surrounding various groups in the catalog (e.g., the southern gentleman, slaves, southern belles). We would recommend that future researchers analyze how these and other groups are represented and discussed in the catalog. In addition, identifying "outdated language" in the cards would prove helpful.
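
One simple way to begin identifying such language is to check each card's text against a watch list of terms. The sketch below is an assumption-laden illustration rather than an established method: the watch list, the card identifiers, and the card texts are all made up, and a real list would need to come from the library's own descriptive-practice guidance.

```python
import re

# Illustrative watch list only; a real list would be supplied by librarians.
WATCH_TERMS = ("slaves", "southern gentleman")


def flag_terms(text, terms=WATCH_TERMS):
    """Return the watch-list terms that appear in `text` as whole words or phrases."""
    found = []
    for term in terms:
        # \b keeps matches on word boundaries; re.escape guards special characters.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Made-up card texts keyed by hypothetical card identifiers.
cards = {
    "card_001": "Letters concerning the sale of slaves in Wake County.",
    "card_002": "Diary of a merchant in Wilmington, 1850-1861.",
}

flagged = {}
for card_id, text in cards.items():
    hits = flag_terms(text)
    if hits:
        flagged[card_id] = hits

print(flagged)
```

Exact matching like this misses plurals, spelling variants, and OCR errors, so it is best treated as a first filter that produces candidates for human review.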

We were able to explore Duke's history in relation to its presidents, buildings, and early names. Other topics worth examining with regard to the university include its historical ties to Methodism, its relationship with UNC, and the history of minority students (POC, women, international, etc.). Beyond the university, exploring major events and themes such as the Civil War, slavery, and activism in North Carolina could be valuable as well.

