- Author: Etienne P Jacquot
- Date: February 25th, 2025
- Course: Penn Carey Law: LAW-9580 Cybercrime (Levy)
- Statement of Academic Integrity: I have adhered to the academic integrity guidelines of the Code of Academic Integrity in this assignment. All of the work included (written and code) is my own and does not represent the opinions or views of the University of Pennsylvania, the Penn Carey Law School, the Annenberg School for Communication, or any other entities or individuals associated with UPenn.
- License & Liability: MIT License
The Fair Use Index dataset included in this repository was copied manually from the web pages of the U.S. Copyright Office Fair Use Index on February 24th, 2025.
- Excel data file: consists of 6 columns and 250 rows:
```
Index(['Case', 'Year', 'Court', 'Jurisdiction', 'Categories', 'Outcome'], dtype='object')
```
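The schema can be sketched with pandas as follows. The exact loading code in the notebook isn't shown here, and the example row is a hypothetical placeholder, not a case drawn from the dataset:

```python
import pandas as pd

# Illustrative row only -- the real dataset has 250 rows copied from the
# Fair Use Index web pages; this case and its values are hypothetical.
df = pd.DataFrame(
    {
        "Case": ["Doe v. Roe"],
        "Year": [2020],
        "Court": ["S.D.N.Y."],
        "Jurisdiction": ["Second Circuit"],
        "Categories": ["Education/Scholarship/Research"],
        "Outcome": ["Fair use found"],
    }
)

# Matches the Index(...) output above.
print(list(df.columns))
```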
This analysis is performed in a Jupyter Notebook with Python 3.11, using the Fair Use Index dataset to understand the distribution of the cases the U.S. Copyright Office provides as a reference for the public.
Focusing on time, outcomes, and categories, this analysis emphasizes Education & Research, given my nearly 10 years of experience as an IT staff member at the Annenberg School for Communication at the University of Pennsylvania, in support of my graduate coursework for LAW-9580 Cybercrime at Penn Carey Law.
The steps taken in this coding analysis are primarily aimed at cleaning the data of inconsistencies and visualizing the cleaned data to surface trends in the cases referenced in the Fair Use Index dataset.
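The kind of cleaning involved can be sketched with pandas as below. The messy values are hypothetical stand-ins for the inconsistencies found in the dataset, and the notebook's actual cleaning steps may differ:

```python
import pandas as pd

# Hypothetical messy Outcome values: stray whitespace and inconsistent casing
# make the same outcome appear as several distinct labels.
df = pd.DataFrame(
    {"Outcome": [" Fair use found ", "fair use found", "Fair use not found"]}
)

# Normalize: strip whitespace, then standardize capitalization.
df["Outcome"] = df["Outcome"].str.strip().str.capitalize()

# The three raw values collapse into two distinct outcomes.
print(df["Outcome"].nunique())
```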
Please be advised that the following AI tools were used in support of this course work, consistent with Penn Carey Law's guidance & policies on AI outlined here: https://www.law.upenn.edu/its/docs/ai/
- GitHub: The Copilot VS Code extension was used to assist in writing and auto-generating Python code snippets throughout the Jupyter notebook analysis & scripting, along with formatting this README documentation.
- OpenAI: The GPT-4o model was used to assist in filtering all categories, differentiating parent categories from subcategories.
- Anthropic: The Claude 3.7 Sonnet model was used to assist in generating the advanced reference on data visualizations produced in this analysis.
Using Python 3.11, run the following to replicate the code in a local virtual environment:
- For the analysis.ipynb notebook, make sure to install the dependencies in the requirements.txt file:

```shell
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
- If using OpenAI to make API calls, you must save your API key in a ./secret_key.json file.
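Loading the key might look like the sketch below. The `"api_key"` field name is an assumption (match it to however the notebook stores the key), and the demo writes a throwaway file so the snippet is self-contained:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Demo only: write a placeholder secret_key.json to a temp directory.
# In the repo, the file lives at ./secret_key.json with your real key.
with TemporaryDirectory() as tmp:
    secret_path = Path(tmp) / "secret_key.json"
    secret_path.write_text(json.dumps({"api_key": "sk-placeholder"}))

    # Read the key back; "api_key" is an assumed field name.
    api_key = json.loads(secret_path.read_text())["api_key"]

print(api_key)
```

Keeping the key in a gitignored JSON file avoids committing credentials to the repository.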
An export of the cleaned dataset is provided for reference:
Below are image exports of the data visualizations generated from the analysis of the Fair Use Index dataset, a subset of which are included in my official coursework submission paper.
Notably, the AI model summarizes all available categories in the dataset into five or six parent-level categories:
- Education & Research
- Journalism & Commentary
- Legal & Governance
- Media & Entertainment
- Technology & Digital
- Visual Arts
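The idea behind the consolidation can be sketched as a lookup from subcategory to parent category. The real assignments were produced by GPT-4o, so this hand-written mapping is illustrative only, with hypothetical subcategory labels:

```python
# Illustrative-only mapping: the actual subcategory-to-parent assignments were
# generated by GPT-4o; these entries are hypothetical examples of the idea.
PARENT_MAP = {
    "Education/Scholarship/Research": "Education & Research",
    "News reporting": "Journalism & Commentary",
    "Parody/Satire": "Media & Entertainment",
    "Photograph": "Visual Arts",
}


def to_parent(category: str) -> str:
    # Fall back to the original label when no parent category is known.
    return PARENT_MAP.get(category, category)


print(to_parent("News reporting"))  # -> Journalism & Commentary
```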
- Using the pdf-download.sh script, download each PDF from the US Govt Fair Use Index website for review:

```shell
./pdf-download.sh
```
- The PDFs are stored in the ./pdfs/ directory. Manually inspect them to confirm all files are valid and not corrupted. If any are corrupted, delete those files and re-run the script to download the PDFs again.
- This ./pdfs/ directory will be used as a RAG knowledge base for informing an AI model about the Fair Use Index cases in future analysis. The example will use Open WebUI (TBD):

```shell
docker run open-webui ....
```
For questions, comments, concerns, or feedback, please feel free to reach out to me at [email protected]