This course introduces students to Natural Language Processing (NLP) and its applications in academic research, data science, and industry. Students will learn how to use NLP techniques to gain a deeper understanding of a research question or topic.
- Integrated Science Center, room 1280. Tuesdays (T) and Thursdays (Th) 5:00 - 6:20
- See course in Blackboard
- email: jmtucker02@wm.edu
- website: https://jamesmtucker.com
- Office hours by appointment only
This course is language agnostic. You can submit your homework and project in whatever programming language you prefer. In lectures, we will mostly use Python, R, or Mojo.
Students are encouraged to use generative AI tools when they are stuck on implementing code or brainstorming project topics. If you copy code from a generative AI tool, such as ChatGPT or GitHub Copilot, add a comment in your programming language's comment syntax noting which tool you used to solve the problem. If you used Stack Overflow or another source, footnote your sources accordingly.
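For example, attribution comments in Python might look like the sketch below. The prompt text, the helper functions, and the footnote format are hypothetical illustrations, not a format the course requires.

```python
import string

# Illustrative attribution comments; the prompt text and helper names are hypothetical.

# Code generated with ChatGPT (prompt: "tokenize a string into lowercase words").
def simple_tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


# Adapted from a Stack Overflow answer on stripping punctuation [1].
def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation characters from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))

# [1] <URL of the Stack Overflow answer>
```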
- Understand the basics of natural language processing techniques and how they can be used in data-driven decision models
- Learn how to use natural language processing tools and libraries to perform tasks such as text classification, sentiment analysis, and text generation
- Develop the skills and experience to design and implement natural language processing systems for real-world applications
- Explore ethical and social implications of natural language processing and artificial intelligence
- Introduction to natural language processing
- Data set creation and documentation
- Text preprocessing and cleaning
- Text classification and sentiment analysis
- Neural networks, Transformers, and large language models
- Ethical and social implications of natural language processing and artificial intelligence
- Jurafsky, Dan, and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Online: https://web.stanford.edu/~jurafsky/slp3/.
- van Atteveldt, Wouter, Damian Trilling, and Carlos Arcila Calderón. Computational Analysis of Communication. United Kingdom: Wiley, 2022.
- Tunstall, Lewis, Leandro von Werra, and Thomas Wolf. Natural Language Processing with Transformers. O'Reilly Media, 2022.
There are 165 possible points in this course.
All assignments are posted in the course GitHub repo, and links are provided in Blackboard.
- Understanding NLP (10 pts) - A conversation with ChatGPT about linguistics, NLP, and data science.
- Course reading (15 pts) - Read the required readings listed in the course syllabus. Lecture topics parallel the readings, which will help you understand the subject matter in greater detail.
- NLP problem sets (100 pts) - These assignments are designed to reinforce the ideas discussed in lecture or in the assigned readings. Although the problem sets can be solved using the topics covered in lecture, students are encouraged to explore additional solutions and think creatively.
  - Problem set 0: Exploratory data analysis in an NLP context - distributions and features
  - Problem set 1: Semantic search engine - king is to queen as man is to ?
  - Problem set 2: Author prediction dataset (a) - Federalist Papers
  - Problem set 3: Author prediction with RNN and Transformer models (b) - Federalist Papers
  - Problem set 4: Fine-tuning models for task-specific objectives
- Final project (40 pts) - Demonstrate your mastery of NLP concepts with a topic of your choosing.
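The "king is to queen as man is to ?" prompt in problem set 1 is the classic word-analogy task. One common approach, the vector offset method often used with Word2Vec embeddings, computes queen - king + man and returns the nearest remaining word by cosine similarity. The sketch below assumes that method and uses tiny hand-made vectors invented for illustration; a real solution would use trained embeddings (for example, from Gensim), and the problem set may expect a different approach.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only (not trained).
VECTORS = {
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "man":   [0.0, 0.0, 1.0],
    "woman": [0.0, 1.0, 1.0],
    "apple": [0.1, 0.1, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using the offset vector b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words, as is standard for this task.
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "queen", "man", VECTORS))  # woman
```

Here queen - king + man equals [0, 1, 1], which matches the toy vector for "woman" exactly, so it wins with a cosine similarity of 1.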
You get 4 late days for the semester. If you cannot turn in an assignment on time, you can spend late days to extend its deadline by as many days as you need, up to your total of 4. Use them wisely. If you need more than four days, you can earn an additional late day by completing a coding challenge:
date | day | topic | assignment | reading | academic calendar |
---|---|---|---|---|---|
2024-01-25 | Th. | Intro & syllabus | | | |
2024-01-30 | Tu. | NLP, Data science, and GenAI | [ChatGPT: P] | Lockhart | |
2024-02-01 | Th. | Vectorized computation, Data structures | | van Atteveldt et al. 5, 6, 7 | |
2024-02-06 | Tu. | Statistics and language | [ChatGPT: D] | Manning and Schütze | |
2024-02-08 | Th. | Documents as bags of words | | Jurafsky & Martin 3, Grimmer 3 | |
2024-02-13 | Tu. | N-gram language models | [Problem Set 0: P] | Jurafsky & Martin 4, 5 | |
2024-02-15 | Th. | N-gram language models | | Jurafsky & Martin 4, 5 | |
2024-02-20 | Tu. | Distributional semantics (Word2Vec, Doc2Vec) | [Problem Set 0: D] | Jurafsky & Martin 6 | |
2024-02-22 | Th. | Training a Word2Vec/Doc2Vec model | | Jurafsky & Martin 6, Gensim | |
2024-02-27 | Tu. | Documents as sequences | [Problem Set 1: P] | Jurafsky & Martin 7 | |
2024-02-29 | Th. | Data modeling, cleaning, optimizing | | Jurafsky & Martin 2 | |
2024-03-05 | Tu. | Neural network architecture | [Problem Set 1: D] | Jurafsky & Martin 7 | |
2024-03-07 | Th. | LLMs, Language, & Semantics | | | |
2024-03-12 | Tu. | No class | | | Spring Break |
2024-03-14 | Th. | No class | | | Spring Break |
2024-03-19 | Tu. | Recurrent neural networks (RNNs) | | Jurafsky & Martin 9 | |
2024-03-21 | Th. | Distributional semantics: contextual embeddings | | Jurafsky & Martin 10 | |
2024-03-26 | Tu. | Evaluation of language models | | Jurafsky & Martin 14 | |
2024-03-28 | Th. | Transformer neural networks | | | |
2024-04-02 | Tu. | Fine-tuning pretrained models | | Jurafsky & Martin 10 | |
2024-04-04 | Th. | Clustering and visualizing embeddings | | Jurafsky & Martin 10 | |
2024-04-09 | Tu. | Retrieval-augmented generation | | Jurafsky & Martin 11, Howard | |
2024-04-11 | Th. | Information extraction using NLP | | Jurafsky & Martin 11, Howard | |
2024-04-16 | Tu. | Inference using trained models | [Problem Set 3: P] | | |
2024-04-18 | Th. | Other machine learning techniques | | | |
2024-04-23 | Tu. | LLMs, knowledge graphs, and causal reasoning | | | |
2024-04-25 | Th. | Project presentations | | | |
2024-04-30 | Tu. | Project presentations | [Problem Set 3: D] | | |
2024-05-02 | Th. | Open discussion & project highlights | | | |
2024-05-07 | Tu. | | | | Final exam periods |
2024-05-09 | Th. | | | | Final exam periods |
2024-05-13 | Mo. | | FINAL PROJECTS DUE / COURSE READINGS DUE | | Final exam periods |
2024-05-14 | Tu. | | | | Final exam periods |
Please read and take notice of the following:
Percentage | Mark | Percentage | Mark |
---|---|---|---|
93 - 100 | A | 73 - 76 | C |
90 - 92 | A- | 70 - 72 | C- |
87 - 89 | B+ | 67 - 69 | D+ |
83 - 86 | B | 63 - 66 | D |
80 - 82 | B- | 60 - 62 | D- |
77 - 79 | C+ | 0 - 59 | F |
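As a rough illustration of how the scale might be applied, the sketch below converts points earned out of the 165 possible into a percentage and looks up the mark. Two assumptions are not stated in the syllabus: that the percentage is simply points earned divided by 165, and that each band's lower bound acts as a cutoff with no rounding (so 92.5% would be an A-). The `letter_grade` helper is hypothetical.

```python
# Hypothetical helper; assumes percentage = points / 165 * 100 with no rounding.
TOTAL_POINTS = 165

# (lower cutoff in percent, mark) pairs from the grading scale, highest first.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def letter_grade(points: float, total: float = TOTAL_POINTS) -> str:
    """Return the mark for a point total, treating each band's lower bound as a cutoff."""
    percent = 100 * points / total
    for cutoff, mark in CUTOFFS:
        if percent >= cutoff:
            return mark
    return "F"  # below 60%


print(letter_grade(150))  # A- (about 90.9%)
```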
To appeal a grade, schedule a meeting to discuss it with me.
Course announcements will be posted in Blackboard. Please check Blackboard regularly for announcements.
The course readings, data sets, and code are available on the course GitHub repo.
If you are absent, please let me know by email or text message. Course work is due as detailed in the course schedule. Late work is penalized 2% of the earned mark for every day it is late. If you will be absent on a day that an assignment or project milestone is due, please make sure to turn it in early. If you are ill, please contact me about an extension.
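The late-work penalty is simple arithmetic, sketched below under the assumption that the 2% deduction is linear (2% of the earned mark per day, not compounding). How this penalty interacts with the four late days described earlier is not specified here, so the sketch covers only the penalty itself; the `late_penalized_score` name is hypothetical.

```python
def late_penalized_score(earned: float, days_late: int, rate: float = 0.02) -> float:
    """Deduct `rate` (default 2%) of the earned mark per day late, linearly.

    Assumes a linear, non-compounding penalty; the score never drops below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    penalty = rate * days_late * earned
    return max(0.0, earned - penalty)
```

For example, an assignment that earns 20 points but is 3 days late loses 3 × 2% × 20 = 1.2 points, for a score of about 18.8.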
William & Mary recognizes that students juggle different responsibilities and can face challenges that make learning difficult. There are many resources available at W&M to help students navigate emotional/psychological, physical/medical, and material/accessibility concerns, including:
- The W&M Counseling Center at (757) 221-3620. Services are free and confidential.
- The W&M Health Center at (757) 221-4386.
- For additional support, resources, or questions, contact the Dean of Students at (757) 221-2510.
date | academic event |
---|---|
2024-01-23 | Add/drop period begins at 1:00 pm |
2024-01-24 | First day of classes; non-degree-seeking registration begins |
2024-02-02 | Last day to add/drop; deadline to file a minor declaration form for May or August graduates |
2024-02-03 | Withdrawal period begins |
2024-03-04 | Midterm grading begins |
2024-03-09 | Spring Break |
2024-03-10 | Spring Break |
2024-03-11 | Spring Break |
2024-03-12 | Spring Break |
2024-03-13 | Spring Break |
2024-03-14 | Spring Break |
2024-03-15 | Spring Break |
2024-03-16 | Spring Break |
2024-03-17 | Spring Break |
2024-03-18 | Classes resume after Spring Break |
2024-03-24 | Midterm grading ends at 11:59 p.m. |
2024-03-25 | Last day to withdraw from a full-term course |
2024-05-03 | Last day of classes |
2024-05-04 | Reading periods |
2024-05-05 | Reading periods |
2024-05-06 | Final exam periods |
2024-05-07 | Final exam periods |
2024-05-08 | Final exam periods |
2024-05-09 | Final exam periods |
2024-05-10 | Final exam periods |
2024-05-11 | Reading periods |
2024-05-12 | Reading periods |
2024-05-13 | Final exam periods |
2024-05-14 | Final exam periods |
2024-05-16 | Spring grades due by 9 a.m. for graduating students |
2024-05-16 | Spring Degree Conferral and Commencement Ceremony |
2024-05-17 | Spring Degree Conferral and Commencement Ceremony |
2024-05-18 | Spring Degree Conferral and Commencement Ceremony |
2024-05-21 | Spring grades due by 9 a.m. for continuing students |
Students are expected to conduct themselves according to the Honor Code.