Data Wrangling in Python: Introduction to the pandas library

This beginner-to-intermediate level workshop introduces the pandas library, a popular Python library for data cleaning, data wrangling, and data analysis. Participants in this interactive class will use Jupyter Notebook and Python code to import, understand, and prepare a dataset for further analysis or visualization. By the end of this workshop, participants will be able to:

  • Identify and use the two primary data structures of the pandas library: Series and DataFrame
  • Implement functions from the pandas library to explore and analyze a dataset, including:
    • Handling missing data
    • Filtering and sorting data
    • Grouping data
    • Calculating basic summary statistics
  • Find documentation for the pandas library to troubleshoot errors and apply new functions to analyze a dataset
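The minimal sketch below illustrates the operations listed above on a small, made-up dataset. The column names and values (`site`, `species`, `height_m`) are invented for illustration; this is not code from the workshop notebook, and the notebook may approach these tasks differently.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array (here, one missing value)
temps = pd.Series([21.5, 23.0, np.nan], index=["Mon", "Tue", "Wed"], name="temp_c")

# A DataFrame is a two-dimensional table; each column is a Series
df = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B", "C"],
        "species": ["oak", "pine", "oak", "maple", "pine"],
        "height_m": [12.0, np.nan, 9.5, 7.0, 15.0],
    }
)

# Handling missing data: count NaNs per column, then fill the gap with the column mean
missing_counts = df.isna().sum()
filled = df.fillna({"height_m": df["height_m"].mean()})

# Filtering and sorting: keep trees taller than 8 m, tallest first
tall = filled[filled["height_m"] > 8].sort_values("height_m", ascending=False)

# Grouping: mean height per site
mean_by_site = filled.groupby("site")["height_m"].mean()

# Basic summary statistics (count, mean, std, min, quartiles, max) for numeric columns
summary = filled.describe()

print(temps.mean())  # NaN values are skipped by default
print(missing_counts)
print(tall)
print(mean_by_site)
print(summary)
```

When a function behaves unexpectedly, the pandas documentation for that function (for example, `DataFrame.fillna` or `DataFrame.groupby`) lists every parameter and its default.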

Prerequisites: Participants should be familiar with basic programming concepts, including variable assignment, data types, function calls, and installing packages or libraries. Introductory experience in Python or R will be especially helpful for this workshop.

JHU Data Services

Website: dataservices.library.jhu.edu/
Contact us: [email protected]
JHU Data Services, part of the Johns Hopkins University Sheridan Libraries, helps the JHU community find, use, visualize, manage, and share data. We offer live webinars and self-paced online trainings on computational research and coding, GIS, data management, data visualization, and more. See all of our training topics on our website.

This repository contains materials for one of our live webinars open to JHU students, faculty, and staff. Please contact us with any questions.

As of March 2020, Data Services workshops are being held virtually on Zoom. See our calendar to register for upcoming workshops.

Pre-Class Instructions

Before the class, follow the Python Installation Instructions to download and install Anaconda, which includes Jupyter Notebook. Use our Jupyter Notebook Tutorial to learn the basics of opening and running Jupyter Notebooks. The Jupyter Notebook Tutorial is also available to view online.
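As an optional check that your installation is ready, you could run something like the following in a new notebook cell. This is a suggested sketch, not part of the official instructions. If the import fails with `ModuleNotFoundError`, pandas is not available in the Python environment the notebook is using.

```python
# Quick check that the active Python environment can import pandas
import sys

import pandas as pd

print("Python version:", sys.version.split()[0])
print("pandas version:", pd.__version__)
```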

Description of Files

  • Data: This folder contains raw data files to be used during hands-on activities in the workshop
  • In-ClassScripts: This folder contains the Jupyter Notebook file you will need for the workshop:
    • DataWranglingPandas_StudentVersion.ipynb
    • Images folder: contains images that are used in the Jupyter Notebook
  • PresentationMaterials: This folder is for PowerPoint slides and other presentation materials used in the workshop; there are none for this workshop
  • Resources: This folder contains cheatsheets to assist you during the workshop and links to external sources for you to continue your learning

Post-Class Survey

If you have taken the live webinar for this class, please take this survey: https://www.surveymonkey.com/r/IntroPandas

License and Terms of Use

The presentation materials are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0), attributable to Data Services, Johns Hopkins University.

See the LICENSE file for code licensing and re-use information.

The images, external resources, and cheatsheets linked in this repository may have other licenses and terms of use.

Citation

Please cite this material as:
Johns Hopkins University Data Services. [Date of workshop]. [Workshop title]. [GitHub URL]