66 changes: 66 additions & 0 deletions guides/python/pandas.md
@@ -0,0 +1,66 @@
pandas: powerful Python data analysis toolkit
What is it?

pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way towards this goal.

Table of Contents

Main Features
Where to get it
Dependencies
Installation from sources
License
Documentation
Background
Getting Help
Discussion and Development
Contributing to pandas

Main Features

Here are just a few of the things that pandas does well:

Easy handling of missing data (represented as NaN, NA, or NaT) in floating point as well as non-floating point data
Size mutability: columns can be inserted into and deleted from DataFrame and higher dimensional objects
Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
Easy conversion of ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
Intuitive merging and joining of data sets
Flexible reshaping and pivoting of data sets
Hierarchical labeling of axes (possible to have multiple labels per tick)
Robust I/O tools for loading data from flat files (CSV and delimited), Excel files, and databases, and for saving and loading data in the ultrafast HDF5 format
Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging
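
A few of the features listed above can be illustrated with a short sketch. All of the data and variable names here are made up for illustration; only the pandas calls themselves (`Series`, `DataFrame`, `fillna`, `groupby`, `date_range`, `rolling`, `shift`) come from the library.

```python
import pandas as pd

# Automatic alignment: two Series with partially overlapping labels
# (illustrative data, not taken from the guide).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on labels. "x" has no partner in b, so the result
# is NaN at "x", 12.0 at "y", and 23.0 at "z".
total = a + b

# Missing data: NaN values can be detected and replaced explicitly.
filled = total.fillna(0.0)

# Group by: split rows by a key column, then aggregate each group.
df = pd.DataFrame(
    {"team": ["red", "blue", "red", "blue"], "score": [1, 2, 3, 4]}
)
per_team = df.groupby("team")["score"].sum()

# Time series: generate a daily date range, then compute a moving
# window statistic and a one-step lag over it.
days = pd.date_range("2024-01-01", periods=4, freq="D")
series = pd.Series([1.0, 2.0, 3.0, 4.0], index=days)
smoothed = series.rolling(window=2).mean()  # first value is NaN
lagged = series.shift(1)                    # first value is NaN
```

The first entries of `smoothed` and `lagged` are missing because no earlier observation exists to fill the window or the lag, which is the same NaN convention that alignment uses.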

Where to get it

The source code is currently hosted on GitHub at: https://github.com/pandas-dev/pandas

Binary installers for the latest released version are available at the Python Package Index (PyPI) and on Conda.

# conda
conda install -c conda-forge pandas

# or PyPI
pip install pandas

The list of changes to pandas between each release can be found in the release notes. For full details, see the commit logs at https://github.com/pandas-dev/pandas.

Dependencies

NumPy - Adds support for large, multi-dimensional arrays, matrices and high-level mathematical functions to operate on these arrays
python-dateutil - Provides powerful extensions to the standard datetime module
pytz - Brings the Olson tz database into Python which allows accurate and cross platform timezone calculations

See the full installation instructions for minimum supported versions of required, recommended and optional dependencies.
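
One way to see which of the dependencies listed above are present in the current environment is to query the installed package metadata. This is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the distribution names are assumed to match the PyPI names given in the list above.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence

# Distribution names for the dependencies named in this section (assumed
# to be the names they are published under on PyPI).
REQUIRED = ("numpy", "python-dateutil", "pytz")


def installed_versions(names: Sequence[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This only reports what is installed. It does not compare against the minimum supported versions, which are given in the full installation instructions.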
Installation from sources

To install pandas from source you need Cython in addition to the normal dependencies above. Cython can be installed from PyPI:

pip install cython

In the pandas directory (same one where you found this file after cloning the git repo), execute:

pip install .

or for installing in development mode:

python -m pip install -ve . --no-build-isolation --config-settings=editable-verbose=true
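
After a development-mode install, it can be useful to confirm that Python resolves `pandas` to the source checkout and not to a copy in `site-packages`. The sketch below uses the standard library's `importlib.util.find_spec`. The helper name `module_location` is hypothetical, and the exact path reported for an editable install depends on the build backend, so treat the output as a hint and not a guarantee.

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The module cannot be found on the current import path.
        return None
    return spec.origin


if __name__ == "__main__":
    print(module_location("pandas"))
```

If the printed path lies inside the cloned repository, imports are picking up the working tree.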
8 changes: 8 additions & 0 deletions python/pandas.md
@@ -0,0 +1,8 @@
# Pandas

Pandas is a Python library used for data manipulation and analysis.

## Installation

```bash
pip install pandas