Commit eaafcc7
doc: update pg-python-data-tools.md
fanyang01 authored Dec 3, 2024
1 parent 533e246 commit eaafcc7
Showing 1 changed file, docs/tutorial/pg-python-data-tools.md, with 20 additions and 13 deletions.

# Tutorial: Accessing MyDuck Server with psycopg, pyarrow, and polars

## 0. Connecting to MyDuck Server using psycopg

`psycopg` is a popular PostgreSQL adapter for Python. Here is how you can connect to MyDuck Server using `psycopg`:

```python
import psycopg

with psycopg.connect("dbname=postgres user=postgres host=127.0.0.1 port=5432", autocommit=True) as conn:
    with conn.cursor() as cur:
        ...
```
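
Inside the `with conn.cursor() as cur:` block, the `...` placeholder stands for whatever work the connection is used for. As a minimal sketch, an ordinary query round trip (the query here is purely illustrative) looks like this:

```python
# Runs in place of the `...` placeholder, inside the cursor block.
cur.execute("SELECT 1")
print(cur.fetchone())
```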

Query results can also be streamed out of MyDuck Server with `COPY ... TO STDOUT`:

```python
with cur.copy("COPY test.tb1 TO STDOUT") as copy:
    for row in copy.rows():
        print(row)
```
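
Data can be loaded in the opposite direction with `COPY ... FROM STDIN`; a minimal sketch, assuming `test.tb1` has two integer columns like the sample data used later in this tutorial:

```python
# Write rows into test.tb1 through a text-format COPY (values are illustrative).
rows_to_load = [(4, 400), (5, 500)]
with cur.copy("COPY test.tb1 FROM STDIN") as copy:
    for r in rows_to_load:
        copy.write_row(r)
```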

## 2. Importing and Exporting Data in [Arrow](https://arrow.apache.org/) Format

The `pyarrow` package allows efficient data interchange between DataFrame libraries and MyDuck Server. Here is how to import and export data in Arrow format:

### Creating a pandas DataFrame and Converting to Arrow Table

```python
import pandas as pd
import pyarrow as pa

data = {
    'id': [1, 2, 3],
    'num': [100, 200, 300],
}

df = pd.DataFrame(data)
table = pa.Table.from_pandas(df)
```
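
To confirm the conversion, the resulting `pyarrow.Table` can be inspected directly; a quick, purely illustrative check:

```python
# Sanity-check the converted table.
print(table.schema)
print(table.num_rows)  # 3 rows from the sample data
```
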
### Writing Data to MyDuck Server in Arrow Format

```python
import io

output_stream = io.BytesIO()
with pa.ipc.RecordBatchStreamWriter(output_stream, table.schema) as writer:
    writer.write_table(table)
```
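
The serialized stream still needs to be loaded into MyDuck Server. A minimal sketch of that step, assuming the `test.tb1` table from the earlier examples and that `COPY ... FROM STDIN` accepts the same `FORMAT arrow` option used for `COPY ... TO STDOUT` below:

```python
# Send the Arrow IPC bytes produced above to the server via COPY.
with cur.copy("COPY test.tb1 FROM STDIN (FORMAT arrow)") as copy:
    copy.write(output_stream.getvalue())
```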

### Reading Data from MyDuck Server in Arrow Format

```python
arrow_data = io.BytesIO()
with cur.copy("COPY test.tb1 TO STDOUT (FORMAT arrow)") as copy:
    for block in copy:
        arrow_data.write(block)
print(arrow_data.getvalue())
```

### Deserializing Arrow Data to Arrow DataFrame

```python
with pa.ipc.open_stream(arrow_data.getvalue()) as reader:
    arrow_df = reader.read_all()
    print(arrow_df)
```
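
`read_all()` returns a `pyarrow.Table`, so the usual Arrow accessors apply; a small illustrative example (column names follow the sample data above):

```python
# Inspect the deserialized table.
print(arrow_df.schema)
print(arrow_df.column('num').to_pylist())
```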

### Deserializing Arrow Data to pandas DataFrame

```python
with pa.ipc.open_stream(arrow_data.getvalue()) as reader:
    pandas_df = reader.read_pandas()
    print(pandas_df)
```

## 3. Using Polars to Process DataFrames

[Polars](https://github.com/pola-rs/polars) is a fast DataFrame library that can work with Arrow data. Here is how to use Polars to read Arrow or pandas DataFrames:

### Converting Arrow DataFrame to Polars DataFrame

```python
import polars as pl

polars_df = pl.from_arrow(arrow_df)
```
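
With the data in a Polars DataFrame, it can then be processed with Polars' expression API; a brief example, assuming the `num` column from the sample data above:

```python
# Filter rows and derive a new column (column names follow the sample data).
result = polars_df.filter(pl.col("num") >= 200).with_columns(
    (pl.col("num") * 2).alias("num_doubled")
)
print(result)
```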

### Converting pandas DataFrame to Polars DataFrame

```python
polars_df = pl.from_pandas(pandas_df)
```
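
A Polars DataFrame also converts back to Arrow, so processed results can be serialized with the same IPC machinery shown earlier and sent back over `COPY ... FROM STDIN (FORMAT arrow)`; a minimal sketch, assuming that hypothetical write path:

```python
# Convert back to Arrow and serialize to an IPC stream for a COPY upload.
out = io.BytesIO()
arrow_table = polars_df.to_arrow()
with pa.ipc.RecordBatchStreamWriter(out, arrow_table.schema) as writer:
    writer.write_table(arrow_table)
arrow_bytes = out.getvalue()
```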
