Toolset to extract and work with your ChatGPT chat history.
- `chat_export_to_json.py`: streaming extractor that pulls `jsonData` and `assetsJson` out of `chat.html` into `data.json` and `assets.json`.
- `chatgpt_export.py`: library and CLI to list conversations, enumerate messages, and search text.
- `export_test.py`: minimal example of programmatic use.
- Python 3.9+ (uses modern typing and dataclasses).
- No third-party dependencies.
- Log into ChatGPT.
- Click your profile picture (bottom-left on desktop).
- Settings → Data controls.
- Click Export next to Export data.
- Click Confirm export.
- Wait for the email, then download the ZIP.
Inside the ZIP you will find a chat.html that contains your full message graph and any asset pointers.
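If you would rather not unpack the archive by hand, the following is a minimal sketch using Python's standard `zipfile` module. `extract_chat_html` is a hypothetical helper, not part of this toolset. It assumes the archive holds a single file named `chat.html` (at the root or in a subfolder) and takes the first match.

```python
import shutil
import zipfile
from pathlib import Path


def extract_chat_html(zip_path, dest_dir="."):
    """Hypothetical helper: copy chat.html out of an export ZIP into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Match on the last path component so a chat.html in a subfolder is found too.
        members = [n for n in zf.namelist() if n.rsplit("/", 1)[-1] == "chat.html"]
        if not members:
            raise FileNotFoundError(f"chat.html not found in {zip_path}")
        target = dest / "chat.html"
        # Stream the member to disk instead of reading it into memory at once.
        with zf.open(members[0]) as src, open(target, "wb") as out:
            shutil.copyfileobj(src, out)
    return target


# Example (the ZIP file name is a placeholder):
# extract_chat_html("chatgpt-export.zip", ".")
```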
Copy chat.html into the cloned repository folder, then run:
```bash
python chat_export_to_json.py chat.html
```

By default this writes `data.json` and `assets.json` next to `chat.html`.
Options:
```bash
python chat_export_to_json.py /path/to/chat.html \
  --data-out /somewhere/data.json \
  --assets-out /somewhere/assets.json \
  --overwrite \
  --chunk-size 1048576 \
  --trailing-buffer 8192
```

Notes:
- The extractor is streaming and safe for very large one-line JSON values (hundreds of MiB). It never loads the entire line into memory.
- It trims everything up to and including the `=` after the variable name and removes the trailing `;` (and any trailing spaces or `\r`).
- It recognises `var|let|const`, an optional `window.` prefix, and the variable names `jsonData` and `assetsJson`.
- If the HTML has no final newline, the extractor still trims correctly.
- If either variable is not found, you will see a message such as `jsonData not found`.
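The prefix and suffix trimming described above can be approximated on a single, already-read line as follows. This is an illustrative sketch, not the extractor's actual code: `ASSIGN_RE` and `extract_value` are hypothetical names, the pattern treats both the `var|let|const` keyword and the `window.` prefix as optional, and it assumes the JSON value is the last thing on the line. The real extractor streams the line rather than holding it in memory.

```python
import json
import re

# Hypothetical pattern: optional var/let/const keyword, optional "window."
# prefix, one of the two variable names, then "=".
ASSIGN_RE = re.compile(
    r"(?:\b(?:var|let|const)\s+)?(?:window\.)?\b(jsonData|assetsJson)\s*="
)


def extract_value(line, name):
    """Return the JSON text assigned to `name` on one line, or None if absent."""
    for match in ASSIGN_RE.finditer(line):
        if match.group(1) == name:
            # Drop everything up to and including "=", plus surrounding
            # whitespace (including a trailing \r).
            value = line[match.end():].strip()
            # Remove one trailing ";" and any whitespace before it.
            if value.endswith(";"):
                value = value[:-1].rstrip()
            return value
    return None


line = 'var jsonData = [{"title": "Example"}];\r'
print(json.loads(extract_value(line, "jsonData")))  # [{'title': 'Example'}]
```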
The chatgpt_export.py module includes a CLI. Pass data.json (and optionally --assets assets.json) followed by a command.
```bash
python chatgpt_export.py data.json --assets assets.json list --limit 10
```

Output format:

```
[0] <conversation_id> | <title> | created=<UTC> updated=<UTC>
```
You can address a conversation by id, index, or exact title.
```bash
# By index from the list command:
python chatgpt_export.py data.json --assets assets.json show 0
# By id or title:
python chatgpt_export.py data.json --assets assets.json show "abcd-ef01-..."
python chatgpt_export.py data.json --assets assets.json show "How to do X"
```

Modes:
- `--mode current_path` (default) follows the same linear "current path" as the HTML viewer. This walks `current_node → parent` to the root and shows only text or multimodal text nodes that the HTML would display.
- `--mode chronological_all` includes every mapping node that has message parts, sorted by `create_time` then id.
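A rough sketch of the two orderings, assuming the usual export layout in which each conversation has a `mapping` from node ids to nodes with `parent` and `message` fields, plus a `current_node` id. The function names and the `content_type` values `text` and `multimodal_text` are assumptions for illustration; the module's own rules for which nodes the HTML displays may be more detailed.

```python
# Assumed content types that the HTML viewer renders.
DISPLAYED_TYPES = {"text", "multimodal_text"}


def _content_type(node):
    """content_type of a node's message, or None for empty/root nodes."""
    message = node.get("message") or {}
    return (message.get("content") or {}).get("content_type")


def current_path(conv):
    """Displayable node ids from the root down to conv["current_node"]."""
    mapping = conv["mapping"]
    path, seen = [], set()
    node_id = conv.get("current_node")
    # Walk current_node -> parent -> ... until the root (parent is None).
    while node_id is not None and node_id in mapping and node_id not in seen:
        seen.add(node_id)  # guards against a malformed parent cycle
        path.append(node_id)
        node_id = mapping[node_id].get("parent")
    path.reverse()  # collected leaf-to-root; present root-to-leaf
    return [n for n in path if _content_type(mapping[n]) in DISPLAYED_TYPES]


def chronological_all(conv):
    """Every node whose message has parts, sorted by create_time then id."""
    keyed = []
    for node_id, node in conv["mapping"].items():
        message = node.get("message") or {}
        if (message.get("content") or {}).get("parts"):
            # Assumption: a missing create_time sorts before everything else.
            keyed.append((message.get("create_time") or 0.0, node_id))
    return [node_id for _, node_id in sorted(keyed)]
```

The difference shows up with edited or regenerated messages: an abandoned branch is skipped by `current_path` but still appears in `chronological_all`.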
```bash
# Case-insensitive plain text search
python chatgpt_export.py data.json --assets assets.json search "vector database"
# Regular expression
python chatgpt_export.py data.json --assets assets.json search "(?i)\bvector\b" --regex
```

You will see matching messages with conversation title, timestamp, author, and a short snippet.
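Plain and regex searches differ mainly in how the query is compiled. A minimal sketch, assuming plain queries are escaped and matched case-insensitively while regex queries are used exactly as written (hence the `(?i)` flag in the example above). `find_match` and `make_snippet` are hypothetical helpers, and the snippet width is arbitrary.

```python
import re


def find_match(text, query, regex=False):
    """Return the (start, end) span of the first match in text, or None."""
    if regex:
        # Assumption: regex mode adds no flags; put (?i) in the pattern to ignore case.
        match = re.search(query, text)
    else:
        # Plain mode: escape the query, then match it case-insensitively.
        match = re.search(re.escape(query), text, re.IGNORECASE)
    return match.span() if match else None


def make_snippet(text, span, width=30):
    """The matched text plus up to `width` characters of context on each side."""
    start, end = span
    left, right = max(0, start - width), min(len(text), end + width)
    body = text[left:right].replace("\n", " ")
    return ("..." if left > 0 else "") + body + ("..." if right < len(text) else "")
```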
```python
from datetime import datetime, timezone

from chatgpt_export import ChatGPTExport

exp = ChatGPTExport.from_files("data.json", "assets.json")

# List all chats with indices
for idx, info in enumerate(exp.list_conversations()):
    print(f"[{idx}] {info.id} | {info.title} | {info.create_time}")

# Enumerate messages on the HTML "current path"
for msg in exp.iter_messages(0, mode="current_path"):  # 0 is the index from the list
    print(msg.create_time, msg.author_display)
    print(msg.text_content())
    for part in msg.parts:
        if part.kind == "asset":
            print("asset:", part.asset.content_type, "->", part.asset.url)

# Search with a time window and author filter
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2025, 1, 1, tzinfo=timezone.utc)
results = exp.search_messages(
    "benchmark",
    regex=False,
    author_display_in=["ChatGPT"],  # filter by display name
    created_between=(start, end),
    mode="current_path",
)
print(f"Matches: {len(results)}")
```

- Timestamps are parsed from either UNIX seconds or ISO 8601 and exposed as timezone-aware UTC `datetime` instances. Missing values are `None`.
- Authors: `"assistant"` and `"tool"` are normalised to `ChatGPT`. `"system"` is hidden unless `metadata.is_user_system_message` is true, in which case it is shown as `Custom user info`. This mirrors the export HTML I've looked at.
- Assets: `assets.json` is treated as a mapping from asset pointer to a URL. Missing pointers yield `None`.
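The timestamp and author rules above can be sketched as two small functions. These are hypothetical helpers, not the module's API. Treating naive ISO strings as UTC and passing other roles (such as `user`) through unchanged are assumptions.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """UNIX seconds or ISO 8601 -> timezone-aware UTC datetime; missing -> None."""
    if value is None or value == "":
        return None
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    text = str(value).strip()
    try:
        # Numeric strings are treated as UNIX seconds.
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    except ValueError:
        pass
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: naive ISO strings are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def author_display(role, metadata=None):
    """Display name for a message author, or None if the message is hidden."""
    if role in ("assistant", "tool"):
        return "ChatGPT"
    if role == "system":
        is_user_info = bool((metadata or {}).get("is_user_system_message"))
        return "Custom user info" if is_user_info else None
    # Assumption: any other role (e.g. "user") is shown unchanged.
    return role
```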
- The extractor uses a byte stream with a small trailing buffer (default 8192 bytes) to strip the final semicolon and whitespace. Increase `--trailing-buffer` if you encounter pathological amounts of trailing whitespace.
- Reading is chunked (default 1 MiB). Adjust `--chunk-size` to trade throughput for memory.
- `assets.json` may be absent in older exports. The tools still work, but asset URLs will be `None`.
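The trailing-buffer idea can be sketched as a bounded streaming copy: everything except the last `trailing_buffer` bytes is written as soon as it is read, and only that held-back tail is trimmed at end of file. `copy_trimmed` is a hypothetical function that takes binary file objects positioned at the start of the JSON value; it is not the extractor's own code.

```python
import io


def copy_trimmed(src, dst, chunk_size=1 << 20, trailing_buffer=8192):
    """Copy src to dst, dropping a final ';' and the whitespace around it.

    Memory stays bounded by roughly chunk_size + trailing_buffer bytes,
    regardless of how large the input is.
    """
    tail = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        tail += chunk
        if len(tail) > trailing_buffer:
            # Everything except the last `trailing_buffer` bytes is safe to emit.
            cut = len(tail) - trailing_buffer
            dst.write(tail[:cut])
            tail = tail[cut:]
    # At EOF: strip trailing whitespace (including \r\n), one ';', then whitespace again.
    tail = tail.rstrip()
    if tail.endswith(b";"):
        tail = tail[:-1].rstrip()
    dst.write(tail)


src = io.BytesIO(b'{"a": 1}; \r\n')
dst = io.BytesIO()
copy_trimmed(src, dst, chunk_size=4, trailing_buffer=8)
print(dst.getvalue())  # b'{"a": 1}'
```

If the whitespace after the final `;` is longer than `trailing_buffer`, the semicolon is written out before end of file is reached and can no longer be stripped, which is the case that a larger `--trailing-buffer` addresses.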
- `jsonData not found` or `assetsJson not found`: Ensure the file is the unmodified `chat.html` from the ZIP. Some browsers prettify HTML on save, which can change formatting. The extractor expects single-line assignments for each variable.
- `Refusing to overwrite`: Pass `--overwrite` if you want to replace existing outputs.
- `Conversation not found` when using the CLI `show` command: If you pass a title, it must match exactly. Try using the index or id from the `list` output.
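To make the lookup rules concrete, here is a sketch of resolving a `show` argument. `resolve_conversation` is hypothetical, it takes a plain list of dicts with `id` and `title` keys in `list` order, and the precedence (exact id, then numeric index, then exact title) is an assumption. The point it illustrates is that title matching is exact and case-sensitive.

```python
def resolve_conversation(conversations, ref):
    """Resolve a CLI-style reference: exact id, then numeric index, then exact title."""
    for conv in conversations:
        if conv.get("id") == ref:
            return conv
    if ref.isdigit():
        idx = int(ref)
        if idx < len(conversations):
            return conversations[idx]
    for conv in conversations:
        if conv.get("title") == ref:  # exact, case-sensitive comparison
            return conv
    raise LookupError(f"Conversation not found: {ref!r}")
```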
- If you are using ChatGPT in a work context, check with your IT team/manager before exporting your ChatGPT data.
- Your data stays on your machine. No network calls are made. However, you need to download the export itself from ChatGPT's servers.
- The export can be very large. Store it on an encrypted disk if you have sensitive content.
This toolset is licensed under the Apache 2.0 open source licence.
