Transform Twitter archives into organized conversation intelligence
Tweet-Scrolls processes your Twitter archive files and generates structured conversation threads and timeline analysis. Like the Marauder's Map, it reveals organized patterns in your tweet and DM conversations.
- `tweets.js`: Your exported tweets
- `direct-messages.js`: Your exported direct messages
- `headers.js`: Archive metadata
- `threads_user_<id>.csv`: Structured tweet threads (size varies by user)
- `dm_threads_user_<id>.csv`: Structured DM threads with relative timestamps
- `timeline_analysis_user_<id>.csv`: Timeline and activity analysis
- TXT files over 1MB are automatically split into chunks for easier upload to LLMs
```shell
./target/release/tweet-scrolls /home/amuldotexe/Desktop/GitHub202410/tweet-scrolls/REALDATA
```

Required files in the archive folder:

- `tweets.js`
- `direct-messages.js`
- `headers.js`
This command processes your Twitter archive and generates all output files in the appropriate output folders.
After processing, you will find these main files in each output folder:
- `threads_*.csv`: Tweet conversations with metadata
- `dm_threads_*.csv`: DM conversations with timing
- `timeline_analysis_*.csv`: Activity patterns and statistics
- `results_*.txt`: Processing summary and statistics
- `threads_*.txt`: Human-readable tweet threads
- `dm_threads_*.txt`: Human-readable DM threads
- `timeline_analysis_*.txt`: Activity insights and summaries
Output TXT files over 1MB are automatically split into chunks for easier upload to LLMs
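The over-1MB splitting step can be sketched as follows; `split_chunks` is a hypothetical helper, not the project's actual API. It breaks on line boundaries so no line is cut mid-chunk:

```rust
// Sketch: split text into chunks of at most `max_bytes`, breaking only on
// line boundaries. A single line longer than `max_bytes` stays whole.
// Illustrative only; not the project's actual splitting code.
fn split_chunks(text: &str, max_bytes: usize) -> Vec<String> {
    let mut chunks = vec![String::new()];
    for line in text.lines() {
        let cur_len = chunks.last().unwrap().len();
        // Start a new chunk if appending this line (plus '\n') would overflow.
        if cur_len > 0 && cur_len + line.len() + 1 > max_bytes {
            chunks.push(String::new());
        }
        let cur = chunks.last_mut().unwrap();
        cur.push_str(line);
        cur.push('\n');
    }
    chunks
}

fn main() {
    let text = "a".repeat(10) + "\n" + &"b".repeat(10) + "\n";
    let chunks = split_chunks(&text, 12);
    assert_eq!(chunks.len(), 2);
    assert_eq!(chunks[0], "a".repeat(10) + "\n");
}
```

Splitting on line boundaries keeps each chunk independently readable, which matters when chunks are uploaded to an LLM one at a time.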
DM thread text and data outputs must include relative timestamps for each message, showing how many minutes, hours, or days have passed since the previous message in the thread. This provides context for the pacing and timing of conversations, making the output more informative and useful for analysis.
Example:

```text
1754755789: Hello! [at 2025-08-09 10:00]
1234567890: Hi there! (5 minutes later) [at 2025-08-09 10:05]
1754755789: How are you? (2 hours later) [at 2025-08-09 12:05]
```
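The relative labels can be derived from epoch-second timestamps; `relative_label` below is an illustrative helper under that assumption, not the project's actual API:

```rust
// Sketch: derive "(N minutes later)" labels from epoch-second timestamps.
// Function name and bucket boundaries are illustrative assumptions.
fn relative_label(prev_epoch: i64, curr_epoch: i64) -> String {
    let delta = (curr_epoch - prev_epoch).max(0); // clamp out-of-order messages
    match delta {
        0..=59 => format!("({} seconds later)", delta),
        60..=3_599 => format!("({} minutes later)", delta / 60),
        3_600..=86_399 => format!("({} hours later)", delta / 3_600),
        _ => format!("({} days later)", delta / 86_400),
    }
}

fn main() {
    // Messages 300 seconds apart, then 7200 seconds apart.
    assert_eq!(relative_label(1_754_755_200, 1_754_755_500), "(5 minutes later)");
    assert_eq!(relative_label(1_754_755_500, 1_754_762_700), "(2 hours later)");
}
```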
```mermaid
flowchart TD
    subgraph input ["What You Provide"]
        A[Twitter Archive]
        A1[tweets.js]
        A2[direct-messages.js]
        A3[headers.js]
    end
    input --> process
    subgraph process ["Tweet-Scrolls"]
        P[Process & Analyze]
    end
    process --> output
    subgraph output ["What You Get"]
        B[Structured Data]
        B1[Human Readable]
        B2[Timeline Analysis]
    end
    output --> details
    subgraph details ["File Details"]
        B3[threads_*.csv<br/>dm_threads_*.csv<br/>timeline_analysis_*.csv]
        B4[threads_*.txt<br/>dm_threads_*.txt<br/>timeline_analysis_*.txt]
        B5[results_*.txt<br/>dm_results_*.txt]
    end
    style input fill:#e8f4fd
    style process fill:#fff8e1
    style output fill:#f1f8e9
    style details fill:#fdf2f8
```
- Thread Reconstruction: Connects all replies into complete conversations
- DM Organization: Converts message threads into readable conversation flows
- Timeline Analysis: Shows when you're most active and interaction patterns
- Multi-Format Output: Generates both CSV data files and human-readable text
- Privacy Protection: All processing happens locally, user IDs are anonymized
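The thread-reconstruction idea can be sketched in Rust; the field and function names below are illustrative assumptions, not the project's actual API:

```rust
use std::collections::HashMap;

// Minimal tweet record: id, optional id of the tweet it replies to, text.
// (Field names are assumptions for illustration.)
struct Tweet {
    id: u64,
    in_reply_to: Option<u64>,
    text: String,
}

// Walk each reply chain back to its root tweet, grouping tweets into threads
// keyed by the root's id.
fn build_threads(tweets: &[Tweet]) -> HashMap<u64, Vec<&Tweet>> {
    let by_id: HashMap<u64, &Tweet> = tweets.iter().map(|t| (t.id, t)).collect();
    let mut threads: HashMap<u64, Vec<&Tweet>> = HashMap::new();
    for t in tweets {
        // Follow in_reply_to links until we reach a tweet with no parent.
        let mut root: &Tweet = t;
        while let Some(parent) = root.in_reply_to.and_then(|id| by_id.get(&id).copied()) {
            root = parent;
        }
        threads.entry(root.id).or_default().push(t);
    }
    threads
}

fn main() {
    let tweets = vec![
        Tweet { id: 1, in_reply_to: None, text: "root".into() },
        Tweet { id: 2, in_reply_to: Some(1), text: "reply".into() },
        Tweet { id: 3, in_reply_to: Some(2), text: "reply to reply".into() },
    ];
    let threads = build_threads(&tweets);
    assert_eq!(threads.len(), 1);
    assert_eq!(threads[&1].len(), 3);
    assert_eq!(threads[&1][0].text, "root");
}
```

Grouping by root id is what lets a chain of replies come out as one conversation rather than scattered rows.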
- Rust 1.70+ (install here)
- Your Twitter archive (download from Twitter/X settings)
```shell
git clone https://github.com/that-in-rust/tweet-scrolls.git
cd tweet-scrolls
cargo build --release
```
```shell
# Process your archive
./target/release/tweet-scrolls /path/to/your/twitter/archive

# Basic usage (recommended)
./target/release/tweet-scrolls /path/to/archive
./target/release/tweet-scrolls /home/amuldotexe/Desktop/GitHub202410/tweet-scrolls/REALDATA

# Custom output location
./target/release/tweet-scrolls /path/to/archive /path/to/output

# Interactive mode
./target/release/tweet-scrolls
```

```mermaid
flowchart TD
    A1["Discovery<br/>Auto-detect files<br/>Setup directories"]
    A2["Thread Building<br/>Connect replies<br/>Build conversations"]
    A3["DM Organization<br/>Add timestamps<br/>User IDs"]
    A4["Anonymization<br/>Blake3 hashing<br/>Protect identity"]
    A5["Data Generation<br/>CSV files<br/>Human-readable"]
    A6["Final Output<br/>Timeline analysis<br/>Processing complete"]
    A1 --> A2
    A2 --> A3
    A3 --> A4
    A4 --> A5
    A5 --> A6
    style A1 fill:#e8f5e8
    style A2 fill:#e8f5e8
    style A3 fill:#fff3e0
    style A4 fill:#fff3e0
    style A5 fill:#f3e5f5
    style A6 fill:#f3e5f5
```
The Magic: Like a digital archaeologist, Tweet-Scrolls discovers your Twitter archive files, intelligently reconstructs conversation threads, and transforms them into organized, readable formats - all while keeping your data safe and local.
Like transforming scattered pages into a coherent storybook, Tweet-Scrolls compiles individual JSON messages into cohesive conversation threads that are easy to read and analyze.
```mermaid
flowchart TD
    subgraph Input ["Raw JSON Messages"]
        A1["msg1: 'Hello!'<br/>sender: A, id: 1"]
        A2["msg2: 'Hi there!'<br/>sender: B, id: 2"]
        A3["msg3: 'How are you?'<br/>sender: A, id: 3"]
    end
    Input --> Processing
    subgraph Processing ["Transformation Engine"]
        B1["Parse Content<br/>Extract text & metadata"]
        B2["Add Timestamps<br/>Calculate relative timing"]
        B3["Thread Assembly<br/>Order chronologically"]
        B4["Anonymization<br/>Hash user identifiers"]
    end
    Processing --> Output
    subgraph Output ["Organized Thread"]
        C1["User 123: Hello!<br/>(5 minutes later)<br/>User 456: Hi there!<br/>(5 minutes later)<br/>User 123: How are you?"]
    end
    Output --> Metadata
    subgraph Metadata ["Metadata"]
        C2["• 3 messages<br/>• 10 min duration<br/>• Participants (by user ID)<br/>• Blake3 anonymized"]
    end
    style Input fill:#ffe0e0
    style Processing fill:#fff3e0
    style Output fill:#e8f5e8
    style Metadata fill:#f0f9ff
```
The Transformation: Individual JSON objects become natural conversation flow with timing context and participant anonymization - perfect for review and analysis.
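The anonymization step can be sketched as a one-way hash of each user ID. The project uses Blake3; the sketch below substitutes the standard library's `DefaultHasher` so the example compiles without external crates (illustrative only, and not cryptographically strong):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch: map each user ID to a stable, opaque token. Tweet-Scrolls uses
// Blake3 for this; DefaultHasher stands in here to keep the example
// dependency-free. Do NOT use DefaultHasher for real anonymization.
fn anonymize(user_id: &str) -> String {
    let mut h = DefaultHasher::new();
    user_id.hash(&mut h);
    format!("user_{:016x}", h.finish())
}

fn main() {
    let a = anonymize("1234567890");
    let b = anonymize("1234567890");
    let c = anonymize("9876543210");
    assert_eq!(a, b); // same input always yields the same token
    assert_ne!(a, c); // different users get different tokens
}
```

The key property is determinism: the same user always maps to the same token, so conversation structure survives even though real identifiers never appear in the output.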
| File | Content | Purpose |
|---|---|---|
| `threads_*.csv` | Tweet conversations with metadata | Data analysis |
| `threads_*.txt` | Human-readable tweet threads | Review conversations |
| `dm_threads_*.csv` | DM conversations with timing | Data analysis |
| `dm_threads_*.txt` | Human-readable DM threads | Review private messages |
| `timeline_analysis_*.csv` | Activity patterns and statistics | Behavioral analysis |
| `timeline_analysis_*.txt` | Activity insights and summaries | Understanding patterns |
| `results_*.txt` | Processing summary and statistics | Overview |
All processing happens locally - your data never leaves your machine.
By default, DM thread text outputs display actual user IDs (e.g., "User 1234567890:") for clarity and traceability. Label-based output (A/B) is not enabled by default.
- Local processing only (no network connections)
- Automatic git protection for private data
- Comprehensive .gitignore protection
```shell
# Safety check before commits
./check_data_safety.sh
```

- Processes 50,000+ tweets efficiently
- Handles large DM archives with streaming
- Parallel processing for optimal speed
- Memory-efficient design
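Streaming can be sketched with a buffered reader that visits one line at a time instead of loading the whole file into memory; `Cursor` stands in for a real `File` handle, and `count_lines` is a hypothetical helper:

```rust
use std::io::{BufRead, BufReader, Cursor};

// Sketch: process a large file line by line. Memory use stays bounded by
// the longest line, not the file size. Illustrative only.
fn count_lines<R: BufRead>(reader: R) -> usize {
    reader.lines().filter_map(Result::ok).count()
}

fn main() {
    // Cursor over an in-memory string stands in for File::open(...).
    let reader = BufReader::new(Cursor::new("line1\nline2\nline3\n"));
    assert_eq!(count_lines(reader), 3);
}
```

Taking `R: BufRead` generically means the same code works over a file, a network stream, or an in-memory buffer in tests.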
```shell
# Run tests
cargo test

# Check code quality
cargo clippy
```

- `models/` - Data structures for tweets, DMs, and analysis
- `processing/` - JSON parsing and data transformation
- `relationship/` - Intelligence extraction and report generation
- `services/` - Timeline analysis and pattern detection
Split large archive files into manageable chunks; output TXT files over 1MB are also split automatically after main processing:

```shell
cargo build --release --bin file-splitter
./target/release/file-splitter large_archive.js

# Custom options
./target/release/file-splitter -i tweets.js -s 5M -o chunks/
```

After main processing, Tweet-Scrolls automatically scans the output folders and applies file-splitter to any output TXT file over 1MB, splitting it into manageable chunks for easier review and sharing.

MIT License
Like the Marauder's Map, Tweet-Scrolls reveals the hidden patterns in your digital world.
```mermaid
graph TD
    subgraph CLI ["CLI Layer"]
        A1["main.rs<br/>Entry point<br/>User interaction"]
        A2["cli.rs<br/>Command line interface<br/>Argument parsing<br/>Interactive mode"]
    end
    subgraph Processing ["Processing Layer"]
        B1["data_structures.rs<br/>Core data structures"]
        B2["file_io.rs<br/>File input/output"]
        B3["tweets.rs<br/>Tweet parsing"]
        B4["direct_messages.rs<br/>DM parsing"]
        B5["reply_threads.rs<br/>Thread reconstruction"]
        B6["dm_threads.rs<br/>DM threading"]
    end
    subgraph Analysis ["Analysis Layer"]
        C1["analyzer.rs<br/>Core analysis engine"]
        C2["timeline_analyzer.rs<br/>Timeline patterns"]
        C3["relationship/analyzer.rs<br/>Relationship intelligence"]
        C4["anonymization.rs<br/>Privacy protection"]
    end
    subgraph Output ["Output Layer"]
        D1["file_generation.rs<br/>File orchestration"]
        D2["text_generators.rs<br/>Human-readable text"]
        D3["prompts_generator.rs<br/>LLM analysis prompts"]
        D4["enhanced_csv_writer.rs<br/>CSV output"]
    end
    subgraph Models ["Data Models"]
        E1["direct_message.rs<br/>DM structures"]
        E2["profile.rs<br/>User profiles"]
        E3["statistics.rs<br/>Statistical data"]
        E4["timeline.rs<br/>Timeline structures"]
    end
    CLI --> Processing
    Processing --> Analysis
    Analysis --> Output
    Models -.-> Processing
    Models -.-> Analysis
    Models -.-> Output
    style CLI fill:#e3f2fd
    style Processing fill:#fff3e0
    style Analysis fill:#f3e5f5
    style Output fill:#e8f5e8
    style Models fill:#fce4ec
```
"Like organizing a messy bookshelf into a beautiful library..."
```mermaid
flowchart TD
    subgraph Files ["Generated Data Files"]
        A1["threads_*.csv<br/>Tweet conversations"]
        A2["dm_threads_*.csv<br/>DM conversations"]
        A3["timeline_analysis_*.csv<br/>Activity patterns"]
        A4["*.txt files<br/>Human-readable formats"]
    end
    Files --> Analysis
    subgraph Analysis ["What You Can Discover"]
        B1["Conversation patterns<br/>• Thread lengths<br/>• Response frequencies"]
        B2["Activity insights<br/>• Peak hours<br/>• Most active days"]
    end
    Analysis --> Privacy
    subgraph Privacy ["Privacy Protected"]
        C1["Blake3 anonymization<br/>Local processing<br/>No network calls"]
    end
    style Files fill:#e3f2fd
    style Analysis fill:#fff3e0
    style Privacy fill:#fdf2f8
```
The Result: Your digital conversations become organized, structured data that preserves conversation flow and timing while protecting your privacy through local processing and anonymization.