Convert your Twitter archive into scrolls, where each multi-tweet thread is stored under a single header in a TXT file

that-in-rust/tweet-scrolls

Tweet-Scrolls 📜

Transform Twitter archives into organized conversation intelligence

Tweet-Scrolls processes your Twitter archive files and generates structured conversation threads and timeline analysis. Like the Marauder's Map, it reveals organized patterns in your tweet and DM conversations.

Input Files (Required)

  • tweets.js: Your exported tweets
  • direct-messages.js: Your exported direct messages
  • headers.js: Archive metadata
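
Twitter archive data files such as tweets.js are typically JavaScript rather than plain JSON: the array is assigned to a variable, for example `window.YTD.tweets.part0 = [ ... ]`. The sketch below shows one way to isolate the JSON array before handing it to a JSON parser. The function name `strip_js_prefix` is hypothetical, and the prefix format is an assumption about the archive layout, not a description of how Tweet-Scrolls itself parses these files.

```rust
/// Returns the JSON array portion of a Twitter archive `.js` file.
///
/// Assumption: the file is a single JavaScript assignment of the form
/// `<variable> = [ ... ]`, so everything after the first `=` is JSON.
/// Returns `None` if there is no assignment or the value is not an array.
fn strip_js_prefix(contents: &str) -> Option<&str> {
    // Split at the first `=`; later `=` characters (e.g. inside URLs) are untouched.
    let (_, rest) = contents.split_once('=')?;
    let json = rest.trim();
    if json.starts_with('[') {
        Some(json)
    } else {
        None
    }
}
```

The returned slice borrows from the input, so no copy of a large archive file is made at this step.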

Output Files (Generated)

  • threads_user_<id>.csv: Structured tweet threads (size varies by user)
  • dm_threads_user_<id>.csv: Structured DM threads with relative timestamps
  • timeline_analysis_user_<id>.csv: Timeline and activity analysis
  • TXT files over 1MB are automatically split into chunks for easier upload to LLMs

Quick Start

./target/release/tweet-scrolls /path/to/your/twitter/archive

Required files in the archive folder:

  • tweets.js
  • direct-messages.js
  • headers.js

This command processes your Twitter archive and generates all output files in the appropriate output folders.


Key Output Files

After processing, you will find these main files in each output folder:

  • threads_*.csv: Tweet conversations with metadata
  • dm_threads_*.csv: DM conversations with timing
  • timeline_analysis_*.csv: Activity patterns and statistics
  • results_*.txt: Processing summary and statistics
  • threads_*.txt: Human-readable tweet threads
  • dm_threads_*.txt: Human-readable DM threads
  • timeline_analysis_*.txt: Activity insights and summaries

Output TXT files over 1MB are automatically split into chunks for easier upload to LLMs

Relative Timestamps in DM Thread Outputs

DM thread text and data outputs must include relative timestamps for each message, showing how many minutes, hours, or days have passed since the previous message in the thread. This provides context for the pacing and timing of conversations, making the output more informative and useful for analysis.

Example:

1754755789: Hello! [at 2025-08-09 10:00]
1234567890: Hi there! (5 minutes later) [at 2025-08-09 10:05]
1754755789: How are you? (2 hours later) [at 2025-08-09 12:05]
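
The labels in the example above can be derived from the gap between consecutive message timestamps. The following is a minimal sketch, assuming timestamps are Unix seconds; the function name `relative_label` and the wording for gaps under one minute are hypothetical and not taken from the Tweet-Scrolls source.

```rust
/// Formats the gap between two Unix timestamps (in seconds) as a label
/// such as "(5 minutes later)", using minutes, hours, or days.
fn relative_label(prev_secs: u64, curr_secs: u64) -> String {
    // saturating_sub avoids underflow if messages arrive out of order.
    let delta = curr_secs.saturating_sub(prev_secs);
    if delta < 60 {
        // Assumption: sub-minute gaps get a fixed label.
        return "(less than a minute later)".to_string();
    }
    let (value, unit) = if delta < 3_600 {
        (delta / 60, "minute")
    } else if delta < 86_400 {
        (delta / 3_600, "hour")
    } else {
        (delta / 86_400, "day")
    };
    let plural = if value == 1 { "" } else { "s" };
    format!("({value} {unit}{plural} later)")
}
```

The largest unit that fits is used and the remainder is dropped, so a gap of 2 hours 40 minutes is reported as "(2 hours later)".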

Input → Output

flowchart TD
    subgraph input ["📥 What You Provide"]
        A[📂 Twitter Archive]
        A1[📄 tweets.js]
        A2[💬 direct-messages.js]
        A3[📋 headers.js]
    end
    
    input --> process
    
    subgraph process ["⚡ Tweet-Scrolls"]
        P[🔄 Process & Analyze]
    end
    
    process --> output
    
    subgraph output ["📤 What You Get"]
        B[📊 Structured Data]
        B1[📝 Human Readable]
        B2[📈 Timeline Analysis]
    end
    
    output --> details
    
    subgraph details ["📋 File Details"]
        B3[threads_*.csv<br/>dm_threads_*.csv<br/>timeline_analysis_*.csv]
        B4[threads_*.txt<br/>dm_threads_*.txt<br/>timeline_analysis_*.txt]
        B5[results_*.txt<br/>dm_results_*.txt]
    end
    
    style input fill:#e8f4fd
    style process fill:#fff8e1
    style output fill:#f1f8e9
    style details fill:#fdf2f8

Key Capabilities

  • Thread Reconstruction: Connects all replies into complete conversations
  • DM Organization: Converts message threads into readable conversation flows
  • Timeline Analysis: Shows when you're most active and interaction patterns
  • Multi-Format Output: Generates both CSV data files and human-readable text
  • Privacy Protection: All processing happens locally, user IDs are anonymized
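
Thread reconstruction can be pictured as following each tweet's reply link up to a root tweet and grouping by that root. The sketch below is illustrative only: the `Tweet` struct, its field names, and the functions `root_of` and `build_threads` are hypothetical, and timestamps are assumed to be Unix seconds. It is not the algorithm in the repository's reply_threads.rs.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Tweet {
    id: u64,
    in_reply_to: Option<u64>,
    created_at: u64,
    text: String,
}

/// Follows reply links upward until reaching a tweet whose parent is
/// absent from the archive (or which has no parent at all).
fn root_of(id: u64, parents: &HashMap<u64, Option<u64>>) -> u64 {
    let mut current = id;
    // Bound the walk by the map size so a malformed reply cycle terminates.
    for _ in 0..parents.len() {
        match parents.get(&current) {
            Some(Some(parent)) if parents.contains_key(parent) => current = *parent,
            _ => break,
        }
    }
    current
}

/// Groups tweets by root, orders each thread chronologically, and orders
/// the threads by the time of their first tweet.
fn build_threads(tweets: &[Tweet]) -> Vec<Vec<Tweet>> {
    let parents: HashMap<u64, Option<u64>> =
        tweets.iter().map(|t| (t.id, t.in_reply_to)).collect();
    let mut grouped: HashMap<u64, Vec<Tweet>> = HashMap::new();
    for tweet in tweets {
        grouped
            .entry(root_of(tweet.id, &parents))
            .or_default()
            .push(tweet.clone());
    }
    let mut threads: Vec<Vec<Tweet>> = grouped.into_values().collect();
    for thread in &mut threads {
        thread.sort_by_key(|t| t.created_at);
    }
    threads.sort_by_key(|thread| thread[0].created_at);
    threads
}
```

A reply to a tweet that is not in the archive (for example, a reply to someone else's tweet) becomes the root of its own thread in this sketch.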

Installation & Usage

Requirements

  • Rust 1.70+ (install via rustup: https://rustup.rs)
  • Your Twitter archive (download from Twitter/X settings)

Quick Start

git clone https://github.com/that-in-rust/tweet-scrolls.git
cd tweet-scrolls
cargo build --release

# Process your archive
./target/release/tweet-scrolls /path/to/your/twitter/archive

Usage Options

# Basic usage (recommended)
./target/release/tweet-scrolls /path/to/archive

# Example with a concrete archive path
./target/release/tweet-scrolls /home/amuldotexe/Desktop/GitHub202410/tweet-scrolls/REALDATA

# Custom output location
./target/release/tweet-scrolls /path/to/archive /path/to/output

# Interactive mode
./target/release/tweet-scrolls
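
The three invocations above differ only in the number of positional arguments. As a rough sketch of how such a dispatch could look (the `Mode` enum and `parse_mode` function are hypothetical and are not taken from the repository's cli.rs):

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum Mode {
    /// No arguments: prompt the user for paths.
    Interactive,
    /// One or two arguments: archive folder and optional output folder.
    Archive { input: PathBuf, output: Option<PathBuf> },
}

/// Maps positional arguments (program name already removed) to a mode.
fn parse_mode(args: &[String]) -> Result<Mode, String> {
    match args {
        [] => Ok(Mode::Interactive),
        [input] => Ok(Mode::Archive {
            input: PathBuf::from(input),
            output: None,
        }),
        [input, output] => Ok(Mode::Archive {
            input: PathBuf::from(input),
            output: Some(PathBuf::from(output)),
        }),
        _ => Err(format!("expected at most 2 arguments, got {}", args.len())),
    }
}
```

In a binary, the arguments would come from `std::env::args().skip(1).collect::<Vec<String>>()`.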

User Journey

🏗️ How It Works: From Raw Data to Organized Intelligence

flowchart TD
    A1["🔍 Discovery<br/>📂 Auto-detect files<br/>📁 Setup directories"]
    A2["🧵 Thread Building<br/>💬 Connect replies<br/>🔗 Build conversations"]
    A3["💬 DM Organization<br/>⏰ Add timestamps<br/>👥 User IDs"]
    A4["🔐 Anonymization<br/>🔒 Blake3 hashing<br/>🛡️ Protect identity"]
    A5["📊 Data Generation<br/>📈 CSV files<br/>📝 Human-readable"]
    A6["📊 Final Output<br/>📈 Timeline analysis<br/>✅ Processing complete"]
    
    A1 --> A2
    A2 --> A3
    A3 --> A4
    A4 --> A5
    A5 --> A6
    
    style A1 fill:#e8f5e8
    style A2 fill:#e8f5e8  
    style A3 fill:#fff3e0
    style A4 fill:#fff3e0
    style A5 fill:#f3e5f5
    style A6 fill:#f3e5f5

The Magic: Like a digital archaeologist, Tweet-Scrolls discovers your Twitter archive files, intelligently reconstructs conversation threads, and transforms them into organized, readable formats - all while keeping your data safe and local.

Thread Compilation Example

Like transforming scattered pages into a coherent storybook, Tweet-Scrolls compiles individual JSON messages into cohesive conversation threads that are easy to read and analyze.

flowchart TD
    subgraph Input ["📄 Raw JSON Messages"]
        A1["msg1: 'Hello!'<br/>sender: A, id: 1"]
        A2["msg2: 'Hi there!'<br/>sender: B, id: 2"]
        A3["msg3: 'How are you?'<br/>sender: A, id: 3"]
    end
    
    Input --> Processing
    
    subgraph Processing ["🧠 Transformation Engine"]
        B1["🔍 Parse Content<br/>Extract text & metadata"]
        B2["⏰ Add Timestamps<br/>Calculate relative timing"]
        B3["🧵 Thread Assembly<br/>Order chronologically"]
        B4["🔐 Anonymization<br/>Hash user identifiers"]
    end
    
    Processing --> Output
    
    subgraph Output ["💬 Organized Thread"]
        C1["User 123: Hello!<br/>(5 minutes later)<br/>User 456: Hi there!<br/>(5 minutes later)<br/>User 123: How are you?"]
    end
    
    Output --> Metadata
    
    subgraph Metadata ["📊 Metadata"]
        C2["• 3 messages<br/>• 10 min duration<br/>• Participants (by user ID)<br/>• Blake3 anonymized"]
    end
    
    style Input fill:#ffe0e0
    style Processing fill:#fff3e0
    style Output fill:#e8f5e8
    style Metadata fill:#f0f9ff

The Transformation: Individual JSON objects become natural conversation flow with timing context and participant anonymization - perfect for review and analysis.
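
To make the assembly and metadata steps concrete, here is a small sketch that orders messages chronologically and derives the summary shown in the diagram (message count, duration, participants). The `Message` and `ThreadSummary` types and the `assemble` function are hypothetical, timestamps are assumed to be Unix seconds, and anonymization is left out.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone)]
struct Message {
    sender_id: u64,
    sent_at: u64,
    text: String,
}

#[derive(Debug, PartialEq)]
struct ThreadSummary {
    message_count: usize,
    duration_minutes: u64,
    participants: Vec<u64>,
}

/// Sorts messages by send time and computes thread-level metadata.
fn assemble(mut messages: Vec<Message>) -> (Vec<Message>, ThreadSummary) {
    messages.sort_by_key(|m| m.sent_at);
    // Duration is the span from the first to the last message; 0 if empty.
    let duration_secs = match (messages.first(), messages.last()) {
        (Some(first), Some(last)) => last.sent_at - first.sent_at,
        _ => 0,
    };
    // BTreeSet removes duplicate senders and yields them in ascending order.
    let participants: BTreeSet<u64> = messages.iter().map(|m| m.sender_id).collect();
    let summary = ThreadSummary {
        message_count: messages.len(),
        duration_minutes: duration_secs / 60,
        participants: participants.into_iter().collect(),
    };
    (messages, summary)
}
```

With the three messages from the diagram sent five minutes apart, the summary reports 3 messages, a 10 minute duration, and two participants.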

File Details

| File | Content | Purpose |
|------|---------|---------|
| threads_*.csv | Tweet conversations with metadata | Data analysis |
| threads_*.txt | Human-readable tweet threads | Review conversations |
| dm_threads_*.csv | DM conversations with timing | Data analysis |
| dm_threads_*.txt | Human-readable DM threads | Review private messages |
| timeline_analysis_*.csv | Activity patterns and statistics | Behavioral analysis |
| timeline_analysis_*.txt | Activity insights and summaries | Understanding patterns |
| results_*.txt | Processing summary and statistics | Overview |

Privacy & Security

All processing happens locally - your data never leaves your machine.

DM Thread Output: User IDs (default)

By default, DM thread text outputs display actual user IDs (e.g., "User 1234567890:") for clarity and traceability. Label-based output (A/B) is not enabled by default.

Built-in Safety Features

  • Local processing only (no network connections)
  • Automatic git protection for private data
  • Comprehensive .gitignore protection

# Safety check before commits
./check_data_safety.sh

Performance

  • Processes 50,000+ tweets efficiently
  • Handles large DM archives with streaming
  • Parallel processing for optimal speed
  • Memory-efficient design

Development

# Run tests
cargo test

# Check code quality
cargo clippy

Architecture

  • models/ - Data structures for tweets, DMs, and analysis
  • processing/ - JSON parsing and data transformation
  • relationship/ - Intelligence extraction and report generation
  • services/ - Timeline analysis and pattern detection

File Splitter Utility

Split large archive files into manageable chunks, and automatically split output TXT files over 1MB after main processing:

cargo build --release --bin file-splitter
./target/release/file-splitter large_archive.js

# Custom options
./target/release/file-splitter -i tweets.js -s 5M -o chunks/

# Automatic post-processing (new requirement)
# After main processing, Tweet-Scrolls will automatically scan output folders and apply file-splitter to any output TXT files over 1MB, splitting them into manageable chunks for easier review and sharing.
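
The splitting step can be sketched as cutting the text at line boundaries so that no chunk exceeds a byte limit. This is an illustration under assumptions, not the file-splitter implementation: the function name `split_into_chunks` is hypothetical, and whether "1MB" means 1,000,000 or 1,048,576 bytes is not stated in this README.

```rust
/// Splits `text` into chunks of at most `max_bytes` bytes, breaking only
/// between lines. A single line longer than `max_bytes` is kept whole and
/// becomes its own oversized chunk.
fn split_into_chunks(text: &str, max_bytes: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    // split_inclusive keeps each trailing '\n', so chunks concatenate
    // back to the original text exactly.
    for line in text.split_inclusive('\n') {
        if !current.is_empty() && current.len() + line.len() > max_bytes {
            chunks.push(std::mem::take(&mut current));
        }
        current.push_str(line);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Breaking on line boundaries keeps each message or tweet line intact, which matters when the chunks are uploaded to an LLM separately.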

License

MIT License


Like the Marauder's Map, Tweet-Scrolls reveals the hidden patterns in your digital world.

Architecture

graph TD
    subgraph CLI ["🖥️ CLI Layer"]
        A1["main.rs<br/>Entry point<br/>User interaction"]
        A2["cli.rs<br/>Command line interface<br/>Argument parsing<br/>Interactive mode"]
    end
    
    subgraph Processing ["⚙️ Processing Layer"]
        B1["data_structures.rs<br/>Core data structures"]
        B2["file_io.rs<br/>File input/output"]
        B3["tweets.rs<br/>Tweet parsing"]
        B4["direct_messages.rs<br/>DM parsing"]
        B5["reply_threads.rs<br/>Thread reconstruction"]
        B6["dm_threads.rs<br/>DM threading"]
    end
    
    subgraph Analysis ["🔍 Analysis Layer"]
        C1["analyzer.rs<br/>Core analysis engine"]
        C2["timeline_analyzer.rs<br/>Timeline patterns"]
        C3["relationship/analyzer.rs<br/>Relationship intelligence"]
        C4["anonymization.rs<br/>Privacy protection"]
    end
    
    subgraph Output ["📤 Output Layer"]
        D1["file_generation.rs<br/>File orchestration"]
        D2["text_generators.rs<br/>Human-readable text"]
        D3["prompts_generator.rs<br/>LLM analysis prompts"]
        D4["enhanced_csv_writer.rs<br/>CSV output"]
    end
    
    subgraph Models ["📦 Data Models"]
        E1["direct_message.rs<br/>DM structures"]
        E2["profile.rs<br/>User profiles"]
        E3["statistics.rs<br/>Statistical data"]
        E4["timeline.rs<br/>Timeline structures"]
    end
    
    CLI --> Processing
    Processing --> Analysis
    Analysis --> Output
    Models -.-> Processing
    Models -.-> Analysis
    Models -.-> Output
    
    style CLI fill:#e3f2fd
    style Processing fill:#fff3e0
    style Analysis fill:#f3e5f5
    style Output fill:#e8f5e8
    style Models fill:#fce4ec

Output Analysis

"Like organizing a messy bookshelf into a beautiful library..."

flowchart TD
    subgraph Files ["📊 Generated Data Files"]
        A1["threads_*.csv<br/>Tweet conversations"]
        A2["dm_threads_*.csv<br/>DM conversations"]
        A3["timeline_analysis_*.csv<br/>Activity patterns"]
        A4["*.txt files<br/>Human-readable formats"]
    end
    
    Files --> Analysis
    
    subgraph Analysis ["📈 What You Can Discover"]
        B1["📊 Conversation patterns<br/>• Thread lengths<br/>• Response frequencies"]
        B2["⏰ Activity insights<br/>• Peak hours<br/>• Most active days"]
    end
    
    Analysis --> Privacy
    
    subgraph Privacy ["🔐 Privacy Protected"]
        C1["🔒 Blake3 anonymization<br/>🛡️ Local processing<br/>🚫 No network calls"]
    end
    
    style Files fill:#e3f2fd
    style Analysis fill:#fff3e0
    style Privacy fill:#fdf2f8

The Result: Your digital conversations become organized, structured data that preserves conversation flow and timing while protecting your privacy through local processing and anonymization.
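
As an example of the kind of activity insight mentioned above, peak hours can be found by bucketing message timestamps by hour of day. The sketch assumes Unix timestamps in seconds and reports hours in UTC; the function names `hourly_histogram` and `peak_hour` are hypothetical and do not describe the repository's timeline analyzer.

```rust
/// Counts items per UTC hour of day (0 to 23) from Unix timestamps in seconds.
fn hourly_histogram(timestamps: &[u64]) -> [usize; 24] {
    let mut counts = [0usize; 24];
    for &ts in timestamps {
        // Seconds since midnight UTC, converted to whole hours.
        let hour = (ts % 86_400) / 3_600;
        counts[hour as usize] += 1;
    }
    counts
}

/// Returns the busiest hour, or `None` if there is no activity.
/// Ties resolve to the earliest hour.
fn peak_hour(counts: &[usize; 24]) -> Option<usize> {
    let max = *counts.iter().max()?;
    if max == 0 {
        return None;
    }
    counts.iter().position(|&c| c == max)
}
```

Reporting in local time instead would require a time zone offset, which this sketch deliberately leaves out.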

