
drgsn/filefusion


FileFusion 🚀


FileFusion is a powerful command-line tool designed to concatenate and process files in a format optimized for Large Language Models (LLMs).


✨ Features

FileFusion streamlines your file processing workflow with:

Core Features

  • 📦 Multiple Output Formats

    • Support for XML, JSON, and YAML
    • Preserved file metadata and structure
    • Configurable output formatting
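  • 🎯 Smart Pattern Matching

    • Glob patterns with ** for recursive matching
    • Brace expansion such as *.{go,proto}
    • Separate include and exclude pattern lists
  • ⚡️ High Performance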

    • Concurrent file processing
    • Efficient memory usage
    • Automatic file splitting for large outputs

Processing Features

  • 📊 Advanced Size Control

    • Individual file size limits
    • Total output size management
    • Automatic output splitting
    • Detailed size reporting
  • 🧹 Intelligent Code Cleaning

    • Multi-language support
    • Comment preservation options
    • Code structure optimization
    • Whitespace management
  • 🔒 Reliability & Safety

    • Atomic write operations
    • Thorough error checking
    • Dry run support
    • Symlink handling

🚀 Quick Start

Get started with FileFusion in three simple steps:

  1. Install:

    curl -fsSL https://raw.githubusercontent.com/drgsn/filefusion/main/install.sh | bash
  2. Process current directory:

    filefusion
  3. Process specific files:

    filefusion --pattern "*.{js,py}" --clean -o output.xml /path/to/project

🚀 Installation

Quick Install (Recommended)

Using curl:

curl -fsSL https://raw.githubusercontent.com/drgsn/filefusion/main/install.sh | bash

Using wget:

wget -qO- https://raw.githubusercontent.com/drgsn/filefusion/main/install.sh | bash

Safe Install (Recommended Security Practice)

# Download the script, inspect it, then run it
curl -fsSL https://raw.githubusercontent.com/drgsn/filefusion/main/install.sh > install.sh
less install.sh
chmod +x install.sh
./install.sh

Alternative Methods

Using Go:

go install github.com/drgsn/filefusion/cmd/filefusion@latest

Or download the latest binary for your platform from the releases page.

🗑️ Uninstallation

To uninstall FileFusion:

  1. Remove the installation directory:

    rm -rf ~/.filefusion
  2. Remove FileFusion from your shell configuration file. Depending on your shell and OS, edit one of these files:

    • macOS Bash users: ~/.bash_profile
    • Linux Bash users: ~/.bashrc
    • Zsh users: ~/.zshrc
    • Fish users: ~/.config/fish/config.fish
    • Windows PowerShell users: $HOME/Documents/PowerShell/Microsoft.PowerShell_profile.ps1

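Look for the FileFusion lines the installer added and remove them. In Bash and Zsh they look like this (Fish and PowerShell profiles express the same PATH entry in their own syntax):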

# FileFusion
export PATH="$PATH:$HOME/.filefusion"

Go Installation

If you installed using Go, delete the binary that go install placed in $GOBIN (or in $(go env GOPATH)/bin when GOBIN is unset):

rm "$(go env GOPATH)/bin/filefusion"

📋 Configuration

Default Values

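| Setting | Flag | Default Value | Description |
|---------|------|---------------|-------------|
| Pattern | `-p`, `--pattern` | `*.go,*.json,*.yaml,*.yml` | Default file patterns to process |
| Max File Size | `--max-file-size` | 10MB | Maximum size for individual input files |
| Max Output Size | `--max-output-size` | 50MB | Maximum total size for all processed content |
| Max Output File Size | `--max-output-file-size` | 30KB | Maximum size per output file (larger output is split automatically) |
| Output Format | `-o`, `--output` | XML | Default output format when not specified |
| Exclude Pattern | `-e`, `--exclude` | none | No files excluded by default |
| Clean Mode | `--clean` | disabled | Code cleaning and optimization |
| Dry Run | `--dry-run` | disabled | Preview files to be processed |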

🎯 Basic Usage

Simple Commands

# Process current directory with defaults
filefusion

# Process specific directory
filefusion /path/to/project

# Process multiple directories
filefusion /path/to/project1 /path/to/project2

# Generate specific output format
filefusion -o output.json /path/to/project

🛠️ Flag Examples

Output Path (-o, --output)

# Generate XML output
filefusion -o output.xml /path/to/project

# Generate JSON output
filefusion -o output.json /path/to/project

# Generate YAML output
filefusion -o output.yaml /path/to/project

Pattern Matching Rules

For detailed pattern matching examples and rules, please refer to our Pattern Guide.

Here are some common patterns:

| Pattern | Description |
|---------|-------------|
| `*.go` | All Go files |
| `*.{go,proto}` | All Go and Proto files |
| `src/**/*.js` | All JavaScript files under src |
| `!vendor/**` | Exclude vendor directory |
| `**/*_test.go` | All Go test files |
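The following minimal Go sketch shows how patterns like these can be evaluated. It assumes slash-separated paths relative to the scanned directory, and it works in three steps:

1. It expands brace groups.
2. It lets ** stand for zero or more directories.
3. It matches every other segment with the standard library's path.Match.

The function names are hypothetical. Comparing a pattern without / against the base name only is an assumption, and the ! negation form is left out. FileFusion's own matcher may behave differently.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandBraces turns "*.{go,proto}" into ["*.go", "*.proto"].
// Groups are expanded left to right; nested braces are not supported.
func expandBraces(pattern string) []string {
	open := strings.IndexByte(pattern, '{')
	if open < 0 {
		return []string{pattern}
	}
	rel := strings.IndexByte(pattern[open:], '}')
	if rel < 0 {
		return []string{pattern} // unbalanced brace: treat literally
	}
	end := open + rel
	var out []string
	for _, alt := range strings.Split(pattern[open+1:end], ",") {
		out = append(out, expandBraces(pattern[:open]+alt+pattern[end+1:])...)
	}
	return out
}

// matchSegments compares slash-separated segments. "**" matches zero or
// more whole segments; any other segment is matched with path.Match.
func matchSegments(pat, name []string) bool {
	if len(pat) == 0 {
		return len(name) == 0
	}
	if pat[0] == "**" {
		for i := 0; i <= len(name); i++ {
			if matchSegments(pat[1:], name[i:]) {
				return true
			}
		}
		return false
	}
	if len(name) == 0 {
		return false
	}
	if ok, err := path.Match(pat[0], name[0]); err != nil || !ok {
		return false
	}
	return matchSegments(pat[1:], name[1:])
}

// matchGlob reports whether relPath matches pattern. Assumption: a pattern
// without "/" is compared with the base name only, so "*.go" matches
// "cmd/main.go".
func matchGlob(pattern, relPath string) bool {
	for _, p := range expandBraces(pattern) {
		if !strings.Contains(p, "/") {
			if ok, _ := path.Match(p, path.Base(relPath)); ok {
				return true
			}
			continue
		}
		if matchSegments(strings.Split(p, "/"), strings.Split(relPath, "/")) {
			return true
		}
	}
	return false
}

func main() {
	cases := []struct{ pattern, path string }{
		{"*.go", "cmd/filefusion/main.go"},
		{"*.{go,proto}", "api/service.proto"},
		{"src/**/*.js", "src/app/util/format.js"},
		{"src/**/*.js", "lib/format.js"},
		{"**/*_test.go", "internal/core/mapper_test.go"},
	}
	for _, c := range cases {
		fmt.Printf("%-16s %-32s %v\n", c.pattern, c.path, matchGlob(c.pattern, c.path))
	}
}
```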

File Patterns (-p, --pattern)

# Process only Python and JavaScript files
filefusion --pattern "*.py,*.js" /path/to/project

# Process all source files
filefusion -p "*.go,*.rs,*.js,*.py,*.java" /path/to/project

# Include configuration files
filefusion -p "*.yaml,*.json,*.toml,*.ini" /path/to/project

Exclusions (-e, --exclude)

# Exclude test files
filefusion --exclude "*_test.go,test/**" /path/to/project

# Exclude build and vendor directories
filefusion -e "build/**,vendor/**,node_modules/**" /path/to/project

# Complex exclusion
filefusion -e "**/*.test.js,**/*tests*/**,**/dist/**" /path/to/project

Size Limits

# Increase individual file size limit to 20MB
filefusion --max-file-size 20MB /path/to/project

# Increase total output size limit to 100MB
filefusion --max-output-size 100MB /path/to/project

# Set maximum size per output file to 50KB (splits into multiple files if exceeded)
filefusion --max-output-file-size 50KB /path/to/project

# Set all size limits and enable cleaning
filefusion --max-file-size 20MB --max-output-size 100MB --max-output-file-size 50KB --clean /path/to/project

Size limits accept suffixes: B, KB, MB, GB, TB
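A size value is a whole number followed by one of these suffixes. A minimal Go parser could look like the one below. It makes three assumptions:

* Units are binary multiples (1KB = 1024 bytes).
* Suffixes are case-insensitive.
* Only whole numbers are allowed.

The README does not say which convention FileFusion actually uses, and parseSize is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// sizeUnits lists longer suffixes first: "50KB" also ends in "B", and
// checking "B" first would try (and fail) to parse "50K" as a number.
// Assumption: binary multiples, so 1KB = 1024 bytes.
var sizeUnits = []struct {
	suffix string
	bytes  int64
}{
	{"TB", 1 << 40},
	{"GB", 1 << 30},
	{"MB", 1 << 20},
	{"KB", 1 << 10},
	{"B", 1},
}

// parseSize converts values such as "20MB" or "50kb" into a byte count.
// Only whole numbers are accepted in this sketch ("1.5MB" is rejected).
func parseSize(s string) (int64, error) {
	v := strings.ToUpper(strings.TrimSpace(s))
	for _, u := range sizeUnits {
		if !strings.HasSuffix(v, u.suffix) {
			continue
		}
		n, err := strconv.ParseInt(strings.TrimSpace(strings.TrimSuffix(v, u.suffix)), 10, 64)
		if err != nil || n < 0 {
			return 0, fmt.Errorf("invalid size %q", s)
		}
		if n > math.MaxInt64/u.bytes {
			return 0, fmt.Errorf("size %q overflows int64", s)
		}
		return n * u.bytes, nil
	}
	return 0, fmt.Errorf("invalid size %q: expected a B, KB, MB, GB, or TB suffix", s)
}

func main() {
	for _, in := range []string{"10MB", "50KB", "20mb", "1.5MB", "42"} {
		n, err := parseSize(in)
		fmt.Printf("%-6s -> %d (err: %v)\n", in, n, err)
	}
}
```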

When the processed content exceeds max-output-file-size, FileFusion automatically splits the output into multiple files with sequential numbering (e.g., output.1.xml, output.2.xml, output.3.xml).

📚 Code Cleaning

FileFusion includes a powerful code cleaning engine that strips noise such as comments, logging statements, and extra whitespace so that files take up less of an LLM's context. The cleaner supports multiple programming languages, and each optimization can be switched on or off individually.

Supported Languages

  • Go, Java, Python, Swift, Kotlin
  • JavaScript, TypeScript, HTML, CSS
  • C++, C#, PHP, Ruby
  • SQL, Bash

Cleaning Options

| Option | Description | Default |
|--------|-------------|---------|
| `--clean` | Enable code cleaning | false |
| `--clean-remove-comments` | Remove all comments | true |
| `--clean-preserve-doc-comments` | Keep documentation comments | true |
| `--clean-remove-imports` | Remove import statements | false |
| `--clean-remove-logging` | Remove logging statements | true |
| `--clean-remove-getters-setters` | Remove getter/setter methods | true |
| `--clean-optimize-whitespace` | Optimize whitespace | true |

Cleaning Examples

# Basic cleaning with default options
filefusion --clean input.go -o clean.xml

# Preserve all comments
filefusion --clean --clean-remove-comments=false input.py -o clean.xml

# Remove everything except essential code
filefusion --clean \
  --clean-remove-comments \
  --clean-preserve-doc-comments=false \
  --clean-remove-logging \
  --clean-remove-getters-setters \
  input.java -o clean.xml

# Clean TypeScript while preserving docs
filefusion --clean \
  --clean-preserve-doc-comments \
  --clean-remove-logging \
  --pattern "*.ts" \
  src/ -o clean.xml

# Clean multiple languages in a project
filefusion --clean \
  --pattern "*.{go,js,py}" \
  --clean-preserve-doc-comments \
  --clean-remove-logging \
  project/ -o clean.xml

Language-Specific Features

The cleaner automatically detects and handles language-specific patterns:

  • Logging Statements: Recognizes common logging patterns

    • Go: log., logger.
    • JavaScript/TypeScript: console., logger.
    • Python: logging., logger., print
    • Java: Logger., System.out., System.err.
    • And more...
  • Documentation: Preserves language-specific doc formats

    • Go: // and /* */ doc comments
    • Python: Docstrings
    • JavaScript/TypeScript: JSDoc
    • Java: Javadoc
  • Code Structure: Maintains language idioms while removing noise

    • Preserves package/module structure
    • Keeps essential imports
    • Removes debug/test code
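The logging patterns above can be illustrated with a line-based filter. This Go sketch drops any line whose first non-blank text starts with one of the listed prefixes. The prefix table mirrors the list above, except that it uses print( rather than print for Python so that names like printer survive. The language keys and the line-based rule are assumptions, and removeLogging is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// loggingPrefixes follows the README's list of logging patterns.
// The language keys are assumptions; "print(" is used instead of "print"
// so that identifiers such as printer are left alone.
var loggingPrefixes = map[string][]string{
	"go":         {"log.", "logger."},
	"javascript": {"console.", "logger."},
	"typescript": {"console.", "logger."},
	"python":     {"logging.", "logger.", "print("},
	"java":       {"Logger.", "System.out.", "System.err."},
}

// removeLogging drops every line whose first non-blank text starts with a
// logging prefix for lang. Unknown languages are returned unchanged.
func removeLogging(src, lang string) string {
	prefixes := loggingPrefixes[lang]
	lines := strings.Split(src, "\n")
	kept := lines[:0] // filter in place; the write index never passes the read index
	for _, line := range lines {
		trimmed := strings.TrimSpace(line)
		drop := false
		for _, p := range prefixes {
			if strings.HasPrefix(trimmed, p) {
				drop = true
				break
			}
		}
		if !drop {
			kept = append(kept, line)
		}
	}
	return strings.Join(kept, "\n")
}

func main() {
	src := "func handle() {\n\tlog.Println(\"start\")\n\tprocess()\n}"
	fmt.Println(removeLogging(src, "go"))
}
```

A line filter is easy to fool in several ways:

* A call spread over several lines is only partly removed.
* Removing Go's log.Fatal changes control flow.
* Deleting the only statement in a Python block leaves invalid code.

A production cleaner would work on a syntax tree instead.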

📚 Advanced Examples

Processing a Go Project

filefusion \
  --pattern "*.go" \
  --exclude "*_test.go,vendor/**" \
  --output project.json \
  --max-file-size 5MB \
  /path/to/go/project

Processing Web Project Files

filefusion \
  --pattern "*.js,*.ts,*.jsx,*.tsx,*.css,*.html" \
  --exclude "node_modules/**,dist/**,build/**" \
  --output web-project.xml \
  /path/to/web/project

Code Cleaning and Size Optimization

# Clean and optimize a Go project
filefusion \
  --pattern "*.go" \
  --exclude "*_test.go" \
  --clean \
  --clean-remove-comments \
  --clean-remove-logging \
  --output optimized.xml \
  /path/to/go/project

# Clean TypeScript/JavaScript with preserved documentation
filefusion \
  --pattern "*.ts,*.js" \
  --clean \
  --clean-preserve-doc-comments \
  --clean-remove-logging \
  --clean-optimize-whitespace \
  --output web-optimized.xml \
  /path/to/web/project

📄 Output Format Examples

XML Output

<?xml version="1.0" encoding="UTF-8"?>
<documents>
  <document index="1">
    <source>main.go</source>
    <document_content>
      package main
      ...
    </document_content>
  </document>
</documents>

JSON Output

{
    "documents": [
        {
            "index": 1,
            "source": "main.go",
            "document_content": "package main\n..."
        }
    ]
}

YAML Output

documents:
    - index: 1
      source: main.go
      document_content: |
          package main
          ...

💡 Tips and Best Practices

  1. Start with Dry Run

    filefusion --dry-run /path/to/project

    This shows which files will be processed without making changes.

  2. Optimize for Large Projects

    filefusion --max-output-file-size 1MB --clean /path/to/project

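    Raising --max-output-file-size above the 30KB default produces fewer split files, and --clean trims comments, logging, and extra whitespace from the content sent to the LLM.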

  3. Handle Large Codebases

    filefusion --pattern "*.{go,js}" --exclude "test/**,vendor/**" /path/to/project

    Use specific patterns and exclusions to focus on relevant files.

❗ Issues and Solutions

"no files found matching pattern"

  • Check if patterns match your file extensions
  • Verify files exist in the specified directory
  • Make sure patterns don't conflict with exclusions

"output size exceeds maximum"

  • Increase --max-output-size
  • Use more specific patterns
  • Split processing into multiple runs

"error processing files"

  • Check file permissions
  • Verify file encodings (UTF-8 recommended)
  • Ensure sufficient disk space

📜 License

Mozilla Public License Version 2.0


Made with ❤️ by the DrGos