Rafka

A High-Performance Distributed Message Broker Built in Rust

Rafka is a blazing-fast, experimental distributed asynchronous message broker inspired by Apache Kafka. Built with Rust on Tokio's async runtime, it targets high throughput and low-latency message processing through a peer-to-peer mesh architecture (in progress) and a custom in-memory storage engine.

πŸš€ Key Features

  • High-Performance Async Architecture: Built on Tokio for maximum concurrency and throughput
  • gRPC Communication: Modern protocol buffers for efficient inter-service communication
  • Partitioned Message Processing: Hash-based partitioning for horizontal scalability
  • Disk-based Persistence: Write-Ahead Log (WAL) for message durability
  • Consumer Groups: Load-balanced message consumption with partition assignment
  • Replication: Multi-replica partitions with ISR tracking for high availability
  • Log Compaction: Multiple strategies (KeepLatest, TimeWindow, Hybrid) for storage optimization
  • Transactions: Two-Phase Commit (2PC) with idempotent producer support
  • Comprehensive Monitoring: Health checks, heartbeat tracking, and circuit breakers
  • Real-time Metrics: Prometheus-compatible metrics export with latency histograms
  • Stream Processing: Kafka Streams-like API for message transformation and aggregation
  • Offset Tracking: Consumer offset management for reliable message delivery
  • Retention Policies: Configurable message retention based on age and size
  • Modular Design: Clean separation of concerns across multiple crates

πŸ†š Rafka vs Apache Kafka Feature Comparison

| Feature | Apache Kafka | Rafka (Current) | Status |
| --- | --- | --- | --- |
| Storage | Disk-based (persistent) | Disk-based WAL (persistent) | βœ… Implemented |
| Architecture | Leader/Follower (ZooKeeper/KRaft) | P2P mesh / distributed | πŸ”„ Different approach |
| Consumption model | Consumer groups (load balancing) | Consumer groups + pub/sub | βœ… Implemented |
| Replication | Multi-replica with ISR | Multi-replica with ISR | βœ… Implemented |
| Message safety | WAL (Write-Ahead Log) | WAL (Write-Ahead Log) | βœ… Implemented |
| Transactions | Exactly-once semantics | 2PC with idempotent producers | βœ… Implemented |
| Compaction | Log compaction | Log compaction (multiple strategies) | βœ… Implemented |
| Ecosystem | Connect, Streams, Schema Registry | Core broker only | ❌ Missing |

πŸ” Feature Implementation Status

βœ… Implemented Features

  1. Disk-based Persistence (WAL): Rafka now implements a Write-Ahead Log (WAL) for message durability. Messages are persisted to disk and survive broker restarts.
  2. Consumer Groups: Rafka supports consumer groups with load balancing. Multiple consumers can share the load of a topic, with each partition being consumed by only one member of the group. Both Range and RoundRobin partition assignment strategies are supported.
  3. Replication & High Availability: Rafka implements multi-replica partitions with In-Sync Replica (ISR) tracking and leader election for high availability.
  4. Log Compaction: Rafka supports log compaction with multiple strategies (KeepLatest, TimeWindow, Hybrid) to optimize storage by keeping only the latest value for a key.
  5. Transactions: Rafka implements atomic writes across multiple partitions/topics using Two-Phase Commit (2PC) protocol with idempotent producer support.
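The Range and RoundRobin assignment strategies mentioned above can be sketched in a few lines. This is an illustrative, self-contained sketch, not Rafka's actual API: `assign_range` and `assign_round_robin` are hypothetical names showing how each strategy maps partitions to group members.

```rust
// Illustrative sketch of the two consumer-group assignment strategies
// (hypothetical functions, not Rafka's actual API). Each partition ends
// up owned by exactly one member of the group.

// Range: split partitions into contiguous chunks; the first consumers
// absorb any remainder.
fn assign_range(consumers: &[&str], partitions: u32) -> Vec<(u32, String)> {
    let n = consumers.len() as u32;
    let per = partitions / n; // base share per consumer
    let extra = partitions % n; // first `extra` consumers get one more
    let mut out = Vec::new();
    let mut p = 0;
    for (i, c) in consumers.iter().enumerate() {
        let count = per + if (i as u32) < extra { 1 } else { 0 };
        for _ in 0..count {
            out.push((p, c.to_string()));
            p += 1;
        }
    }
    out
}

// RoundRobin: deal partitions to consumers one at a time.
fn assign_round_robin(consumers: &[&str], partitions: u32) -> Vec<(u32, String)> {
    (0..partitions)
        .map(|p| (p, consumers[p as usize % consumers.len()].to_string()))
        .collect()
}

fn main() {
    for (p, c) in assign_range(&["c0", "c1"], 5) {
        println!("range: partition {p} -> {c}");
    }
    for (p, c) in assign_round_robin(&["c0", "c1"], 5) {
        println!("round-robin: partition {p} -> {c}");
    }
}
```

With two consumers and five partitions, Range gives `c0` partitions 0-2 and `c1` partitions 3-4, while RoundRobin alternates.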

❌ Missing Features

  1. Ecosystem Tools: Unlike Apache Kafka, Rafka currently lacks ecosystem tooling such as Kafka Connect (for data integration) and Schema Registry (for schema management); its built-in Kafka Streams-like API covers only basic stream processing. These tools would need to be developed separately to provide a complete data streaming platform.

πŸ—οΈ Architecture Overview

System Architecture Diagram

```mermaid
graph TB
    subgraph "Client Layer"
        P[Producer]
        C[Consumer]
    end

    subgraph "Broker Cluster"
        B1[Broker 1<br/>Partition 0]
        B2[Broker 2<br/>Partition 1]
        B3[Broker 3<br/>Partition 2]
    end

    subgraph "Storage Layer"
        S1[In-Memory DB<br/>Partition 0]
        S2[In-Memory DB<br/>Partition 1]
        S3[In-Memory DB<br/>Partition 2]
    end

    P -->|gRPC Publish| B1
    P -->|gRPC Publish| B2
    P -->|gRPC Publish| B3

    B1 -->|Store Messages| S1
    B2 -->|Store Messages| S2
    B3 -->|Store Messages| S3

    C -->|gRPC Consume| B1
    C -->|gRPC Consume| B2
    C -->|gRPC Consume| B3

    B1 -->|Broadcast Stream| C
    B2 -->|Broadcast Stream| C
    B3 -->|Broadcast Stream| C
```

Message Flow Sequence Diagram

```mermaid
sequenceDiagram
    participant P as Producer
    participant B as Broker
    participant S as Storage
    participant C as Consumer

    P->>B: PublishRequest(topic, key, payload)
    B->>B: Hash key for partition
    B->>B: Check partition ownership
    B->>S: Store message with offset
    S-->>B: Return offset
    B->>B: Broadcast to subscribers
    B-->>P: PublishResponse(message_id, offset)

    C->>B: ConsumeRequest(topic)
    B->>B: Create broadcast stream
    B-->>C: ConsumeResponse stream

    loop Message Processing
        B->>C: ConsumeResponse(message)
        C->>B: AcknowledgeRequest(message_id)
        C->>B: UpdateOffsetRequest(offset)
    end
```

πŸ“ Project Structure

rafka/
β”œβ”€β”€ Cargo.toml                 # Workspace manifest
β”œβ”€β”€ config/
β”‚   └── config.yml            # Configuration file
β”œβ”€β”€ scripts/                  # Demo and utility scripts
β”‚   β”œβ”€β”€ helloworld.sh         # Basic producer-consumer demo
β”‚   β”œβ”€β”€ partitioned_demo.sh   # Multi-broker partitioning demo
β”‚   β”œβ”€β”€ retention_demo.sh     # Message retention demo
β”‚   β”œβ”€β”€ offset_tracking_demo.sh # Consumer offset tracking demo
β”‚   └── kill.sh               # Process cleanup script
β”œβ”€β”€ src/
β”‚   └── bin/                  # Executable binaries
β”‚       β”œβ”€β”€ start_broker.rs   # Broker server
β”‚       β”œβ”€β”€ start_producer.rs # Producer client
β”‚       β”œβ”€β”€ start_consumer.rs # Consumer client
β”‚       └── check_metrics.rs  # Metrics monitoring
β”œβ”€β”€ crates/                   # Core library crates
β”‚   β”œβ”€β”€ core/                 # Core types and gRPC definitions
β”‚   β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”‚   β”œβ”€β”€ lib.rs
β”‚   β”‚   β”‚   β”œβ”€β”€ message.rs    # Message structures
β”‚   β”‚   β”‚   └── proto/
β”‚   β”‚   β”‚       └── rafka.proto # gRPC service definitions
β”‚   β”‚   └── build.rs          # Protocol buffer compilation
β”‚   β”œβ”€β”€ broker/               # Broker implementation
β”‚   β”‚   └── src/
β”‚   β”‚       β”œβ”€β”€ lib.rs
β”‚   β”‚       └── broker.rs     # Core broker logic
β”‚   β”œβ”€β”€ producer/             # Producer implementation
β”‚   β”‚   └── src/
β”‚   β”‚       β”œβ”€β”€ lib.rs
β”‚   β”‚       └── producer.rs   # Producer client
β”‚   β”œβ”€β”€ consumer/             # Consumer implementation
β”‚   β”‚   └── src/
β”‚   β”‚       β”œβ”€β”€ lib.rs
β”‚   β”‚       └── consumer.rs   # Consumer client
β”‚   └── storage/              # Storage engine
β”‚       └── src/
β”‚           β”œβ”€β”€ lib.rs
β”‚           └── db.rs         # In-memory database
β”œβ”€β”€ docs/
β”‚   └── getting_started.md    # Getting started guide
β”œβ”€β”€ tasks/
β”‚   └── Roadmap.md            # Development roadmap
β”œβ”€β”€ Dockerfile                # Container configuration
└── LICENSE                   # MIT License

πŸš€ Quick Start

Prerequisites

  • Rust: Latest stable version (1.70+)
  • Cargo: Comes with Rust installation
  • Protocol Buffers: For gRPC compilation

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/rafka.git
cd rafka
  2. Build the project:
cargo build --release
  3. Run the basic demo:
./scripts/helloworld.sh

Manual Setup

  1. Start a broker:
cargo run --bin start_broker -- --port 50051 --partition 0 --total-partitions 3
  2. Start a consumer:
cargo run --bin start_consumer -- --port 50051
  3. Send messages:
cargo run --bin start_producer -- --message "Hello, Rafka!" --key "test-key"

πŸ”§ Configuration

Broker Configuration

The broker can be configured via command-line arguments:

cargo run --bin start_broker -- \
  --port 50051 \
  --partition 0 \
  --total-partitions 3 \
  --retention-seconds 604800

Available Options:

  • --port: Broker listening port (default: 50051)
  • --partition: Partition ID for this broker (default: 0)
  • --total-partitions: Total number of partitions (default: 1)
  • --retention-seconds: Message retention time in seconds (default: 7 days)

Configuration File

Edit config/config.yml for persistent settings:

server:
  host: "127.0.0.1"
  port: 9092

log:
  level: "info"  # debug, info, warn, error

broker:
  replication_factor: 3
  default_topic_partitions: 1

storage:
  type: "in_memory"

πŸ›οΈ Core Components

1. Core (rafka-core)

Purpose: Defines fundamental types and gRPC service contracts.

Key Components:

  • Message Structures: Message, MessageAck, BenchmarkMetrics
  • gRPC Definitions: Protocol buffer definitions for all services
  • Serialization: Serde-based serialization for message handling

Key Files:

  • message.rs: Core message types and acknowledgment structures
  • proto/rafka.proto: gRPC service definitions

2. Broker (rafka-broker)

Purpose: Central message routing and coordination service.

Key Features:

  • Partition Management: Hash-based message partitioning
  • Topic Management: Dynamic topic creation and subscription
  • Broadcast Channels: Efficient message distribution to consumers
  • Offset Tracking: Consumer offset management
  • Retention Policies: Configurable message retention
  • Metrics Collection: Real-time performance metrics

Key Operations:

  • publish(): Accept messages from producers
  • consume(): Stream messages to consumers
  • subscribe(): Register consumer subscriptions
  • acknowledge(): Process message acknowledgments
  • update_offset(): Track consumer progress

3. Producer (rafka-producer)

Purpose: Client library for publishing messages to brokers.

Key Features:

  • Connection Management: Automatic broker connection handling
  • Message Publishing: Reliable message delivery with acknowledgments
  • Error Handling: Comprehensive error reporting
  • UUID Generation: Unique message identification

Usage Example:

```rust
let mut producer = Producer::new("127.0.0.1:50051").await?;
producer.publish("my-topic".to_string(), "Hello World".to_string(), "key-1".to_string()).await?;
```

4. Consumer (rafka-consumer)

Purpose: Client library for consuming messages from brokers.

Key Features:

  • Subscription Management: Topic subscription handling
  • Stream Processing: Asynchronous message streaming
  • Automatic Acknowledgment: Built-in message acknowledgment
  • Offset Tracking: Automatic offset updates
  • Channel-based API: Clean async/await interface

Usage Example:

```rust
let mut consumer = Consumer::new("127.0.0.1:50051").await?;
consumer.subscribe("my-topic".to_string()).await?;
let mut rx = consumer.consume("my-topic".to_string()).await?;
while let Some(message) = rx.recv().await {
    println!("Received: {}", message);
}
```

5. Storage (rafka-storage)

Purpose: High-performance in-memory storage engine; the WAL described above provides on-disk durability.

Key Features:

  • Partition-based Storage: Separate queues per partition
  • Retention Policies: Age and size-based message retention
  • Offset Management: Efficient offset tracking and retrieval
  • Acknowledgment Tracking: Consumer acknowledgment management
  • Metrics Collection: Storage performance metrics
  • Memory Optimization: Efficient memory usage with cleanup

Storage Architecture:

```mermaid
graph LR
    subgraph "Storage Engine"
        T[Topic]
        P1[Partition 0]
        P2[Partition 1]
        P3[Partition 2]

        T --> P1
        T --> P2
        T --> P3

        P1 --> Q1[Message Queue]
        P2 --> Q2[Message Queue]
        P3 --> Q3[Message Queue]
    end
```

πŸ”„ Message Flow

Publishing Flow

  1. Producer sends PublishRequest to Broker
  2. Broker hashes the message key to determine partition
  3. Broker checks partition ownership
  4. Broker stores message in Storage with unique offset
  5. Broker broadcasts message to subscribed consumers
  6. Broker returns PublishResponse with message ID and offset
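The offset assignment in steps 4 and 6 boils down to appending to a per-partition log. A minimal standalone sketch, with hypothetical types not taken from Rafka's codebase:

```rust
// Minimal sketch (hypothetical types, not Rafka's actual code): storing a
// message assigns the next monotonically increasing offset, which is what
// the broker returns to the producer in the publish response.
struct PartitionLog {
    messages: Vec<String>,
}

impl PartitionLog {
    fn new() -> Self {
        Self { messages: Vec::new() }
    }

    // Append the payload and return its offset (its index in the log).
    fn append(&mut self, payload: &str) -> u64 {
        self.messages.push(payload.to_string());
        (self.messages.len() - 1) as u64
    }
}

fn main() {
    let mut log = PartitionLog::new();
    println!("offset {}", log.append("first"));  // offset 0
    println!("offset {}", log.append("second")); // offset 1
}
```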

Consumption Flow

  1. Consumer sends ConsumeRequest to Broker
  2. Broker creates broadcast stream for the topic
  3. Broker streams messages via gRPC to Consumer
  4. Consumer processes message and sends acknowledgment
  5. Consumer updates offset to track progress
  6. Storage cleans up acknowledged messages based on retention policy
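Step 5's bookkeeping can be modeled as a map from (group, topic, partition) to the highest committed offset. The sketch below is illustrative only; the type and method names are hypothetical, not Rafka's:

```rust
use std::collections::HashMap;

// Illustrative sketch of consumer offset tracking (hypothetical names,
// not Rafka's actual types): the broker remembers, per (group, topic,
// partition), the highest offset a consumer has committed.
#[derive(Default)]
struct OffsetStore {
    committed: HashMap<(String, String, u32), u64>,
}

impl OffsetStore {
    // Record a committed offset, never moving backwards.
    fn commit(&mut self, group: &str, topic: &str, partition: u32, offset: u64) {
        let entry = self
            .committed
            .entry((group.to_string(), topic.to_string(), partition))
            .or_insert(0);
        *entry = (*entry).max(offset);
    }

    // Where should this group resume after a restart?
    fn resume_from(&self, group: &str, topic: &str, partition: u32) -> u64 {
        self.committed
            .get(&(group.to_string(), topic.to_string(), partition))
            .map(|o| o + 1) // next unread offset
            .unwrap_or(0)
    }
}

fn main() {
    let mut store = OffsetStore::default();
    store.commit("analytics", "events", 0, 41);
    store.commit("analytics", "events", 0, 7); // stale commit is ignored
    println!("resume at offset {}", store.resume_from("analytics", "events", 0));
}
```

Ignoring stale (lower) commits is what makes redelivered acknowledgments harmless.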

πŸ“Š Performance Features

Partitioning Strategy

Rafka uses hash-based partitioning for efficient message distribution:

```rust
fn hash_key(&self, key: &str) -> u32 {
    // Simple additive hash: sum the key's bytes with wrapping arithmetic.
    key.bytes().fold(0u32, |acc, b| acc.wrapping_add(b as u32))
}

fn owns_partition(&self, message_key: &str) -> bool {
    // A broker handles a message only if the key hashes to its partition.
    let hash = self.hash_key(message_key);
    hash % self.total_partitions == self.partition_id
}
```
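Because the routing rule is a pure function of the key, it can be exercised outside the broker. The snippet below mirrors the methods above as free functions so it compiles on its own; note that a byte-sum hash distributes keys less uniformly than a general-purpose hash (any permutation of the same bytes lands on the same partition).

```rust
// Standalone demo of the hash-based routing above, with the broker's
// methods rewritten as free functions.
fn hash_key(key: &str) -> u32 {
    // Sum of the key's bytes, wrapping on overflow.
    key.bytes().fold(0u32, |acc, b| acc.wrapping_add(b as u32))
}

fn partition_for(key: &str, total_partitions: u32) -> u32 {
    hash_key(key) % total_partitions
}

fn main() {
    // With 3 partitions, each broker handles the keys that map to its ID.
    for key in ["test-key", "user-42", "order-7"] {
        println!("{key} -> partition {}", partition_for(key, 3));
    }
}
```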

Retention Policies

Configurable message retention based on:

  • Time-based: Maximum age (default: 7 days)
  • Size-based: Maximum storage size (default: 1GB)
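Combining the two limits can be sketched as a single eviction pass over an oldest-first queue. This is an assumption-laden sketch (the types and the eviction policy details are hypothetical, not Rafka's actual code): expired messages are always dropped, and additional oldest messages go when the partition is over its byte budget.

```rust
use std::time::{Duration, SystemTime};

// Illustrative retention check (hypothetical types, not Rafka's code):
// a message is evicted when it exceeds the age limit, and the oldest
// messages are dropped first when the partition exceeds its size budget.
struct RetentionPolicy {
    max_age: Duration,
    max_bytes: u64,
}

struct StoredMessage {
    created_at: SystemTime,
    size_bytes: u64,
}

impl RetentionPolicy {
    // How many messages at the front of the (oldest-first) queue
    // should be evicted right now?
    fn evict_count(&self, queue: &[StoredMessage], now: SystemTime) -> usize {
        let total: u64 = queue.iter().map(|m| m.size_bytes).sum();
        let mut over_budget = total.saturating_sub(self.max_bytes);
        let mut count = 0;
        for msg in queue {
            let expired = now
                .duration_since(msg.created_at)
                .map(|age| age > self.max_age)
                .unwrap_or(false);
            if expired || over_budget > 0 {
                over_budget = over_budget.saturating_sub(msg.size_bytes);
                count += 1;
            } else {
                break;
            }
        }
        count
    }
}

fn main() {
    let now = SystemTime::now();
    let policy = RetentionPolicy { max_age: Duration::from_secs(604_800), max_bytes: 100 };
    let queue = vec![
        StoredMessage { created_at: now - Duration::from_secs(700_000), size_bytes: 40 }, // past max age
        StoredMessage { created_at: now, size_bytes: 80 },
        StoredMessage { created_at: now, size_bytes: 50 },
    ];
    // 170 bytes total, 70 over budget: the expired message goes, then the
    // next-oldest one to get back under the 100-byte limit.
    println!("evict {} messages", policy.evict_count(&queue, now));
}
```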

Metrics Collection

Built-in metrics for monitoring:

  • Total messages stored
  • Total bytes consumed
  • Oldest message age
  • Consumer offset positions

πŸ§ͺ Demo Scripts

1. Hello World Demo

./scripts/helloworld.sh

Basic producer-consumer interaction demonstration.

2. Partitioned Demo

./scripts/partitioned_demo.sh

Multi-broker setup with hash-based partitioning.

3. Retention Demo

./scripts/retention_demo.sh

Demonstrates message retention policies.

4. Offset Tracking Demo

./scripts/offset_tracking_demo.sh

Shows consumer offset management and recovery.

πŸ› οΈ Development

Building from Source

# Clone repository
git clone https://github.com/yourusername/rafka.git
cd rafka

# Build all crates
cargo build

# Run tests
cargo test

# Build release version
cargo build --release

Running Tests

# Run all tests
cargo test

# Run specific crate tests
cargo test -p rafka-storage
cargo test -p rafka-broker

Code Structure

The project follows Rust best practices with:

  • Workspace Organization: Multiple crates in a single workspace
  • Separation of Concerns: Each component in its own crate
  • Async/Await: Modern async Rust with Tokio
  • Error Handling: Comprehensive error types and handling
  • Testing: Unit tests for all major components

🚧 Current Status

⚠️ Early Development - Not Production Ready

Rafka is currently in active development. The current implementation provides:

βœ… Completed Features:

  • Basic message publishing and consumption
  • Hash-based partitioning
  • In-memory storage with retention policies
  • Consumer offset tracking
  • gRPC-based communication
  • Metrics collection
  • Demo scripts and examples

πŸ”„ In Progress:

  • Peer-to-peer mesh networking
  • Distributed consensus algorithms
  • Kubernetes deployment configurations
  • Performance optimizations

πŸ“‹ Planned Features:

  • Hardening of cross-broker replication
  • Improved fault tolerance and recovery
  • Security and authentication
  • Client SDKs for multiple languages
  • Comprehensive monitoring and alerting

🀝 Contributing

We welcome contributions! Here are some areas where you can help:

High Priority

  • P2P Mesh Implementation: Distributed node discovery and communication
  • Consensus Algorithms: Leader election and cluster coordination
  • Replication: Cross-broker message replication
  • Fault Tolerance: Node failure detection and recovery

Medium Priority

  • Performance Optimization: Message batching and compression
  • Security: TLS encryption and authentication
  • Monitoring: Prometheus metrics and Grafana dashboards
  • Documentation: API documentation and tutorials

Getting Started

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Apache Kafka for inspiration on messaging systems
  • Tokio for the excellent async runtime
  • Tonic for gRPC implementation
  • @wyattgill9 for the initial proof of concept
  • The Rust community for their excellent libraries and support

Built with ❀️ in Rust
