30 changes: 30 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,30 @@
name: Deploy Documentation

on:
  push:
    branches:
      - main
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
  workflow_dispatch:

permissions:
  contents: write

jobs:
  deploy-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'

      - name: Install MkDocs
        run: pip install -r docs/requirements.txt

      - name: Build and deploy documentation
        run: mkdocs gh-deploy --force
5 changes: 4 additions & 1 deletion .gitignore
@@ -8,4 +8,7 @@ CMakeFiles/*
!build/buildGnome.sh
!build/buildTestMultiClient.sh
!build/buildTestMultiDataNode.sh
!build/buildTestMultiple.sh
!build/buildTestMultiple.sh

# MkDocs generated site (build artifact)
site/
109 changes: 109 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,109 @@
# Architecture

Squid Storage is composed of three independent components that communicate over TCP using the [Squid Protocol](protocol.md).

```
┌─────────────────────────────────────────────────────────────────┐
│                          CLIENT (GUI)                           │
│  SquidStorage ←──────────────────────────────→ SquidStorage     │
│  (primary socket)                        (secondary socket)     │
└────────────────────────────┬────────────────────────────────────┘
                             │ Squid Protocol (TCP)
┌────────────────────────────────────────────────────────────────┐
│                             SERVER                             │
│  - Metadata (dataNodeReplicationMap)                           │
│  - File lock management                                        │
│  - Heartbeat monitoring                                        │
│  - select()-based async I/O                                    │
└───────────┬────────────────────────────────────┬───────────────┘
            │ Squid Protocol (TCP)               │ Squid Protocol (TCP)
            ▼                                    ▼
┌───────────────────────┐            ┌───────────────────────┐
│      DATANODE 1       │            │      DATANODE 2       │
│    (file replicas)    │            │    (file replicas)    │
└───────────────────────┘            └───────────────────────┘
```

## Components

### Server (Central Coordinator)

The **Server** is the single source of truth for metadata. It:

- Accepts connections from Clients and DataNodes.
- Maintains the `dataNodeReplicationMap` — a mapping of each file to the DataNodes that hold a replica.
- Manages distributed **file locks** so that at most one client writes a file at a time.
- Forwards file operations (create, update, delete) to the relevant DataNodes and broadcasts changes to all connected Clients.
- Sends periodic **heartbeats** to DataNodes to detect failures.
- Calls `rebalanceFileReplication()` when a DataNode goes offline to restore the desired replication level.
- Uses the `select()` system call so it can monitor all sockets concurrently in a single thread.

More details: [Server Component](components/server.md)

### DataNode (Storage Node)

Each **DataNode** is a lightweight storage process. It:

- Stores file replicas in its working directory.
- Responds to heartbeat pings from the Server.
- Receives file content from the Server when a replication or update is needed.
- Sends file content to the Server when a Client requests a read.

More details: [DataNode Component](components/datanode.md)

### Client (GUI Application)

The **Client** provides a graphical interface built with Dear ImGui, SDL2, and OpenGL. It:

- Opens **two TCP connections** to the Server at startup — a *primary* socket for sending commands and a *secondary* socket for receiving asynchronous updates pushed by the Server.
- Lets users create, open (read), edit (update), and delete files.
- Acquires a distributed lock before writing and releases it when done.
- Switches to **read-only mode** when disconnected to prevent inconsistencies.

More details: [Client Component](components/client.md)

## Common Shared Code

The `common/` module contains code shared by all three components:

| Module | Path | Description |
|--------|------|-------------|
| `SquidProtocol` | `common/src/squidprotocol/` | Message serialisation/deserialisation and send/receive helpers |
| `FileManager` | `common/src/filesystem/` | File CRUD operations and version tracking |
| `FileLock` | `common/src/filesystem/` | Distributed lock data structure |
| `FileTransfer` | `common/src/filesystem/` | Chunked file transfer over a socket |
| `Peer` | `common/src/peer/` | Base class with socket management and reconnection logic |

## Data Flow — File Update

```
Client                        Server                        DataNodes
│                             │                             │
│── AcquireLock(file) ───────▶│                             │
│◀─ Response(isLocked) ───────│                             │
│                             │                             │
│── UpdateFile(file) ────────▶│                             │
│                             │── UpdateFile(file) ────────▶│ (replica 1)
│                             │── UpdateFile(file) ────────▶│ (replica 2)
│◀─ ACK ──────────────────────│                             │
│                             │                             │
│── ReleaseLock(file) ───────▶│                             │
│◀─ ACK ──────────────────────│                             │
```

## Replication and Fault Tolerance

- Each file is replicated across `replicationFactor` DataNodes (default: 2).
- The Server tracks which DataNodes hold each file in `dataNodeReplicationMap`.
- When a DataNode goes offline (missed heartbeat), the Server removes it from the map and calls `rebalanceFileReplication()` to assign new replicas.
- Clients automatically attempt to reconnect in a loop when they lose the server connection.
- While disconnected, files are shown in **read-only mode** to preserve consistency.

## Consistency Model

Squid Storage prioritises **consistency over availability**:

- A Client can only write when it holds a distributed lock **and** is connected to the Server.
- All writes are propagated synchronously to all DataNodes before the Server acknowledges the Client.
- If a component is partitioned, it cannot modify shared state; it can only read its local copy.
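
The synchronous-write rule can be illustrated with a small sketch. It is not taken from the Squid Storage sources: `propagateToAll` and the `sendUpdate` callback are hypothetical names, and treating an empty replica list as a failure is an assumption.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: pushes an update to every replica and reports success
// only if each one acknowledged. `sendUpdate` stands in for the real network
// call and returns true when the DataNode replied with an ACK.
bool propagateToAll(const std::vector<std::string>& replicas,
                    const std::function<bool(const std::string&)>& sendUpdate) {
    if (replicas.empty()) {
        return false;  // assumption: a write with no replica is not durable
    }
    bool allAcked = true;
    for (const std::string& node : replicas) {
        // Keep going after a failure so the remaining replicas still get the update.
        if (!sendUpdate(node)) {
            allAcked = false;
        }
    }
    return allAcked;
}
```

Under this sketch the Server would send its ACK to the Client only when `propagateToAll` returns `true`.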
65 changes: 65 additions & 0 deletions docs/components/client.md
@@ -0,0 +1,65 @@
# Client

The **Client** (`SquidStorage` executable) provides a graphical interface for users to interact with the distributed storage system. It is built with **Dear ImGui**, **SDL2**, and **OpenGL 2**.

## Starting the Client

```bash
./SquidStorage [serverIP] [serverPort]
```

| Argument | Default | Description |
|----------|---------|-------------|
| `serverIP` | `127.0.0.1` | IP address of the running Server |
| `serverPort` | `12345` | TCP port the Server is listening on |

## GUI Overview

The client window shows:

- **File list panel** — displays all files currently tracked by the distributed storage.
- **New File** button — creates a new empty file.
- **Open** button — reads and displays the selected file's content in an editable text area.
- **Delete** button — deletes the selected file from the distributed storage.

Editing a file automatically acquires a distributed lock before the write and releases it after the update is committed.

## Dual-Socket Connection

When the Client starts it establishes **two TCP connections** to the Server:

| Socket | Purpose |
|--------|---------|
| **Primary** | Sending commands (CreateFile, UpdateFile, DeleteFile, AcquireLock, …) |
| **Secondary** | Receiving asynchronous push notifications (file updates from other clients) |

This separation prevents command responses and asynchronous updates from interleaving on the same socket.

## Supported Operations

| Operation | Protocol Message | Description |
|-----------|-----------------|-------------|
| Create file | `CreateFile` | Sends the file content to the Server for replication |
| Read file | `ReadFile` | Retrieves a file from the Server (which fetches it from a DataNode) |
| Update file | `UpdateFile` | Sends updated content; Server propagates to DataNodes |
| Delete file | `DeleteFile` | Removes the file from all DataNodes |
| Acquire lock | `AcquireLock` | Requests an exclusive write lock from the Server |
| Release lock | `ReleaseLock` | Frees the write lock after writing |
| Sync status | `SyncStatus` | Retrieves the full list of files and their versions |

## Disconnected (Read-Only) Mode

If the Client loses its connection to the Server, it:

1. Attempts to reconnect in a loop.
2. Switches all files to **read-only** mode until reconnection succeeds.

This ensures that a partitioned client cannot write stale data and create inconsistencies.
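
A minimal sketch of this behaviour, assuming a hypothetical `ClientSession` class and a `tryConnect` callback that stands in for the real connection attempt. The attempt cap (`maxAttempts`) is an addition for illustration; the text above only says the Client retries in a loop.

```cpp
#include <functional>

// Hypothetical model of the Client's connection state.
class ClientSession {
public:
    // Called when the connection to the Server is lost.
    void onDisconnected() { connected_ = false; }

    // Retries until tryConnect() succeeds or maxAttempts is reached.
    bool reconnect(const std::function<bool()>& tryConnect, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts && !connected_; ++attempt) {
            connected_ = tryConnect();
        }
        return connected_;
    }

    // Files are read-only whenever the Server is unreachable.
    bool readOnly() const { return !connected_; }

    // A write needs both a live connection and the distributed lock.
    bool canWrite(bool holdsLock) const { return connected_ && holdsLock; }

private:
    bool connected_ = false;
};
```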

## Technology Stack

| Library | Version | Role |
|---------|---------|------|
| **Dear ImGui** | bundled | Immediate-mode GUI rendering |
| **SDL2** | system | Window creation, input handling |
| **OpenGL 2** | system | GPU-accelerated rendering backend |
49 changes: 49 additions & 0 deletions docs/components/datanode.md
@@ -0,0 +1,49 @@
# DataNode

A **DataNode** (`DataNode` executable) is a storage process that holds file replicas and serves them to the Server on request. You can run as many DataNode instances as you like; the Server distributes replicas across all available nodes.

## Responsibilities

- Store file replicas in its working directory.
- Receive and persist files sent by the Server (`CreateFile`, `UpdateFile`).
- Delete files on `DeleteFile` requests from the Server.
- Respond to `Heartbeat` pings to signal availability.
- Serve file content to the Server when a Client reads a file (`ReadFile`).

## Starting a DataNode

```bash
./DataNode [serverIP] [serverPort]
```

Both arguments are optional:

| Argument | Default | Description |
|----------|---------|-------------|
| `serverIP` | `127.0.0.1` | IP address of the running Server |
| `serverPort` | `12345` | TCP port the Server is listening on |

The DataNode uses its **current working directory** as the storage root. Run each instance from a different directory so that their file stores do not overlap:

```bash
mkdir -p /var/storage/dn1 && (cd /var/storage/dn1 && /path/to/DataNode 192.168.1.10 12345)
mkdir -p /var/storage/dn2 && (cd /var/storage/dn2 && /path/to/DataNode 192.168.1.10 12345)
```

## Lifecycle

1. **Connect** — the DataNode establishes a TCP connection to the Server.
2. **Identify** — the Server sends an `Identify` request; the DataNode responds with `nodeType=DATANODE` and its `processName`.
3. **Receive files** — the Server pushes file replicas as `CreateFile` / `UpdateFile` / `DeleteFile` messages.
4. **Heartbeat** — the Server periodically sends `Heartbeat` messages; the DataNode replies with `ACK`.
5. **Reconnect** — if the connection drops, the DataNode continuously retries the connection in a loop.

## Storage Layout

Files are stored relative to the DataNode's working directory, mirroring the path used in the Squid Protocol messages. For example, a file with `filePath:/squidstorage/report.txt` is stored as `./squidstorage/report.txt` inside the DataNode's directory.
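
The path mapping described above could look like the following sketch. `localPathFor` is a hypothetical helper, not a function from `FileManager`, and it deliberately ignores details such as `..` segments, which real code would need to reject.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps a protocol filePath such as
// "/squidstorage/report.txt" to a path under the DataNode's storage root.
std::string localPathFor(const std::string& storageRoot, const std::string& filePath) {
    // Drop leading slashes so the protocol path becomes relative.
    std::size_t start = filePath.find_first_not_of('/');
    std::string relative = (start == std::string::npos) ? std::string() : filePath.substr(start);

    if (storageRoot.empty() || storageRoot.back() == '/') {
        return storageRoot + relative;
    }
    return storageRoot + "/" + relative;
}
```

With a storage root of `.`, the example path from the text maps to `./squidstorage/report.txt`.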

The `FileManager` component (from `common/`) tracks file versions using a `.fileVersion.txt` metadata file so that stale replicas can be detected and updated.

## Version Tracking

Each file has an integer version number. When the Server rebalances replicas, it includes the current version so the receiving DataNode can verify it has the latest content. The `SyncStatus` message allows the Server to compare file version maps across DataNodes.
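
One way to picture the version comparison is the sketch below. The `VersionMap` alias and the `staleFiles` function are hypothetical; the actual format of the `SyncStatus` payload is not shown in this document.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed shape of a version map: file path -> integer version.
using VersionMap = std::map<std::string, int>;

// Returns the files whose replica is missing or older than the latest
// known version, in path order.
std::vector<std::string> staleFiles(const VersionMap& latest, const VersionMap& replica) {
    std::vector<std::string> stale;
    for (const auto& entry : latest) {
        auto it = replica.find(entry.first);
        if (it == replica.end() || it->second < entry.second) {
            stale.push_back(entry.first);
        }
    }
    return stale;
}
```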
98 changes: 98 additions & 0 deletions docs/components/server.md
@@ -0,0 +1,98 @@
# Server

The **Server** (`SquidStorageServer`) is the central coordinator of Squid Storage. It manages all metadata, routes file operations between Clients and DataNodes, and monitors the health of every connected node.

## Responsibilities

- Accept incoming TCP connections from Clients and DataNodes.
- Route and propagate file operations (create, read, update, delete).
- Maintain the `dataNodeReplicationMap` — which DataNodes hold replicas of each file.
- Manage distributed **file locks** and check for lock expiry.
- Send periodic **heartbeats** to DataNodes; remove unresponsive nodes from the replication map.
- Rebalance file replicas when the actual replication count falls below the configured factor.
- Use `select()` to handle all sockets concurrently without spawning per-connection threads.
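
To illustrate the single-threaded `select()` approach, here is a minimal POSIX sketch. It is not the Server's actual event loop: `readableDescriptors` is a hypothetical helper that only reports which descriptors have data waiting.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#include <algorithm>
#include <vector>

// Hypothetical helper: returns the descriptors in `fds` that are readable,
// waiting at most `timeoutMs` milliseconds.
std::vector<int> readableDescriptors(const std::vector<int>& fds, int timeoutMs) {
    std::vector<int> ready;

    fd_set readSet;
    FD_ZERO(&readSet);
    int maxFd = -1;
    for (int fd : fds) {
        if (fd < 0 || fd >= FD_SETSIZE) continue;  // select() cannot watch these
        FD_SET(fd, &readSet);
        maxFd = std::max(maxFd, fd);
    }
    if (maxFd < 0) return ready;

    timeval timeout{};
    timeout.tv_sec = timeoutMs / 1000;
    timeout.tv_usec = (timeoutMs % 1000) * 1000;

    // select() rewrites readSet so that only ready descriptors remain set.
    int count = select(maxFd + 1, &readSet, nullptr, nullptr, &timeout);
    if (count <= 0) return ready;  // timeout or error: nothing to dispatch

    for (int fd : fds) {
        if (fd >= 0 && fd < FD_SETSIZE && FD_ISSET(fd, &readSet)) {
            ready.push_back(fd);
        }
    }
    return ready;
}
```

A loop built on this helper would call it repeatedly and hand each ready descriptor to the appropriate handler, which is how one thread can serve many sockets.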

## Starting the Server

```bash
./SquidStorageServer [port] [replicationFactor] [timeoutSeconds]
```

All arguments are optional and fall back to defaults:

| Argument | Default | Description |
|----------|---------|-------------|
| `port` | `12345` | TCP port to listen on |
| `replicationFactor` | `2` | Number of DataNode replicas per file |
| `timeoutSeconds` | `60` | Socket operation timeout in seconds |
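
The optional positional arguments and their defaults could be handled roughly as in this sketch. `ServerConfig` and `parseServerArgs` are hypothetical names; only the defaults come from the table above.

```cpp
#include <string>
#include <vector>

// Defaults taken from the argument table; the struct itself is hypothetical.
struct ServerConfig {
    int port = 12345;
    int replicationFactor = 2;
    int timeoutSeconds = 60;
};

// Parses up to three positional arguments; any that are missing keep their
// default. std::stoi throws std::invalid_argument on non-numeric input.
ServerConfig parseServerArgs(const std::vector<std::string>& args) {
    ServerConfig config;
    if (args.size() > 0) config.port = std::stoi(args[0]);
    if (args.size() > 1) config.replicationFactor = std::stoi(args[1]);
    if (args.size() > 2) config.timeoutSeconds = std::stoi(args[2]);
    return config;
}
```

From `main`, the arguments could be passed as `std::vector<std::string>(argv + 1, argv + argc)`.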

## Internal State

### `dataNodeReplicationMap`

```
map<string /*filePath*/, map<string /*datanodeName*/, SquidProtocol>>
```

Tracks, for every file, which DataNodes currently hold a replica and the corresponding protocol endpoint. When a DataNode disconnects, it is erased from this map and `rebalanceFileReplication()` is triggered.

### `clientEndpointMap`

```
map<string /*clientName*/, pair<SquidProtocol /*primary*/, SquidProtocol /*secondary*/>>
```

Each Client opens two connections to the Server. The Server stores both so it can push asynchronous updates through the secondary socket.

### `dataNodeEndpointMap`

```
map<string /*datanodeName*/, SquidProtocol>
```

Active DataNode connections indexed by name.

### `fileLockMap`

```
map<string /*filePath*/, FileLock>
```

Stores the current lock holder and lock timestamp for every file. `checkFileLockExpiration()` periodically releases stale locks (default interval: 5 minutes).
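
The following sketch shows one way a lock table with expiry could behave. `LockEntry` and `LockTable` are hypothetical stand-ins for `FileLock` and `fileLockMap`, and letting the current holder re-acquire (which refreshes the timestamp) is an assumption rather than documented behaviour.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the FileLock structure.
struct LockEntry {
    std::string holder;
    Clock::time_point acquiredAt;
};

class LockTable {
public:
    explicit LockTable(std::chrono::seconds timeout) : timeout_(timeout) {}

    // Grants the lock if the file is unlocked or already held by `client`.
    bool acquire(const std::string& filePath, const std::string& client, Clock::time_point now) {
        auto it = locks_.find(filePath);
        if (it != locks_.end() && it->second.holder != client) return false;
        locks_[filePath] = LockEntry{client, now};
        return true;
    }

    // Releases the lock only if `client` is the current holder.
    bool release(const std::string& filePath, const std::string& client) {
        auto it = locks_.find(filePath);
        if (it == locks_.end() || it->second.holder != client) return false;
        locks_.erase(it);
        return true;
    }

    // Drops every lock at least as old as the timeout; returns how many were released.
    std::size_t expire(Clock::time_point now) {
        std::size_t released = 0;
        for (auto it = locks_.begin(); it != locks_.end();) {
            if (now - it->second.acquiredAt >= timeout_) {
                it = locks_.erase(it);
                ++released;
            } else {
                ++it;
            }
        }
        return released;
    }

private:
    std::chrono::seconds timeout_;
    std::map<std::string, LockEntry> locks_;
};
```

Passing `now` explicitly keeps the sketch easy to test; production code would more likely read the clock inside each method.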

## Key Methods

| Method | Description |
|--------|-------------|
| `run()` | Main event loop — calls `select()` and dispatches incoming messages |
| `handleAccept()` | Handles a new incoming connection; performs the handshake |
| `handleConnection()` | Processes a received message from a Client or DataNode |
| `propagateCreateFile()` | Sends a `CreateFile` message to the selected DataNodes |
| `propagateUpdateFile()` | Sends an `UpdateFile` message to all DataNodes holding the file |
| `propagateDeleteFile()` | Sends a `DeleteFile` message to all DataNodes holding the file |
| `getFileFromDataNode()` | Retrieves a file from a DataNode and forwards it to a Client |
| `sendHeartbeats()` | Pings all DataNodes; removes those that do not respond |
| `rebalanceFileReplication()` | Assigns new DataNodes to a file whose replica count is below the threshold |
| `acquireLock()` / `releaseLock()` | Grants or releases a distributed write lock for a file |
| `checkFileLockExpiration()` | Periodically releases locks that have exceeded the timeout |

## Connection Handshake

When a new socket connects, the Server sends an `Identify` request. The peer responds with its `nodeType` (`CLIENT` or `DATANODE`) and its `processName`. Based on the type, the Server:

- **Client**: records the first connection as the primary socket (for commands), then waits for a second connection and registers it as the secondary (update) socket.
- **DataNode**: registers the endpoint in `dataNodeEndpointMap` and starts including it in replication decisions.
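
A simplified sketch of this registration logic follows. `PeerRegistry`, `ClientSockets`, and the use of plain `int` descriptors in place of `SquidProtocol` endpoints are all assumptions made for illustration.

```cpp
#include <map>
#include <string>

enum class NodeType { Client, DataNode };

// Hypothetical pair of client sockets; -1 means "not connected yet".
struct ClientSockets {
    int primary = -1;
    int secondary = -1;
};

class PeerRegistry {
public:
    // Registers a peer after it has answered the Identify request.
    void registerPeer(NodeType type, const std::string& processName, int socketFd) {
        if (type == NodeType::DataNode) {
            dataNodes_[processName] = socketFd;
            return;
        }
        // First connection from a client name is the primary socket,
        // the next one becomes the secondary (update) socket.
        ClientSockets& sockets = clients_[processName];
        if (sockets.primary < 0) {
            sockets.primary = socketFd;
        } else {
            sockets.secondary = socketFd;
        }
    }

    // True once both client sockets are registered.
    bool clientReady(const std::string& processName) const {
        auto it = clients_.find(processName);
        return it != clients_.end() && it->second.primary >= 0 && it->second.secondary >= 0;
    }

    bool hasDataNode(const std::string& processName) const {
        return dataNodes_.count(processName) > 0;
    }

private:
    std::map<std::string, ClientSockets> clients_;
    std::map<std::string, int> dataNodes_;
};
```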

## Replication Flow

1. A Client sends a `CreateFile` or `UpdateFile` message.
2. For a new file, the Server selects `replicationFactor` DataNodes from `dataNodeEndpointMap` using round-robin; for an update, it targets the DataNodes that already hold the file.
3. The Server forwards the file to each target DataNode via `propagateCreateFile()` / `propagateUpdateFile()`.
4. The Server updates `dataNodeReplicationMap` to record the new assignments.
5. The Server broadcasts the change to all other connected Clients.
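
The round-robin selection in step 2 can be sketched as below. `RoundRobinSelector` is a hypothetical class; how the Server actually stores its cursor is not described here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical selector that hands out DataNodes in rotating order.
class RoundRobinSelector {
public:
    // Picks up to `replicationFactor` distinct nodes, starting where the
    // previous call stopped. Returns fewer if not enough nodes are available.
    std::vector<std::string> pick(const std::vector<std::string>& nodes,
                                  std::size_t replicationFactor) {
        std::vector<std::string> chosen;
        if (nodes.empty()) return chosen;

        std::size_t count = std::min(replicationFactor, nodes.size());
        for (std::size_t i = 0; i < count; ++i) {
            chosen.push_back(nodes[(next_ + i) % nodes.size()]);
        }
        next_ = (next_ + count) % nodes.size();
        return chosen;
    }

private:
    std::size_t next_ = 0;  // index of the node to start from on the next call
};
```

With three DataNodes and a replication factor of 2, successive files would land on nodes (1, 2), then (3, 1), then (2, 3), which spreads replicas evenly.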

## Fault Tolerance

- The `sendHeartbeats()` loop runs in a background thread.
- Any DataNode that fails to respond is removed from `dataNodeEndpointMap` and `dataNodeReplicationMap`.
- `rebalanceFileReplication()` is immediately invoked for every file that lost a replica, pushing a fresh copy to a healthy DataNode.
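
The removal step can be sketched as follows. This is a simplification: `ReplicaMap` stores only DataNode names instead of `SquidProtocol` endpoints, and `dropDataNode` is a hypothetical function that returns the files that would then need rebalancing.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified replication map: file path -> names of DataNodes holding a replica.
using ReplicaMap = std::map<std::string, std::set<std::string>>;

// Removes `deadNode` from every file and returns the files that lost a
// replica and are now below `replicationFactor`.
std::vector<std::string> dropDataNode(ReplicaMap& replicas,
                                      const std::string& deadNode,
                                      std::size_t replicationFactor) {
    std::vector<std::string> needRebalance;
    for (auto& entry : replicas) {
        bool lostReplica = entry.second.erase(deadNode) > 0;
        if (lostReplica && entry.second.size() < replicationFactor) {
            needRebalance.push_back(entry.first);
        }
    }
    return needRebalance;
}
```

Each returned path would then be handed to the rebalancing step so that a healthy DataNode receives a fresh copy.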