The Distributed File System (DFS) is a Python project for storing, managing, and retrieving files across multiple nodes in a network. It combines file replication, load balancing, fault tolerance, and consistency mechanisms so that data stays intact and available even when nodes fail. Its main features are:
- File Replication: Automatically replicates files across multiple nodes for redundancy and reliability.
- Load Balancing: Distributes file storage and retrieval requests evenly across nodes to optimize performance.
- Fault Tolerance: Keeps files accessible when individual nodes fail.
- Consistency Mechanisms: Keeps the copies of a file consistent across nodes to prevent data corruption and ensure accuracy.
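Replication and load balancing meet when the master decides where a new file should live. The sketch below shows one plausible placement policy, not necessarily the one in `src/master_node.py`: it assumes the master keeps a per-worker count of stored files and places each new file on the `replication_factor` least-loaded workers. The names `choose_workers`, `record_placement`, `worker_load`, and `replication_factor` are hypothetical.

```python
import heapq
from typing import Dict, List


def choose_workers(worker_load: Dict[str, int], replication_factor: int) -> List[str]:
    """Pick the least-loaded workers to hold the replicas of a new file.

    worker_load maps a (hypothetical) worker ID to the number of files it stores.
    Ties are broken by worker ID so that placement is deterministic.
    """
    if replication_factor > len(worker_load):
        raise ValueError(
            f"need {replication_factor} workers, only {len(worker_load)} registered"
        )
    # nsmallest over (load, worker_id) keys yields the k least-loaded workers.
    chosen = heapq.nsmallest(
        replication_factor, worker_load.items(), key=lambda item: (item[1], item[0])
    )
    return [worker_id for worker_id, _ in chosen]


def record_placement(worker_load: Dict[str, int], workers: List[str]) -> None:
    """Increase the load count of every worker that received a replica."""
    for worker_id in workers:
        worker_load[worker_id] += 1


if __name__ == "__main__":
    load = {"worker-1": 4, "worker-2": 1, "worker-3": 2, "worker-4": 1}
    targets = choose_workers(load, replication_factor=2)
    record_placement(load, targets)
    print(targets)  # ['worker-2', 'worker-4']
```

Placing replicas on the least-loaded workers keeps storage roughly even as files are added; refusing to place fewer replicas than requested makes an under-provisioned cluster fail loudly instead of silently weakening redundancy.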
The DFS is composed of the following key components:
- Master Node: Manages the metadata, assigns tasks to worker nodes, and oversees file operations.
- Worker Nodes: Store the actual files and communicate with the master node to handle requests.
- Client: Provides the interface for users to interact with the distributed file system.
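For the master to route requests around failed workers, it needs to know which workers are alive. A common approach, assumed here rather than taken from the project's code, is for workers to send periodic heartbeats and for the master to treat a worker as failed once it has been silent longer than a timeout. `WorkerRegistry`, `heartbeat`, and `HEARTBEAT_TIMEOUT` are hypothetical names, and the 10-second timeout is an arbitrary example value.

```python
import time
from typing import Callable, Dict, List

HEARTBEAT_TIMEOUT = 10.0  # assumed: seconds of silence before a worker counts as failed


class WorkerRegistry:
    """Master-side view of which registered workers are still alive."""

    def __init__(self, timeout: float = HEARTBEAT_TIMEOUT,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen: Dict[str, float] = {}

    def register(self, worker_id: str) -> None:
        """Record a newly started worker; registration counts as its first heartbeat."""
        self.last_seen[worker_id] = self.clock()

    def heartbeat(self, worker_id: str) -> None:
        """Refresh the timestamp of a worker that reported in."""
        if worker_id not in self.last_seen:
            raise KeyError(f"unknown worker {worker_id!r}; register it first")
        self.last_seen[worker_id] = self.clock()

    def alive_workers(self) -> List[str]:
        now = self.clock()
        return sorted(w for w, seen in self.last_seen.items() if now - seen <= self.timeout)

    def dead_workers(self) -> List[str]:
        now = self.clock()
        return sorted(w for w, seen in self.last_seen.items() if now - seen > self.timeout)


if __name__ == "__main__":
    fake_now = [0.0]
    registry = WorkerRegistry(timeout=10.0, clock=lambda: fake_now[0])
    registry.register("worker-1")
    registry.register("worker-2")
    fake_now[0] = 8.0
    registry.heartbeat("worker-1")   # only worker-1 keeps reporting
    fake_now[0] = 15.0
    print(registry.alive_workers())  # ['worker-1']
    print(registry.dead_workers())   # ['worker-2']
```

A master built this way could skip dead workers when assigning uploads and, for files that had replicas on them, copy the data from a surviving replica to a healthy worker.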
The main file operations work as follows:
- File Upload: The client sends an upload request, the master node assigns worker nodes for storage, and the client uploads the file to those nodes.
- File Download: The client requests a file, the master node provides the locations, and the client downloads the file from the designated worker node.
- File Deletion: The client requests deletion, and the master node ensures that the file is removed from all relevant worker nodes.
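The three request flows can be simulated in a single process to make the order of messages concrete. This is a hypothetical in-memory model, not the project's networked implementation: `Master`, `Worker`, and `Client` stand in for the real nodes, the master stores only metadata, and a SHA-256 checksum recorded at upload time illustrates one possible consistency check on download.

```python
import hashlib
from typing import Dict, List


class Worker:
    """Stores file contents in memory (a stand-in for a worker node's disk)."""

    def __init__(self, worker_id: str) -> None:
        self.worker_id = worker_id
        self.files: Dict[str, bytes] = {}


class Master:
    """Holds metadata only: which workers store each file, plus its checksum."""

    def __init__(self, workers: List[Worker], replication_factor: int = 2) -> None:
        self.workers = {w.worker_id: w for w in workers}
        self.replication_factor = replication_factor
        self.locations: Dict[str, List[str]] = {}
        self.checksums: Dict[str, str] = {}

    def assign_workers(self, filename: str) -> List[Worker]:
        # Least-loaded first, so storage stays balanced across workers.
        ranked = sorted(self.workers.values(), key=lambda w: (len(w.files), w.worker_id))
        chosen = ranked[: self.replication_factor]
        self.locations[filename] = [w.worker_id for w in chosen]
        return chosen

    def locate(self, filename: str) -> List[Worker]:
        return [self.workers[w] for w in self.locations[filename]]

    def delete(self, filename: str) -> None:
        # Remove the file from every worker holding a replica, then drop its metadata.
        for worker in self.locate(filename):
            worker.files.pop(filename, None)
        del self.locations[filename]
        self.checksums.pop(filename, None)


class Client:
    def __init__(self, master: Master) -> None:
        self.master = master

    def upload(self, filename: str, data: bytes) -> None:
        # Ask the master where to store the file, then send the bytes to those workers.
        for worker in self.master.assign_workers(filename):
            worker.files[filename] = data
        self.master.checksums[filename] = hashlib.sha256(data).hexdigest()

    def download(self, filename: str) -> bytes:
        # Ask the master for locations; return the first replica whose checksum matches.
        expected = self.master.checksums[filename]
        for worker in self.master.locate(filename):
            data = worker.files.get(filename)
            if data is not None and hashlib.sha256(data).hexdigest() == expected:
                return data
        raise OSError(f"no intact replica of {filename!r} found")

    def delete(self, filename: str) -> None:
        self.master.delete(filename)


if __name__ == "__main__":
    workers = [Worker("w1"), Worker("w2"), Worker("w3")]
    client = Client(Master(workers, replication_factor=2))
    client.upload("example.txt", b"hello")
    workers[0].files["example.txt"] = b"corrupted"  # damage one replica
    print(client.download("example.txt"))  # b'hello'
    client.delete("example.txt")
    print([w.files for w in workers])  # [{}, {}, {}]
```

Because the download falls back to the next replica when a checksum does not match, one corrupted or missing copy does not make the file unreadable.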
Before installing, make sure you have:
- Python 3.7 or later
- pip package manager
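To confirm the prerequisites before installing, a small check like the following can be run with the interpreter you plan to use. It is a hypothetical helper, not part of the repository: it tests the Python 3.7 minimum stated above and whether the `pip` module can be imported.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum version listed in the prerequisites


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is 3.7 or later."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def pip_available() -> bool:
    """Return True if pip can be imported by this interpreter."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    print("pip available:", pip_available())
```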
To install the project:
- Clone the Repository
git clone https://github.com/iamprabhanjan/A Simple distributed file system.git
cd distributed_file_system
- Create a Virtual Environment
python3 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install Dependencies
pip install -r requirements.txt
- Run the System
Start the master node, worker nodes, and client as described in the usage steps below.
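Running the system means starting one master and several workers. As a convenience they can be launched from a single script; the hypothetical `launch.py` below is not part of the repository. It assumes it is run from the repository root, that `src/master_node.py` and `src/worker_node.py` need no arguments (as in the usage commands), and that several workers can run side by side on one machine, so verify these against the scripts. `NUM_WORKERS`, `build_commands`, and `launch` are illustrative names. The client is left out because it is interactive and belongs in its own terminal.

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import List

NUM_WORKERS = 3  # assumed number of worker processes to simulate


def build_commands(num_workers: int) -> List[List[str]]:
    """Command lines for one master followed by num_workers identical workers."""
    master = [sys.executable, "src/master_node.py"]
    workers = [[sys.executable, "src/worker_node.py"] for _ in range(num_workers)]
    return [master] + workers


def launch(num_workers: int = NUM_WORKERS, startup_delay: float = 1.0) -> None:
    commands = build_commands(num_workers)
    processes = [subprocess.Popen(commands[0])]
    # Crude heuristic: give the master time to start listening before workers register.
    time.sleep(startup_delay)
    processes += [subprocess.Popen(cmd) for cmd in commands[1:]]
    try:
        for proc in processes:
            proc.wait()
    except KeyboardInterrupt:
        # On Ctrl+C, stop every node this script started.
        for proc in processes:
            proc.terminate()


if __name__ == "__main__":
    if Path("src/master_node.py").is_file():
        launch()
    else:
        print("Run this from the repository root; src/master_node.py was not found.")
```

Using `sys.executable` runs every node with the same interpreter as the launcher, so the nodes pick up the activated virtual environment and its installed dependencies.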
To use the system, run each component from the repository root:
- Start the Master Node
python src/master_node.py
- Start Worker Nodes
python src/worker_node.py
Run the above command in multiple terminals to simulate a distributed environment.
- Run the Client
python src/client.py
- Upload a File
Enter action (upload/download/delete/exit): upload
Enter filename to upload: example.txt
- Download a File
Enter action (upload/download/delete/exit): download
Enter filename to download: example.txt
- Delete a File
Enter action (upload/download/delete/exit): delete
Enter filename to delete: example.txt
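The prompts above suggest the client runs a loop that reads an action, then a filename, until the user types `exit`. The sketch below reproduces that shape with injectable input and handler functions so it can be exercised without a running cluster; `run_client`, `PROMPT`, and the handler mapping are hypothetical, and the real `src/client.py` may be structured differently.

```python
from typing import Callable, Dict

PROMPT = "Enter action (upload/download/delete/exit): "


def run_client(handlers: Dict[str, Callable[[str], None]],
               read: Callable[[str], str] = input) -> None:
    """Prompt for actions until 'exit' (or end of input), calling handlers[action](filename)."""
    while True:
        try:
            action = read(PROMPT).strip().lower()
            if action == "exit":
                return
            if action not in handlers:
                print(f"Unknown action {action!r}; choose upload, download, delete, or exit.")
                continue
            filename = read(f"Enter filename to {action}: ").strip()
        except EOFError:
            return  # stdin closed (e.g. Ctrl+D): leave quietly
        handlers[action](filename)


if __name__ == "__main__":
    # Placeholder handlers; a real client would contact the master and workers here.
    demo_handlers = {
        "upload": lambda name: print(f"would upload {name}"),
        "download": lambda name: print(f"would download {name}"),
        "delete": lambda name: print(f"would delete {name}"),
    }
    # Scripted answers replace keyboard input for this demo; omit read= to use input().
    answers = iter(["upload", "example.txt", "delete", "example.txt", "exit"])
    run_client(demo_handlers, read=lambda prompt: next(answers))
```

Passing the input function as a parameter is what makes the loop testable: a test can feed scripted answers instead of waiting on a keyboard.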
The project includes a unittest test suite covering the master node, worker nodes, and client.
- Run all tests
python -m unittest discover tests/
- Run a specific test
python tests/test_master_node.py
The tests cover the following areas:
- Master Node: Tests for worker registration, file assignment, metadata management, and load balancing.
- Worker Node: Tests for file storage, retrieval, deletion, and health monitoring.
- Client: Tests for file upload, download, deletion, and consistency mechanisms.
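New tests can follow the standard `unittest` pattern that the commands above rely on: `python -m unittest discover tests/` collects `TestCase` subclasses from files whose names start with `test`, and runs their `test*` methods. The example below is self-contained and hypothetical. It tests a small `LocalStore` stand-in for a worker's storage, in the spirit of the worker-node tests listed above, rather than importing `src/worker_node.py`, whose API is not shown here.

```python
import os
import tempfile
import unittest


class LocalStore:
    """Hypothetical stand-in for a worker's on-disk storage."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, filename: str) -> str:
        # basename() keeps callers from writing outside the storage directory.
        return os.path.join(self.root, os.path.basename(filename))

    def store(self, filename: str, data: bytes) -> None:
        with open(self._path(filename), "wb") as f:
            f.write(data)

    def retrieve(self, filename: str) -> bytes:
        with open(self._path(filename), "rb") as f:
            return f.read()

    def delete(self, filename: str) -> None:
        os.remove(self._path(filename))


class TestLocalStore(unittest.TestCase):
    def setUp(self) -> None:
        # Each test gets its own empty temporary directory, removed afterwards.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.store = LocalStore(tmp.name)

    def test_store_then_retrieve_returns_same_bytes(self) -> None:
        self.store.store("example.txt", b"hello")
        self.assertEqual(self.store.retrieve("example.txt"), b"hello")

    def test_delete_removes_file(self) -> None:
        self.store.store("example.txt", b"hello")
        self.store.delete("example.txt")
        with self.assertRaises(FileNotFoundError):
            self.store.retrieve("example.txt")

    def test_retrieve_missing_file_raises(self) -> None:
        with self.assertRaises(FileNotFoundError):
            self.store.retrieve("missing.txt")


# To run a test file directly (e.g. python tests/test_worker_node.py), end it with:
# if __name__ == "__main__":
#     unittest.main()
```

Giving each test a fresh temporary directory keeps the tests independent of one another and of any real worker data.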
The repository is organized as follows:
distributed_file_system/
│
├── src/
│ ├── master_node.py # Master node logic
│ ├── worker_node.py # Worker node logic
│ └── client.py # Client logic
│
├── docs/
│ ├── README.md # Main documentation
│ ├── INSTALLATION.md # Installation guide
│ ├── USAGE.md # Usage guide
│ └── DESIGN.md # System design document
│
├── tests/
│ ├── test_master_node.py # Tests for master node
│ ├── test_worker_node.py # Tests for worker nodes
│ └── test_client.py # Tests for client
│
├── .gitignore # Files and directories to be ignored by Git
├── LICENSE # License for the project
└── requirements.txt # Python dependencies