
Implementation of distributed and parallel databases operations like fragmentation, parallel sort, range query etc.


Prashant47/distributed-database


distributed-database

The primary goal of this project is to implement some of the key concepts in distributed and parallel database systems, such as fragmentation, parallel sort, and range queries. The project was done as part of CSE 512: Distributed and Parallel Database Systems, taught by Mohamed Sarwat.

These concepts are built on top of the open-source relational database PostgreSQL. I used Python for programming and psycopg as the database driver for PostgreSQL. You can find a getting-started guide for psycopg here.

The project covers three main concepts:

  1. Data fragmentation across partitions (sharding).
  2. A query processor that accesses data from the partitioned table.
  3. Parallel sort and parallel join algorithms.

Data Fragmentation

In a centralized database system, all the data resides on a single node, whereas in distributed and parallel database systems the data is partitioned across multiple nodes.
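The README does not state which fragmentation schemes the project uses. As an illustration only, here is a minimal in-memory sketch of two common horizontal fragmentation schemes: range partitioning on one attribute and round-robin partitioning. The function names and the (user_id, movie_id, rating) row layout are hypothetical; in a PostgreSQL-backed implementation each fragment would typically be its own table rather than a Python list.

```python
from typing import List, Sequence, Tuple

# Hypothetical row layout used only for illustration: (user_id, movie_id, rating).
Row = Tuple[int, int, float]


def range_partition(rows: Sequence[Row], num_partitions: int, attr_index: int,
                    low: float, high: float) -> List[List[Row]]:
    """Split rows into num_partitions fragments by equal-width ranges of one attribute.

    Values outside [low, high] are clamped into the first or last fragment.
    """
    width = (high - low) / num_partitions
    fragments: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        index = int((row[attr_index] - low) / width)
        # Clamp so that value == high (and any out-of-range value) maps to a valid fragment.
        index = max(0, min(index, num_partitions - 1))
        fragments[index].append(row)
    return fragments


def round_robin_partition(rows: Sequence[Row], num_partitions: int) -> List[List[Row]]:
    """Deal rows out to fragments in turn: row i goes to fragment i % num_partitions."""
    fragments: List[List[Row]] = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        fragments[i % num_partitions].append(row)
    return fragments
```

In this sketch the range fragments are half-open intervals, so a value that falls exactly on an internal boundary goes to the higher fragment. Range partitioning lets a query skip fragments whose range cannot match, while round-robin spreads rows evenly but forces every query to scan all fragments.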

Query Processor

This part involves building a simplified query processor that accesses data from the partitioned table. As part of this, two queries were implemented: RangeQuery() and PointQuery().
RangeQuery() takes a range of an attribute as input and returns the tuples whose attribute value falls within that range, reading from the fragments created in the first step.
PointQuery() takes a specific attribute value as input and returns all the tuples with that attribute value from the fragmented partitions.
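To illustrate how such a query processor can avoid scanning every fragment, here is a hypothetical in-memory sketch; the names `range_query`, `point_query`, and `may_overlap` are not the project's actual RangeQuery()/PointQuery() signatures. It assumes each fragment carries the inclusive (min, max) bounds of the partitioning attribute when those are known, as with range partitioning; fragments with unknown bounds, such as round-robin fragments, are always scanned. The point query is treated as a range query whose lower and upper bounds are equal.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, int, float]            # hypothetical (user_id, movie_id, rating)
Bounds = Optional[Tuple[float, float]]  # inclusive (min, max) of the attribute, or None if unknown
Fragment = Tuple[Bounds, List[Row]]


def may_overlap(bounds: Bounds, low: float, high: float) -> bool:
    """Return True if a fragment could contain values in [low, high]."""
    if bounds is None:
        return True  # nothing is known about this fragment, so it must be scanned
    frag_low, frag_high = bounds
    return frag_high >= low and frag_low <= high


def range_query(fragments: Sequence[Fragment], attr_index: int,
                low: float, high: float) -> List[Row]:
    """Collect rows whose attribute lies in [low, high], skipping non-overlapping fragments."""
    result: List[Row] = []
    for bounds, rows in fragments:
        if not may_overlap(bounds, low, high):
            continue  # fragment pruning: this fragment cannot contain a match
        result.extend(row for row in rows if low <= row[attr_index] <= high)
    return result


def point_query(fragments: Sequence[Fragment], attr_index: int, value: float) -> List[Row]:
    """Collect rows whose attribute equals value (a range query with low == high)."""
    return range_query(fragments, attr_index, value, value)
```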

Parallel Sort & Join

This task involves implementing generic parallel sort and parallel join algorithms.
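The README does not describe the algorithms in detail. One common approach, sketched below with hypothetical function names, is a range-partitioned parallel sort (split rows into disjoint key ranges, sort each range in its own worker, then concatenate the ranges in order) and a partitioned parallel hash join (hash-partition both inputs on the join key so that matching rows land in the same partition, then join each partition pair independently). Python threads stand in for database nodes here; because of the GIL they illustrate the data flow rather than give a real speed-up, and true parallelism would need separate processes or separate database connections.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]


def parallel_sort(rows: Sequence[Row], key_index: int, num_workers: int = 4) -> List[Row]:
    """Range-partition rows on a numeric sort key, sort each partition in a worker, concatenate."""
    if not rows:
        return []
    keys = [row[key_index] for row in rows]
    low, high = min(keys), max(keys)
    width = (high - low) / num_workers or 1  # avoid zero width when all keys are equal
    partitions: List[List[Row]] = [[] for _ in range(num_workers)]
    for row in rows:
        index = min(int((row[key_index] - low) / width), num_workers - 1)
        partitions[index].append(row)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # No key in partition i is larger than any key in partition i + 1,
        # so concatenating the sorted partitions yields a globally sorted list.
        sorted_parts = list(pool.map(lambda part: sorted(part, key=lambda r: r[key_index]),
                                     partitions))
    return [row for part in sorted_parts for row in part]


def _local_hash_join(left: List[Row], right: List[Row],
                     left_key: int, right_key: int) -> List[Row]:
    """In-memory hash join of one partition pair: build on left, probe with right."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for l_row in left:
        table[l_row[left_key]].append(l_row)
    return [l_row + r_row for r_row in right for l_row in table.get(r_row[right_key], [])]


def parallel_join(left: Sequence[Row], right: Sequence[Row], left_key: int, right_key: int,
                  num_workers: int = 4) -> List[Row]:
    """Hash-partition both inputs on the join key, then join each partition pair in a worker."""
    left_parts: List[List[Row]] = [[] for _ in range(num_workers)]
    right_parts: List[List[Row]] = [[] for _ in range(num_workers)]
    # Equal keys hash to the same partition, so every match is found within one pair.
    for row in left:
        left_parts[hash(row[left_key]) % num_workers].append(row)
    for row in right:
        right_parts[hash(row[right_key]) % num_workers].append(row)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        joined = list(pool.map(_local_hash_join, left_parts, right_parts,
                               [left_key] * num_workers, [right_key] * num_workers))
    return [row for part in joined for row in part]
```

Equal-width key ranges keep the sketch short, but on skewed data they can leave some workers with far more rows than others; choosing split points from a sample of the keys is a common alternative.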

Contribution

If you like this utility or enjoy working on this project, feel free to contribute. To contribute, you just need a working knowledge of Python and PostgreSQL and a bit of familiarity with distributed database concepts.
One initial idea would be adding a few more queries to the query processor.

Issues

If you find an issue, bug, error, or unhandled exception, feel free to report it.
