Skip to content

Plan of Action for Adding New Data Structures

Gagandeep Singh edited this page Dec 7, 2020 · 1 revision

We will follow the steps given below for any new data structure that is to be added,

  1. Use cases and API design - We will start with some use cases of the data structure/algorithm to be added and then some APIs will be proposed. The discussion will move from specific to generic cases and will be done in a new issue. For example the API design of disjoint set data structure on the basis of https://en.wikipedia.org/wiki/Disjoint-set_data_structure can be describe as follows,
Set
The class `Set` will represent the notion of disjoint sets with respect to the above data structure.
Attributes
1. `key` - Any valid key which can be used to uniquely identify this set.
2. `data` - Any valid data or pointers to data can be stored through this.
3. `parent` - The parent of this set. 
4. `size` - The size of the set.
5. `rank` - The rank of the set.

We cannot use static data classes as the `data` attribute will be dynamic.

DisjointSetTree
This will represent the notion of disjoint set tree of `DisjointSet`.
Attribute
`tree` - A python `dict` for `key` to `DisjointSet` mapping. Hashing here will help us quickly figure out whether a `DisjointSet` is there inside the tree.

Methods
1. `make_set(key, data)` - It will make a new set as described in the references.
2. `find(key)` - Finding the root of a particular disjoint set sub tree using path-splitting logic.
3. `union(key1, key2)` - Taking union of sets with given keys by size.

Something similar has to be done for adding new algorithms and data structures.

  1. Class design - After discussing the higher level APIs we will move on to discuss the class design. The contributor is expected to propose their abstract class design in a PR.
  2. Implementation - In this phase the implementation for the proposed data structure should be added in the PR created in the previous phase.
  3. Adding test cases - Test cases should be added and the code coverage should be maximised. If the test cases pass and the code coverage is satisfactory the PR will be merged.

The duration for each of the above phases depends on the complexity of the data structure. We recommend contributors to set a deadline for each of the phases and then work accordingly.