Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add searching algorithms for linear data structures #446

Open
EmperorArthurIX opened this issue Dec 4, 2021 · 19 comments
Open

Add searching algorithms for linear data structures #446

EmperorArthurIX opened this issue Dec 4, 2021 · 19 comments

Comments

@EmperorArthurIX
Copy link

Description of the problem

The codebase is very extensive and well-designed; however, it does not seem to include searching algorithms for linear data structures as far as I could observe. I believe addition of searching algorithms would be a good improvement to the project. Certain algorithms I have thought can be included are:

  • Linear Search(array, number)

  • Binary Search(array, number, unsorted_input=False)

  • Jump Search(array, number, unsorted_input=False)

The above mentioned declarations are suggested, where the parameter unsorted_input can be used to explicitly learn from the user whether or not the input array is sorted beforehand. If True, we must first sort the array within the search function, then search.

Comments

With certain modifications, we may even allow to return the sorted array along with search result, but I have no idea on how that will be done at the moment.

PS: I am currently working on the Radix Sort issue that is already open, and after that reaches a satisfactory stage, I wish to go on with this one. Discussions and suggestions are welcome.

@HarsheetKakar
Copy link
Contributor

With binary search and jump search in mind we can also have a SortedOneDimentionalArray as data structure which would take OneDimentionalArray as input and would dynamically choose the sorting algorithm based on how far the array is already sorted which would come from its Hamming Distance (wheather it would be insertionSort which works best for nearly sorted arrays and quickSort if the array is not sorted).

@czgdp1807
Copy link
Member

Interesting idea. I think the idea of SortedOneDimensionalArray should be discusses in a separate issue. Is there any prior work (like Wikipedia article, papers, or lecture notes available online) which can provide clarity and examples?

@HarsheetKakar
Copy link
Contributor

I have moved the conversation to Gitter once we reach a consensus I will create a separate issue.

@czgdp1807
Copy link
Member

Gitter

It's inactive. Folks use Discord these days a lot. Github issue would be the best place to discuss as it is accessible by anyone in the open source community.

@czgdp1807
Copy link
Member

And may be we should provide a function that should take an array as input and suggest one of the pydatastructs APIs as output. The time complexity should be O(n) because anything more won't be worth it. The constant in O(n) should also be very small (we can benchmark for that).

@czgdp1807 czgdp1807 changed the title To incorporate Searching Algorithms for Linear Data Structures Add searching algorithms for linear data structures Dec 5, 2021
@czgdp1807
Copy link
Member

The above mentioned declarations are suggested, where the parameter unsorted_input can be used to explicitly learn from the user whether or not the input array is sorted beforehand. If True, we must first sort the array within the search function, then search.

I think that if a searching algorithm require input array to be sorted then it should assume that. Otherwise we won't be able to guarantee lower time complexities of algorithms like binary search, exponential search. The overhead of sorting an array is non-trivial, in-fact even checking whether an array is sorted or not will take o(n) time which is large as compared to O(logn) promised by binary search. So, let's keep things simple and encapsulated for now. I would suggest something like this, search(input_array, algorithm) where algorithm would be a string (non-optional argument), binary_search, linear_search, jump_search, exponential_search, etc.

@EmperorArthurIX
Copy link
Author

The above mentioned declarations are suggested, where the parameter unsorted_input can be used to explicitly learn from the user whether or not the input array is sorted beforehand. If True, we must first sort the array within the search function, then search.

I think that if a searching algorithm require input array to be sorted then it should assume that. Otherwise we won't be able to guarantee lower time complexities of algorithms like binary search, exponential search. The overhead of sorting an array is non-trivial, in-fact even checking whether an array is sorted or not will take o(n) time which is large as compared to O(logn) promised by binary search. So, let's keep things simple and encapsulated for now. I would suggest something like this, search(input_array, algorithm) where algorithm would be a string (non-optional argument), binary_search, linear_search, jump_search, exponential_search, etc.

Noted, will think of solutions from that perspective while planning and coding.

@czgdp1807
Copy link
Member

The input should not necessarily be an array. A function should work too. An array is conceptually also a function (of indices). See this https://en.wikipedia.org/wiki/Ternary_search for example. Allowing inputs as function would make API more practical.

@r-avenous
Copy link
Contributor

r-avenous commented Dec 7, 2021

Shall I work on this one?

@czgdp1807
Copy link
Member

Sure go ahead.

@czgdp1807
Copy link
Member

@skullVoid has already opened a PR. Seems like he/she haven't linked it here yet.

@varsha080
Copy link

Hey! I can solve the issue mentioned above. Please assign the work to me. I'll write the code of all searching algo with their complexity defined.

@czgdp1807
Copy link
Member

@varsha080 Sure go ahead. We don't assign issues. Please read, https://github.com/codezonediitj/pydatastructs/wiki/Issue-Policy

@Abekaesh
Copy link
Contributor

Abekaesh commented Mar 15, 2023

Hello! I would like to work on this issue under GSoC 2023. I can write the code of the searching algorithms in an efficient manner. Shall I work on this?

@Abekaesh
Copy link
Contributor

Is this issue still open? Can I work on this?

@czgdp1807
Copy link
Member

Yes. Please go ahead.

@Abekaesh
Copy link
Contributor

Just to clarify what the problem statement, We need to implement linear search algorithm, binary search algorithm, jump search algorithm and create a common search function where its arguments are array, key, unsorted_input, algorithm where unsorted_input is a parameter necessary for binary search and jump search, and optional for linear search. If unsorted_input is true we need to return sorted array with index of key in the sorted array. In linear search unsorted_input can be given to do the same. If key not found in array, it returns -1 in position and a sorted array if unsorted_input=True.
Am I correct?

@czgdp1807
Copy link
Member

Please note #446 (comment) and go ahead with your work. Feel free to take decisions on the basis of your judgement. Thanks.

@mridul45
Copy link

I would be happy to contribute to the project as I find my skills a good fit for the issue . As a GSSOC 23 member I humbly ask to allow me to contribute to the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants