ASM Copilot

An intelligent auto completion service which helps a programmer in autocompleting code snippets. While the programmer is still typing, the pilot calculates what the user is trying to type and suggests a set of most relevant auto completion.

We design a simple analysis that extracts sequences of keywords from a large codebase, and indexs them. We then use an information retriveval technique to find the highest ranked suggestions and use them to synthesize a code completion.

Dependencies

The project has used the windows api of threading. Therefore, the project can only be compiled to windows machines that support this library.

The version of GNU compiler should be atleast 9.0 to support C++17. We have used the latest version to find support for <filesystem> standard library.

Dataset

The dataset consists of sets of code snippets of projects from different GitHub repositories. It was downloaded from https://zenodo.org/record/3472050#.YbNU1b1Bzcd.

When using any dataset, make sure that the file names do not contain any extension.

Working

Indexing

We index all files into an inverted index which is implemented through trie data structures. After filtering the file from unnecessary tokens like comments and strings, the words are inserted into the trie at the end of which lies a posting list which contains information the like document count, document frequency, number of lines of occurences in each document etc.

As mentioned above, we have used trie data structure to store the tokens instead of a linked list. The reason is simple, tries have very low searching complexity. If n is the length of string to search, then the complexity will be O(n) .

After reading the corpus and forming an index, we write it on a file to avoid indexing it again. The resulting file may look like:

Information Retrieval

There are several techniques of information retrieval, from which we have adopted the TF-IDF technique which stands for Term Frequency & Inverted Document Frequency.

TF-IDF is a numerical score used in Information Retrieval systems, which can accurately represent the relevance of a search term within a large corpus of documents. The idea is that rarer words help narrow down the search more than common words, making those documents rank higher. TF-IDF is currently the best known methodology for scoring the relevance of search terms in a set of documents

Autocompletion

We ask the user to enter a line to autocomple and use the last two words only to compute a result. The second last word is termed as the context, while the last term is the query. We first retrieve top relevant documents of both the context and the query, then we rank those documents that contain both of these tokens. Then we print only those lines in which both of these words occur together, if possible. Otherwise, we display only the lines containing the query.

Screenshots

Run Locally

Clone the project

  git clone https://github.com/saad0510/intelligent-autocompletion

Go to the project directory

  cd intelligent-autocompletion

Copy your dataset in the dataset/ directory. Make sure that there is no extension with the filenames.

Compile and run the main_index.cpp file which will index your dataset and write it on index.txt file.

    g++ main_index.cpp -o indexer.exe
    ./indexer.exe

Finally, compile and run the main.cpp file which will read the index.txt and ask you for a code line and display the results.

    g++ main.cpp -o main.exe
    ./main.exe

Documentation

For a detailed explanation of everything, refer to this report..

Last Updated

December, 2021

Authors

github : @saad0510
email : [email protected] or [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Indexer		Indexer
dataset		dataset
images		images
README.md		README.md
Report.pdf		Report.pdf
index.txt		index.txt
main.cpp		main.cpp
main_index.cpp		main_index.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ASM Copilot

Dependencies

Dataset

Working

Indexing

Information Retrieval

Autocompletion

Screenshots

Run Locally

Documentation

Last Updated

Authors

About

Releases

Packages

Languages

saad0510/intelligent-autocompletion

Folders and files

Latest commit

History

Repository files navigation

ASM Copilot

Dependencies

Dataset

Working

Indexing

Information Retrieval

Autocompletion

Screenshots

Run Locally

Documentation

Last Updated

Authors

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages