GSoC 2024 Projects
AiiDA is a Python framework for managing computational science workflows, with roots in computational materials science. It helps researchers manage large numbers of simulations (10k, 100k, 1M, ...) and complex workflows involving multiple executables. At the same time, it records the provenance of the entire simulation pipeline with the aim of making it fully reproducible.
AiiDA is used in research projects at universities, research institutes and companies (see SciPy 2020 talk, SciPy 2022 talk, publications, and testimonials).
To be considered as a GSoC student, we ask you to make a small pull request to aiida-core, or to any active repository in the aiidateam and aiidalab organizations. This could be a simple bug fix, a documentation improvement, etc. See e.g. (for aiida-core)
Say hi on our GSoC 2024 discussions page. (TODO: link will be available when the idea list is published)
- Help accelerate the transition to open (computational) science
- Help fix the reproducibility crisis. Computational science is a good place to start.
- Work with a team of computational scientists (mostly with physics backgrounds) who are passionate about both science and coding.
We have an active Discourse community & biweekly developer meetings.
A background in materials science is not needed, but a basic interest in materials science topics will make things easier for you.
Level: intermediate
Expected size: 350h
AiiDA automatically stores entities in its database and links them, forming a directed graph. This directed graph automatically tracks the provenance of all data produced by calculations or returned by workflows. This project plans to provide a more intuitive tool for browsing AiiDA graphs interactively in the browser. We can use an open-source node graph library (e.g. Rete, react-flow or similar) or build the viewer from scratch. The node graph viewer will communicate with AiiDA through the REST API.
The current AiiDA Provenance Browser (e.g. the explore website) represents data nodes with circles, calculation nodes with squares, and workflow nodes with diamonds. The user gets little information from these shapes alone. Moreover, selecting a node redirects to a new page, which breaks the smooth transition from one node to the next. In the new implementation, we will create a node component with a preview that shows basic information about the node (e.g. label, type, value). Selecting a node should update only the displayed nodes rather than reload the page, so that the user can move smoothly along the provenance graph.
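As a rough illustration of how a viewer could request a node's neighbours over the REST API without reloading the page, the sketch below only builds the request URLs and sends nothing. The base URL, the example UUID prefix, and the helper name `link_urls` are assumptions made for illustration. The `/nodes/<uuid>/links/incoming` and `/links/outgoing` paths follow the pattern of the AiiDA REST API (v4) and should be checked against the server actually deployed.

```python
from urllib.parse import urlencode

# Assumed base URL of a locally running AiiDA REST API (hypothetical deployment).
BASE_URL = "http://localhost:5000/api/v4"


def link_urls(uuid, limit=None):
    """Return the URLs for the incoming and outgoing links of one node."""
    query = f"?{urlencode({'limit': limit})}" if limit is not None else ""
    return {
        direction: f"{BASE_URL}/nodes/{uuid}/links/{direction}{query}"
        for direction in ("incoming", "outgoing")
    }


# "de83b1" is a made-up UUID prefix used only for this example.
urls = link_urls("de83b1", limit=10)
print(urls["incoming"])
print(urls["outgoing"])
```

A front end would fetch these two URLs when a node is selected and redraw only the affected part of the graph.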
An AiiDA node graph viewer that
- allows the user to explore the AiiDA provenance dynamically, e.g. forward and backward along the provenance graph.
- shows the input and output nodes of a selected node.
- allows previewing a node.
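The three behaviours listed above can be sketched with a small in-memory graph. This is a toy model under stated assumptions, not AiiDA's data model or API: the `Node` and `ProvenanceGraph` classes, their fields, and the example data are all hypothetical stand-ins for what the viewer would receive from the REST API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for an AiiDA node (not the real aiida.orm.Node)."""
    pk: int
    node_type: str  # "data", "calculation" or "workflow"
    label: str = ""
    value: object = None


@dataclass
class ProvenanceGraph:
    nodes: dict = field(default_factory=dict)  # pk -> Node
    links: list = field(default_factory=list)  # (source_pk, target_pk, link_label)

    def add_node(self, node):
        self.nodes[node.pk] = node

    def add_link(self, source_pk, target_pk, label):
        self.links.append((source_pk, target_pk, label))

    def inputs(self, pk):
        """Nodes one step backward along the provenance graph."""
        return [self.nodes[s] for s, t, _ in self.links if t == pk]

    def outputs(self, pk):
        """Nodes one step forward along the provenance graph."""
        return [self.nodes[t] for s, t, _ in self.links if s == pk]

    def preview(self, pk):
        """Basic information shown on a node card: label, type, value."""
        node = self.nodes[pk]
        return {"label": node.label, "type": node.node_type, "value": node.value}


# Toy graph: two integers are added by a calculation that returns their sum.
graph = ProvenanceGraph()
graph.add_node(Node(1, "data", "x", 2))
graph.add_node(Node(2, "data", "y", 3))
graph.add_node(Node(3, "calculation", "add"))
graph.add_node(Node(4, "data", "sum", 5))
graph.add_link(1, 3, "x")
graph.add_link(2, 3, "y")
graph.add_link(3, 4, "sum")

selected = 3
print([n.label for n in graph.inputs(selected)])   # ['x', 'y']
print([n.label for n in graph.outputs(selected)])  # ['sum']
print(graph.preview(4))  # {'label': 'sum', 'type': 'data', 'value': 5}
```

Selecting a different node only changes `selected`; the graph itself stays loaded, which mirrors the goal of updating nodes rather than the whole page.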
Python, REST API, HTML, JavaScript, React.
- Xing Wang @superstar54
- Kristjan Eimre @eimrek
Level: advanced
Expected size: 350h
One of the most powerful aspects of using AiiDA to run your workflows is that the automatically generated provenance can be used to flexibly query for the data that the user is interested in.
However, the QueryBuilder tool designed for this purpose can be somewhat challenging to learn, and using it can be time-consuming even for experienced users.
This project aims to train a large language model (LLM) that allows users to easily request the query they are interested in by describing their desired data in a few sentences. Existing LLMs such as ChatGPT already do a fair job at this, but the code they produce is still often incorrect or outdated. Generating a diverse dataset of query prompts and corresponding Python code, and using this data to train a dedicated LLM, will hopefully make the produced queries more accurate, creating a powerful tool for users to extract the results they are interested in.
At the end of the project, we aim to have a lightweight tool that can generate a correct QueryBuilder instance from a user prompt.
This will require:
- A database that maps natural language prompts to the corresponding queries, which can be easily and incrementally expanded.
- An LLM trained on this database that converts prompts into a QueryBuilder instance.
- A user-friendly interface that can be installed locally in the form of a Python package and, optionally, an online tool integrated with the Materials Cloud.
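One possible shape for the prompt-to-query database is a JSON Lines file that grows one record at a time. The sketch below is only an assumption about how such a store could look: the file name, the record fields `prompt` and `code`, and the helpers `add_example` and `load_examples` are hypothetical. The stored query is an illustrative QueryBuilder snippet kept as a string; it is checked for valid Python syntax but never executed, so no AiiDA installation is needed to run the example.

```python
import ast
import json
import tempfile
from pathlib import Path


def add_example(path, prompt, code):
    """Append one prompt/query pair, rejecting code that is not valid Python."""
    ast.parse(code)  # raises SyntaxError for malformed snippets; does not run them
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps({"prompt": prompt, "code": code}) + "\n")


def load_examples(path):
    """Read all prompt/query pairs back from the JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Illustrative entry; the query string is only stored and parsed, never run here.
example_code = (
    "from aiida.orm import QueryBuilder, CalcJobNode\n"
    "qb = QueryBuilder()\n"
    "qb.append(CalcJobNode, filters={'attributes.process_state': 'finished'})\n"
)

dataset = Path(tempfile.mkdtemp()) / "prompts.jsonl"
add_example(dataset, "Find all finished calculation jobs.", example_code)
examples = load_examples(dataset)
print(len(examples), examples[0]["prompt"])
```

Appending records keeps the dataset easy to expand incrementally, and the syntax check catches the most obvious broken entries before they reach training.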
We expect you to be familiar with object-oriented programming in Python. You need to have experience in natural language processing and know how to train a model from scratch.
This project poses an exciting challenge for both students and mentors. While the AiiDA team may not have extensive experience with LLMs, we eagerly anticipate students bringing their knowledge to the table. We are ready to provide expertise on the actual queries and more, making this collaboration a dynamic and enriching opportunity.
- Marnik Bercx @mbercx
- Jusong Yu @unkcpz
- Edan Bainglass @edan-bainglass
If you're already familiar with AiiDA and have your own idea on how to improve it, we're happy to consider it (you may also want to check the development roadmap for further interesting project ideas). In this case, please think about the steps you would take to attack the problem and contact us in advance so that we can draw up a rough work plan.
The mentors for GSoC 2024 are
- Marnik Bercx @mbercx
- Jusong Yu @unkcpz
- Kristjan Eimre @eimrek
- Xing Wang @superstar54
- Edan Bainglass @edan-bainglass
Please use the GSoC 2024 discussion thread [TODO: will be available once the idea list is published] to say hi and ask any questions you may have.