
GSoC 2024 Projects

Jusong Yu edited this page Jan 26, 2024 · 19 revisions

Getting started with AiiDA

AiiDA is a Python framework for managing computational science workflows, with roots in computational materials science. It helps researchers manage large numbers of simulations (10k, 100k, 1M, ...) and complex workflows involving multiple executables. At the same time, it records the provenance of the entire simulation pipeline with the aim of making it fully reproducible.

AiiDA is used in research projects at universities, research institutes and companies (see SciPy 2020 talk, SciPy 2022 talk, publications, and testimonials).

To be considered for GSoC, please make a small pull request to aiida-core or to any active repository in the aiidateam and aiidalab organizations. This could be a simple bug fix, a documentation improvement, etc. For aiida-core, see e.g. the GitHub issues by label.

Say hi on our GSoC 2024 discussions page.

Why work on AiiDA?

  • Help accelerate the transition to open (computational) science
  • Help fix the reproducibility crisis. Computational science is a good place to start.
  • Work with a team of computational scientists (mostly physics backgrounds) who are passionate about both science and coding.
    We have an active Slack workspace & biweekly developer meetings.

A background in materials science is not needed, but a basic interest in materials science topics will make things easier for you.

Project 1 - Explore the AiiDA node graph in the browser

Level: intermediate

Expected size: 350h

AiiDA automatically stores entities in its database and links them, forming a directed graph. This graph tracks the provenance of all data produced by calculations or returned by workflows. This project aims to provide a more intuitive, interactive tool for browsing AiiDA graphs in the browser. We can use an open-source node-graph library (e.g. Rete) or build the viewer from scratch. The node graph viewer will communicate with AiiDA through its REST API.
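As a rough illustration of how a viewer could talk to the REST API, the sketch below builds the URL for a node's incoming or outgoing links and turns a response into graph edges. It assumes a local server started with `verdi restapi` at its default address, and a response that lists neighbours under `data.incoming` or `data.outgoing` with `uuid` and `link_label` fields. The names `links_url` and `edges_from_payload` and the sample payload are illustrative, not part of AiiDA.

```python
from urllib.parse import quote

# Assumption: a local AiiDA REST API started with `verdi restapi`,
# reachable under its default base URL.
BASE_URL = "http://127.0.0.1:5000/api/v4"


def links_url(node_uuid, direction, base_url=BASE_URL):
    """Build the URL that lists a node's incoming or outgoing links."""
    if direction not in ("incoming", "outgoing"):
        raise ValueError("direction must be 'incoming' or 'outgoing'")
    return f"{base_url}/nodes/{quote(node_uuid)}/links/{direction}"


def edges_from_payload(node_uuid, direction, payload):
    """Turn a links response into (source, target, link_label) edges."""
    edges = []
    for neighbour in payload["data"][direction]:
        if direction == "incoming":
            # An incoming link points from the neighbour to the selected node.
            edges.append((neighbour["uuid"], node_uuid, neighbour["link_label"]))
        else:
            # An outgoing link points from the selected node to the neighbour.
            edges.append((node_uuid, neighbour["uuid"], neighbour["link_label"]))
    return edges


# Made-up payload in the assumed response layout (no network access needed).
sample_payload = {"data": {"incoming": [{"uuid": "1111", "link_label": "x"}]}}
print(links_url("2222", "incoming"))
print(edges_from_payload("2222", "incoming", sample_payload))
```

A browser front end would issue the same requests from JavaScript; the Python version here only shows the data flow.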

The current AiiDA Provenance Browser (e.g. the explore website) represents data nodes with circles, calculation nodes with squares, and workflow nodes with diamonds. The user can get little information from these nodes. Moreover, when the user selects a new node, the browser redirects to a new page, which breaks the smooth transition from one node to another. In the new implementation, we will create a node component with a preview that shows the basic information of the node (e.g. label, type, value). We also want to update only the nodes, not the whole page, when a new node is selected, so that the user can move smoothly along the provenance graph.
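A minimal sketch of the preview idea follows: it maps a node's `node_type` string to the shape used by the current browser and collects the label, type, and value into a small preview record. The input dictionary layout (keys `node_type`, `label`, `attributes`) and the helper names `node_shape` and `node_preview` are assumptions made for illustration.

```python
def node_shape(node_type):
    """Map an AiiDA node_type string to the shape used by the current browser."""
    if node_type.startswith("data."):
        return "circle"
    if node_type.startswith("process.calculation."):
        return "square"
    if node_type.startswith("process.workflow."):
        return "diamond"
    return "unknown"


def node_preview(node):
    """Collect the basic information shown in a node's preview card."""
    attributes = node.get("attributes", {})
    return {
        "label": node.get("label") or "(no label)",
        # "data.core.int.Int." -> "Int"
        "type": node["node_type"].rstrip(".").split(".")[-1],
        "shape": node_shape(node["node_type"]),
        # Only simple data nodes such as Int or Float carry a "value" attribute.
        "value": attributes.get("value"),
    }


# Made-up node record in the assumed layout.
example = {"node_type": "data.core.int.Int.", "label": "", "attributes": {"value": 5}}
print(node_preview(example))
```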

Expected outcomes

An AiiDA node graph viewer

  • allows the user to explore the AiiDA provenance dynamically, e.g. forward and backward along the provenance graph.
  • shows input and output nodes of a selected node.
  • allows previewing the basic information of a node.
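The "update only the nodes" behaviour can be sketched as a small state object that merges the neighbours of a selected node into the current view and reports only what is new, so a front end would redraw just those elements. `GraphView` and the `fetch_links` callable are hypothetical; in a real viewer `fetch_links` would wrap REST API requests.

```python
from dataclasses import dataclass, field


@dataclass
class GraphView:
    """Nodes and edges currently shown in the viewer."""

    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def expand(self, selected, fetch_links):
        """Merge the neighbours of `selected` into the view.

        `fetch_links(uuid, direction)` is assumed to return an iterable of
        (neighbour_uuid, link_label) pairs. Returns (new_nodes, new_edges).
        """
        new_nodes, new_edges = set(), set()
        if selected not in self.nodes:
            new_nodes.add(selected)
        for direction in ("incoming", "outgoing"):
            for neighbour, label in fetch_links(selected, direction):
                if direction == "incoming":
                    edge = (neighbour, selected, label)
                else:
                    edge = (selected, neighbour, label)
                if edge not in self.edges:
                    new_edges.add(edge)
                if neighbour not in self.nodes:
                    new_nodes.add(neighbour)
        self.nodes |= new_nodes
        self.edges |= new_edges
        return new_nodes, new_edges


# Toy provenance graph standing in for the REST API: a -> calc -> result.
TOY_LINKS = {
    ("calc", "incoming"): [("a", "x")],
    ("calc", "outgoing"): [("result", "sum")],
    ("result", "incoming"): [("calc", "sum")],
}


def toy_fetch(uuid, direction):
    return TOY_LINKS.get((uuid, direction), [])


view = GraphView()
print(view.expand("calc", toy_fetch))    # first selection adds three nodes and two edges
print(view.expand("result", toy_fetch))  # moving forward adds nothing new in this toy graph
```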

Skills

Python, REST APIs, HTML, JavaScript, and React or Vue.

Mentors

Project 2 - Training an LLM to generate QueryBuilder queries from natural language

Level: advanced

Expected size: 350h

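  • AiiDA provides the QueryBuilder, a Python API for querying the provenance graph.
  • An LLM (or another ML model) can help translate natural-language questions into QueryBuilder queries.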

Expected outcomes

  • An LLM that converts natural language into QueryBuilder queries
  • An interface that handles the translation and displays the results

At the end of the project, we expect to have a lightweight tool that runs locally and generates a QueryBuilder query from a sentence entered by the user. It should be a Python tool that can be installed as an optional extra, integrates with AiiDA, and can be called through a verdi command.
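To make the target of the translation concrete, here is a hedged sketch of one possible pipeline stage. It assumes the model emits a small JSON spec (a hypothetical intermediate format, not something AiiDA defines) and renders that spec as QueryBuilder source code after checking class names against an allow-list. `render_query`, `ALLOWED_CLASSES`, and the spec layout are illustrative; the rendered code uses `QueryBuilder.append` with its `filters` and `project` arguments.

```python
import json

# Hypothetical allow-list: only these aiida.orm classes may appear in a generated query.
ALLOWED_CLASSES = {"Int", "Float", "StructureData", "CalcJobNode", "WorkChainNode"}

# Keyword arguments of QueryBuilder.append that the sketch passes through.
PASS_THROUGH = ("tag", "filters", "project", "with_incoming", "with_outgoing")


def render_query(spec_json):
    """Render a JSON query spec (assumed model output) as QueryBuilder source code."""
    steps = json.loads(spec_json)
    classes = []
    body = ["qb = QueryBuilder()"]
    for step in steps:
        cls = step["cls"]
        if cls not in ALLOWED_CLASSES:
            raise ValueError(f"class not allowed: {cls}")
        classes.append(cls)
        args = "".join(f", {key}={step[key]!r}" for key in PASS_THROUGH if key in step)
        body.append(f"qb.append({cls}{args})")
    names = ["QueryBuilder"] + sorted(set(classes))
    return "\n".join([f"from aiida.orm import {', '.join(names)}"] + body)


# Sentence: "Find all integers with a value greater than 5."
spec = '[{"cls": "Int", "filters": {"attributes.value": {">": 5}}, "project": ["attributes.value"]}]'
print(render_query(spec))
```

Validating a structured spec before generating code is one design option; it keeps the model from emitting arbitrary Python, at the cost of supporting only the query features the spec format covers.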

Skills

We expect you to be familiar with object-oriented programming in Python. You need experience in natural language processing and should know how to train a model from scratch.

Mentors

Mentorship

The mentors for GSoC 2024 are

Please use the GSoC 2024 discussion thread [TODO: use discourse!] to say hi and ask any questions you may have.