GSoC 2025 Projects

Jusong Yu edited this page Feb 9, 2025 · 10 revisions

What is AiiDA?

AiiDA is a Python framework for managing computational science workflows, with roots in computational materials science. It helps researchers manage large numbers of simulations (10k, 100k, 1M, ...) and complex workflows involving multiple executables. At the same time, it records the provenance of the entire simulation pipeline with the aim of making it fully reproducible.

AiiDA is used in research projects at universities, research institutes and companies (see SciPy 2020 talk, SciPy 2022 talk, publications, and testimonials).

Why work on AiiDA?

  • Help accelerate the transition to open (computational) science
  • Help fix the reproducibility crisis. Computational science is a good place to start.
  • Work with a team of computational scientists (mostly physics backgrounds) who are passionate about both science and coding.
  • We have an active Discourse community & biweekly developer meetings.

A background in materials science is not needed, but a basic interest in materials science topics will make things easier for you.

Getting started

To be considered as a GSoC student, you should make a small pull request to aiida-core or to any active repository in the aiidateam and aiidalab organizations. This could be a simple bug fix, a documentation improvement, etc. See, e.g., the aiida-core GitHub issues by label.

Say hi on our GSoC 2025 topic on Discourse.

Project 1 - Standardize Type Annotations

level Mid

Expected Size 350h

Summary:
A significant portion of AiiDA’s source code is currently excluded from mypy checks, leading to inconsistent or missing type annotations across different modules. The goal of this project is to systematically add and refine type annotations throughout AiiDA’s codebase, ensuring that all functions, classes, and methods are correctly typed. Standardizing type annotations will improve code readability, maintainability, and reduce future bugs.
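
As a rough illustration of the kind of annotations this project is about, the sketch below types a small generic container and a plain function so that a static checker such as mypy can verify how they are used. The names `ResultCollection` and `total_energy` are hypothetical and are not taken from AiiDA's codebase.

```python
from __future__ import annotations

from typing import Generic, Iterable, TypeVar

T = TypeVar("T")


class ResultCollection(Generic[T]):
    """Hypothetical container, used only to illustrate generics."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items: list[T] = list(items)

    def first(self) -> T | None:
        """Return the first item, or None if the collection is empty."""
        return self._items[0] if self._items else None

    def __len__(self) -> int:
        return len(self._items)


def total_energy(energies: Iterable[float], scale: float = 1.0) -> float:
    """Sum the given energies and multiply the result by `scale`."""
    return scale * sum(energies)
```

With these annotations in place, a type checker can flag a call such as `total_energy("abc")` before the code is ever run, and it knows that `ResultCollection([1, 2]).first()` is an `int` or `None`.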

Expected Outcomes:

  • Comprehensive and consistent type annotations across all AiiDA modules.
  • Removal of unnecessary exclusions from mypy checks.
  • Improved developer experience and reduced ambiguity when extending or refactoring AiiDA’s code.

Required Skills:

  • Basic familiarity with Python.
  • Understanding of typing in Python (e.g., typing module, type hints, generics).
  • Familiarity with static type checking tools (e.g., mypy) is helpful but not strictly required, as it can be learned during the project.

Project 2 - Support Running Blocking Processes Using Multithreading

level Advanced

Expected Size 350h

AiiDA defines running jobs as AiiDA processes, which are state machines whose states are stored on the file system, allowing them to recover after reboots and continue executing remote jobs. It uses Python’s asyncio library to handle I/O-bound tasks, such as submitting jobs and waiting for remote execution. We avoid exposing the asynchronous syntax by providing only synchronous entry points when launching jobs, shielding end users from the complexities of asynchronous programming. However, synchronous Python functions within AiiDA run in a blocking manner within the event loop of the main thread. Enabling multithreading would significantly boost AiiDA’s throughput by allowing these blocking processes to run concurrently.
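
The general idea of moving blocking calls off the event loop can be sketched with the standard library alone. This is a minimal illustration, not AiiDA's implementation: `blocking_task` and `run_concurrently` are hypothetical names, and the sketch uses asyncio's default thread pool rather than anything AiiDA-specific.

```python
import asyncio
import threading
import time
from typing import List


def blocking_task(name: str, seconds: float) -> str:
    """Stand-in for a synchronous function that blocks its thread."""
    time.sleep(seconds)
    return f"{name} ran in {threading.current_thread().name}"


async def run_concurrently(n: int, seconds: float) -> List[str]:
    """Offload n blocking calls to worker threads and await them together."""
    loop = asyncio.get_running_loop()
    # Passing None selects the event loop's default ThreadPoolExecutor.
    futures = [
        loop.run_in_executor(None, blocking_task, f"task-{i}", seconds)
        for i in range(n)
    ]
    return list(await asyncio.gather(*futures))


if __name__ == "__main__":
    for line in asyncio.run(run_concurrently(3, 0.2)):
        print(line)
```

Because each call sleeps in its own worker thread, the event loop stays free to service other coroutines while the blocking functions run. Called directly inside a coroutine, the same functions would run one after another and stall the loop for their combined duration.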

Expected outcomes

  • Streamline event loop management.
  • Deprecate nest-asyncio in favor of greenlet.
  • Enable synchronous function execution in newly spawned threads.

Skills

  • Familiarity with Python
  • Experience with asynchronous and parallel programming

Project 3 - Training an LLM to generate queries from natural language prompts

level Mid / Advanced

Expected Size 350h

One of the most powerful aspects of using AiiDA to run your workflows is that the automatically generated provenance can be used to flexibly query for the data that the user is interested in. However, while the QueryBuilder provides this flexibility, it can be challenging to learn and even time-consuming for experienced users.

To address this issue, this project aims to develop a tool utilizing large language models (LLMs) to generate queries from natural language prompts. Additionally, LLM-based code generation can be useful for broader AiiDA applications, so the project may optionally be extended to support general AiiDA code generation.

While existing LLMs like ChatGPT already perform fairly well, their generated code is often incorrect or outdated. By creating a diverse and maintainable dataset of query prompts and corresponding Python code, and either fine-tuning a dedicated LLM or designing carefully engineered prompts, we aim to improve the accuracy of generated queries, providing users with a powerful tool for extracting relevant results more effectively. The student will have the opportunity to actively participate in decision-making to determine the best approach.

Expected outcomes

By the end of this project, we aim to have a lightweight tool that can reliably generate a correct QueryBuilder instance from a user prompt. This will require:

  • A database that maps natural language prompts to the corresponding queries, which can be easily and incrementally expanded.
  • An LLM trained on this database that converts prompts into a QueryBuilder instance, and optionally into more general AiiDA code.
  • A user-friendly interface that can be installed locally in the form of a Python package and optionally an online tool integrated with the Materials Cloud.
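
One possible shape for the prompt-to-query database and its use in prompt engineering is sketched below. Everything here is an assumption made for illustration: the `QueryExample` record, the JSON Lines storage format, and the `build_few_shot_prompt` helper are hypothetical, and the stored query snippet is only an example of what an entry might contain.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class QueryExample:
    """One dataset entry: a natural language prompt and its query code."""

    prompt: str
    code: str


def to_jsonl(examples: List[QueryExample]) -> str:
    """Serialize one example per line, so new entries can simply be appended."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in examples)


def from_jsonl(text: str) -> List[QueryExample]:
    """Parse the JSON Lines text back into example records."""
    return [QueryExample(**json.loads(line)) for line in text.splitlines() if line.strip()]


def build_few_shot_prompt(examples: List[QueryExample], question: str) -> str:
    """Assemble a few-shot prompt: worked examples first, then the new question."""
    parts = ["Translate each request into AiiDA QueryBuilder code."]
    for example in examples:
        parts.append(f"Request: {example.prompt}\nCode:\n{example.code}")
    parts.append(f"Request: {question}\nCode:")
    return "\n\n".join(parts)


# Illustrative entry; the snippet is stored as text and is not executed here.
EXAMPLES = [
    QueryExample(
        prompt="Find the IDs of all structures.",
        code=(
            "from aiida.orm import QueryBuilder, StructureData\n"
            "qb = QueryBuilder().append(StructureData, project=['id'])"
        ),
    ),
]
```

Storing one record per line keeps the dataset easy to diff and to extend incrementally, and the same records could serve either as fine-tuning data or as few-shot examples, depending on which approach the project settles on.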

Skills

  • We expect you to be familiar with object-oriented programming in Python.
  • You need to have experience in working with large language models.
  • Additionally, some initial exposure to AI engineering is required.

Note

This project poses an exciting challenge for both students and mentors. While the AiiDA team may not have extensive experience with LLMs, we eagerly anticipate students bringing their knowledge to the table. We are ready to provide expertise on the actual queries and more, making this collaboration a dynamic and enriching opportunity.

Project XX - Your Idea Here

If you're already familiar with AiiDA and have your own idea on how to improve it, we're happy to consider it (you may also want to check the development roadmap for further interesting project ideas). In this case, please think about the steps you would take to attack the problem and contact us in advance so that we can draw up a rough work plan.

Please use the GSoC 2025 topic on Discourse.

GSoC 2025 related info
