Publicly Verifiable & Private Collaborative ML Model Training #6317
Hi @ewynx, thank you for submitting the proposal! Could you elaborate further on:
They are likely outdated at this point, but you might find the existing libraries on Awesome Noir helpful for shaving some development time off: https://github.com/noir-lang/awesome-noir/?tab=readme-ov-file#libraries
Deadline to update proposals is now extended to October 28th, 2024 if it helps buy a bit of extra time to supplement details. Good luck!
@Savio-Sou Thank you for the deadline extension on updating our proposal. Is there anything specific that you want us to add to the proposal itself that would clarify things beyond the questions you've already asked?
Summary
Verifiable & Private Collaborative Machine Learning (ML) Model Training will allow multiple parties to jointly train machine learning models while maintaining the privacy of their datasets. In addition, a collaborative zero-knowledge proof of the computation is delivered, giving public training verifiability. This approach can be applied in areas such as medical research and financial analysis, enabling collaboration without loss of privacy while maintaining compliance with regulations like GDPR.
This proposal aims to implement logistic regression in Noir and combine it with MPC to support collaborative training. Logistic regression is a widely used algorithm in Machine Learning for classification tasks. By integrating it into Noir, we will establish a fundamental building block for advancing Privacy-Preserving Machine Learning (PPML). Leveraging co-noir, we extend ZK proofs to a multi-party setting, enabling collaborative training and the collaborative generation of a proof.
The key deliverables include a fully optimized and tested logistic regression algorithm in Noir, performance benchmarks, and the open sourcing of all code and documentation to drive future development. As part of the project, we will apply the collaborative training to small datasets like the Iris plants dataset, and assess the performance and quality of the training. Based on these initial results, we will consider other datasets with more samples, features, and complexity.
Methodology
High-level strategy
There are three main components, combined as follows:
All code deliverables will be written in Noir and can also be executed with `nargo` instead of `co-noir` (which changes the setting from collaborative to individual).
We aim to be in close contact with the Taceo team, which is actively developing co-noir, to ensure that our implementation stays compatible with the functionality co-noir supports. Given that co-noir is a rapidly evolving project, we see this as an opportunity to build a compelling application that leverages this new tool while simultaneously testing it and providing valuable feedback for its continued development. Furthermore, verifiable training in Noir is useful even without MPC, which makes this tool valuable on its own.
Detailed approach
After initial research, we defined the building blocks needed, reviewed available libraries from the Noir community, and determined suitable training datasets for testing. This led to the following approach (for a complete list of tasks, see Timeline and Deliverables).
For logistic regression, we need fixed-point arithmetic and matrix arithmetic, including matrix inversion and support for matrices of fixed-point values. The currently available libraries for the necessary arithmetic are outdated or incomplete, which makes them unsuitable for direct use.
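As a rough illustration of the fixed-point building block mentioned above, the following Python sketch stores a real number as an integer scaled by a power of two and rescales after multiplication and division. The scale of 16 fractional bits and every name here (`FRAC_BITS`, `encode`, `fx_mul`, and so on) are assumptions made for this sketch; they do not describe an existing Noir library or the final design.

```python
# Illustrative fixed-point helpers. FRAC_BITS, SCALE and all function names
# are assumptions made for this sketch, not part of any existing library.
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS  # a real number x is stored as round(x * SCALE)


def encode(x):
    """Convert a real number to its fixed-point integer representation."""
    return round(x * SCALE)


def decode(a):
    """Convert a fixed-point integer back to a float."""
    return a / SCALE


def fx_mul(a, b):
    """Multiply two fixed-point values.

    The raw product carries a factor SCALE**2, so it is divided by SCALE once.
    Floor division discards the extra fractional bits.
    """
    return (a * b) // SCALE


def fx_div(a, b):
    """Divide two fixed-point values, pre-scaling the numerator to keep precision."""
    return (a * SCALE) // b


def fx_dot(u, v):
    """Dot product of two fixed-point vectors.

    Raw products are accumulated first and rescaled once at the end,
    which loses less precision than rescaling every term.
    """
    return sum(x * y for x, y in zip(u, v)) // SCALE


if __name__ == "__main__":
    a, b = encode(1.5), encode(-2.25)
    print(decode(fx_mul(a, b)))
    print(decode(fx_div(encode(1.0), encode(3.0))))
```

Matrix products reduce to repeated `fx_dot` calls. Matrix inversion additionally needs `fx_div`, and division is where most of the rounding error in such a representation tends to accumulate.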
To implement the logistic regression training, we will use the arithmetic tools presented above to fit the logistic regression model by maximum likelihood. Maximum likelihood estimation requires maximizing the log-likelihood function, a problem that can be solved numerically with the Newton-Raphson method, which will be implemented in our deliverables.
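To make the Newton-Raphson step concrete, here is a minimal floating-point sketch in plain Python. It is not the Noir implementation: it uses ordinary floats rather than fixed-point values, solves the linear system by Gaussian elimination instead of an explicit matrix inverse, and runs on a made-up six-sample toy dataset. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def probabilities(w, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) for row in X]


def gradient(w, X, y):
    """Gradient of the log-likelihood: X^T (y - p)."""
    p = probabilities(w, X)
    d = len(w)
    return [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(d)]


def neg_hessian(w, X):
    """Negative Hessian of the log-likelihood: X^T diag(p * (1 - p)) X."""
    p = probabilities(w, X)
    d = len(w)
    return [[sum(pi * (1.0 - pi) * row[j] * row[k] for row, pi in zip(X, p))
             for k in range(d)] for j in range(d)]


def fit_logistic_newton(X, y, iterations=10):
    """Newton-Raphson update: w <- w + (-H)^{-1} * gradient."""
    w = [0.0] * len(X[0])
    for _ in range(iterations):
        step = solve(neg_hessian(w, X), gradient(w, X, y))
        w = [wj + sj for wj, sj in zip(w, step)]
    return w


# Toy data (made up for illustration): a bias column plus one feature.
# The classes overlap, so the maximum likelihood estimate is finite.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0], [1.0, 6.0]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic_newton(X, y)
print("fitted weights:", w)
```

Each iteration needs one linear solve (or matrix inversion), which is why matrix inversion over fixed-point values appears among the required building blocks. On perfectly separable data the weights grow without bound, so a circuit version would likely need a fixed iteration count or some form of regularization.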
To ensure the compatibility of our functionality with `co-noir`, we will continuously test our implementation with it, making sure all functionality is supported in both the ZK and the ZK + MPC settings.
As a test project, we will train a logistic regression model for the Iris plants dataset. This is a small dataset of 150 samples and three classes. We consider this a good starting point for evaluating the performance of the implementation with the ZK + MPC overhead.
To validate the accuracy of the obtained model, we will take the parameters of the trained model and measure the accuracy using a conventional floating-point representation, as used in libraries like `numpy` in Python. Checking the accuracy in this way gives us more confidence in the computed metric, given that training uses fixed-point arithmetic, which has lower precision than the built-in floating-point representation in standard ML libraries.
The project's initial phase will conclude with benchmarking, testing, and documentation. Following this, we will focus on optimizations across the code, particularly in fixed-point/matrix arithmetic and logistic regression itself.
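The accuracy validation step described above could look roughly like the sketch below: parameters exported from the fixed-point training are decoded to floats and scored outside the circuit. It uses plain Python floats instead of `numpy` to stay dependency-free. The scale, the example weights, and the toy samples are all hypothetical placeholders, not results.

```python
import math

SCALE = 1 << 16  # assumed fixed-point scale used during training


def decode(a):
    """Convert a fixed-point integer to a float."""
    return a / SCALE


def predict(w, row):
    """Predict class 1 when the logistic output is at least 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, row))
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0


def accuracy(w, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for row, yi in zip(X, y) if predict(w, row) == yi)
    return correct / len(y)


# Hypothetical fixed-point parameters, standing in for values exported
# from a circuit; they encode roughly [-2.8, 0.8].
w_fixed = [-183501, 52429]
w_float = [decode(a) for a in w_fixed]

# Made-up samples: a bias column plus one feature.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0], [1.0, 6.0]]
y = [0, 0, 1, 0, 1, 1]

print("accuracy:", accuracy(w_float, X, y))
```

Comparing this figure against a model trained entirely in floating point on the same data would indicate how much accuracy, if any, the fixed-point representation costs.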
Assumptions
`co-noir`. Hence, we will use the Rep3 protocol (the protocol of Araki et al. with the modification presented in Eerikson et al.) and the protocol based on Shamir secret sharing. Given that co-noir currently supports a security model of honest-majority and semi-honest adversaries, we will use the same security model.
Timeline and Deliverables
The project is divided into four deliverables:
`numpy`.
The expected timeline is as follows:
`numpy` in Python & create a report.
Team
The team consists of two cryptographic engineers and a project manager, all of whom have previously collaborated on Noir-related work. With their combined experience in Noir, MPC, machine learning, and working with emerging technologies like ZK and MPC, the team is well-equipped to execute this project.
Team members
Teresa Li is a Project Manager at HashCloak with a background in quantitative finance and experience in equity research, investments, and risk management. Amongst her contributions at HashCloak, she has managed the most recent Noir-related work that was delivered (see below), with the same team as proposed for this project. She will oversee this project and be the main point of contact.
Elena Fuentes (@ewynx) is a Cryptographic Researcher & Engineer at HashCloak, and has recently contributed functionality for Noir support in zk-regex, as well as documentation and mobile benchmarking for the Noir bignum library, and docs for noir_base64 and string_search. In 2023, she won 2nd prize for Best Noir App at the ETHGlobal hackathon. Her other contributions at HashCloak include implementing elliptic curve cryptography and SHA-512 in Sway, and working with other zero-knowledge languages and frameworks such as Plonky2, Halo2, and o1js.
Hernan Vanegas (@hdvanegasm) is a Cryptographic Researcher & Engineer at HashCloak and has extensive experience with both MPC and machine learning. He has worked with the MP-SPDZ framework to implement MPC protocols. He also applies MPC techniques to machine learning in research projects such as secure support vector machine training and dynamic programming problems like the edit distance problem. Also, he has experience with reinforcement learning techniques, and he has participated in research projects that combine information security and computer vision. Amongst his recent contributions at HashCloak, he has worked on benchmarking Noir libraries, bignum, rsa and zk-regex, as well as the assessment of threshold ECDSA protocols.
Relevant work
The following relevant work done at HashCloak involves Noir:
Additionally, HashCloak has been actively involved in the MPC field in the following projects:
Start Date
We would like to propose a start date of November 11th, 2024, which would allow us to begin promptly after the announcement date, if selected.