First up, fellow degens, I have to say that Machine Learning (ML) has seen incredibly significant progress in recent years and has found applications in all sorts of domains. Am I right? It can be used for image processing, speech recognition, playing the game 'Go', and more recently even discovering antibiotics.
We have so many cool applications using ML now. But despite these great successes, there are still real issues in machine learning. This is where Zero-Knowledge Machine Learning steps in.
Zero-Knowledge Machine Learning (zkML) is, you know, a protocol that combines machine learning with zero-knowledge proofs (ZKPs). A ZKP is a cryptographic tool where one party (the prover) can convince another that something is true without revealing any other information. Combining this with ML means we can generate outputs without revealing the sensitive training data itself, while also ensuring decentralization and the integrity of the computation.
zkML operates by training machine learning models on data spread across different nodes in a decentralized network. These nodes can then generate zero-knowledge proofs about their data. Basically, these proofs later allow the nodes to confirm that their data is genuine without revealing the sensitive data itself.
Did not understand it? Well, let me break it down for you in this extensive article.
Here’s Machine Learning in a Nutshell
We are witnessing the continued evolution of blockchain use cases across various industries. Here we're going to discuss one of the most exciting developments, one whose goal is to revolutionize these industries in terms of data privacy (especially for sensitive data like health records).
Zero-knowledge machine learning helps us ensure proof of computational integrity rather than just trust.
Well, at the beginning I mentioned that machine learning has various issues that zkML answers and resolves. The one I'm going to talk about in this article is the integrity issue in machine learning.
What Are the Limitations of Machine Learning?
If I had to define machine learning, I'd explain it like this: machine learning is basically the capability of machines to learn and adapt from data independently, so that computer models can imitate or exhibit "intelligent human behaviour".
ML is a subfield of AI (artificial intelligence), with applications ranging from personalizing your Facebook or Instagram feed based on your activity and interactions, to deciding which advertisements to flash you on YouTube, to using data from your previous online purchases and search history to show you recommended products.
That was the basic explanation. Now comes a slightly more sophisticated explanation of what machine learning is, because understanding exactly what it is matters before we dive in and talk about its limitations.
AI has gone through various phases, with names like "expert systems" and "neural nets", and for us crypto degens living under a rock, we need to understand what AI can do so we don't let it go all erratic on us.
Machine learning is, in short, a way to use artificial intelligence to perform complex tasks with a human touch. It has been associated with:
- Machines understanding naturally written and spoken human language, so that they can not only respond but also translate between languages. What comes to mind? Yes, something like Siri on your iPhone or a home assistant (like Alexa).
- Computer programs inspired by the human brain, made up of many, many simple processing units connected like a web. By feeding these networks training data, the programs learn to recognize patterns and make their own inferences/predictions (yes, I'm definitely describing neural networks).
- And ML is also related to deep learning, wherein machines sift through mountains of information and figure out what's important, like how to recognize a face in a photo.
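To make the "simple processing units learning from examples" idea concrete, here's a toy sketch in Python: a single perceptron (one processing unit, the ancestor of today's huge networks) learning the logical AND function from four labelled examples. Everything here (the function names, learning rate, and epoch count) is made up purely for illustration; real models stack millions of such units.

```python
# Toy illustration of a single "processing unit" learning from data:
# a perceptron trained on examples of the logical AND function.

def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of ((x1, x2), label) pairs with label 0 or 1."""
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), label in samples:
            pred = 1 if (w1 * x1 + w2 * x2 + b) > 0 else 0
            err = label - pred          # -1, 0, or +1
            w1 += lr * err * x1         # nudge weights toward the answer
            w2 += lr * err * x2
            b += lr * err
    return w1, w2, b

def predict(weights, x1, x2):
    w1, w2, b = weights
    return 1 if (w1 * x1 + w2 * x2 + b) > 0 else 0

# Four labelled examples of AND; the unit is never told the rule itself.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = train_perceptron(data)
```

After training, `predict(weights, 1, 1)` returns 1 while the other three input pairs return 0: the unit has learned the pattern from data rather than being explicitly programmed with it.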
From this little explanation you can already figure out machine learning's use cases: self-driving cars, chatbots, large language models like ChatGPT or Gemini (previously Bard), text-to-image applications like DALL-E or Midjourney, and even really important things like decisions on loans, credit scores, and bail.
For us uninitiated crypto geeks, it is crucial to know that with machine learning models you need to trust (assume) a few things to get reliable answers/outputs from them:
- Is the input data accurate and untampered, and is it kept private whenever it's supposed to be?
- Does the machine learning model compute the output correctly, and is anyone messing with the model? In a nutshell, does it follow the principles of accuracy, integrity, and privacy?
- Lastly, and obviously, is the output we get accurate given the input data, and is it private when it's supposed to be?
You will now ask where the problem lies, then. Well, after discussing how it works, we can agree that it's hard to see exactly how machine learning arrives at its answers.
This is what researcher Daniel Kang points out: machine learning models are often hidden behind closed APIs, essentially secret gateways for accessing a program.
To protect user privacy, companies like Twitter (now dubbed X) and OpenAI have not released their model weights. For instance, Elon Musk's X has open-sourced its "For You" timeline algorithm, but not the model weights that actually drive it.
By the way, another reason is that these machine learning models are trade secrets for such companies: they invest a lot of time and resources to train them, which ultimately gives them a competitive edge over the others.
Let's Break Down The Trust and Privacy Risks with ML
When using machine learning language models like ChatGPT 3.5/4, you usually give them some information in the form of prompts and TRUST that they will give you the answer you're looking for without leaking that sensitive data, and without revealing any private details about the model itself.
But let me tell you, there are real privacy risks during the inference process. These come in flavours like the "Membership Inference Attack" and the "Model Inversion Attack". What are these, you ask? Let your fellow geek explain.
- Say an attacker/hacker is trying to guess whether YOUR medical record was used to train a health insurance machine learning model. That can sometimes be figured out just by analyzing the model's outputs (a membership inference attack).
- Or a malicious user could craft very specific prompts to trick the model into revealing bits and pieces of its training data (a model inversion attack).
This is where zero knowledge machine learning technology comes through to tackle these privacy concerns.
Zero Knowledge Proofs + Machine Learning = zkML: How Does It Address Integrity Issues?
Researchers such as Yupeng Zhang, who is well regarded for his work on zero-knowledge proofs, secure multiparty computation, and their applications to machine learning and zkML, have given us a lot of food for thought. In one of his Monash Cybersecurity Seminars, Zhang was quick to mention exactly the machine learning concerns we discussed previously. To be exact, they were:
1. Reproducibility
2. Validity
3. Fairness
Then we are introduced to the cryptographic solution for verifying/validating machine learning output without revealing the underlying model: Zero-Knowledge Proofs (ZKPs).
ZK proofs allow someone to prove they know something is true without revealing the details. In this scenario, ZK means zk-SNARKs (Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge). I know it's a mouthful, so let's break it down, shall we?
I was happy to find this in the research paper zk-SNARKs: A Gentle Introduction by Anca Nitulescu, which I'm about to share below.
zk-SNARKs are a specific type of zero-knowledge proof used to prove the integrity of outputs/answers from large computations. They have these qualities:
- Succinct: zk-SNARKs produce small proofs that are quick to verify.
- Non-Interactive: unlike traditional ZKPs, they do not require rounds of back-and-forth between the prover and the verifier; a single proof is enough.
- Argument: SNARKs are only computationally sound, meaning they are secure against provers with bounded computational resources. In theory, a prover with unlimited computational power could deceive the verifier with an untrue statement.
Combining the SNARK system with zero knowledge gives us the cool property that proofs can be produced without revealing any intermediate steps or sensitive information about the computation process.
Basically what happens is that:
A prover, which may be a company or a server, proves a claim about secret data (like a machine learning model) through a public computation. The verifier, which is people like us, the end users, can then validate the claim without accessing the ML model itself.
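A real zk-SNARK needs serious cryptographic machinery, but the "commit publicly, keep the secret, verify later" shape of this flow can be sketched with a plain hash commitment in a few lines of Python. To be clear, this is not a zero-knowledge proof: here the prover must eventually reveal the data for it to be checked, which a real ZKP avoids. It's only a minimal sketch of the commitment half of the pattern, with made-up names and values:

```python
# Minimal hash-commitment sketch (NOT a ZKP): the prover publishes a
# digest of secret data now and can prove, later, that a revealed value
# matches what was committed to.
import hashlib
import json

def commit(secret_model_weights, salt):
    """Prover publishes only a hash of the secret (a commitment)."""
    payload = json.dumps(secret_model_weights).encode() + salt
    return hashlib.sha256(payload).hexdigest()

def verify_opening(commitment, revealed_weights, salt):
    """Verifier checks a later revelation against the published hash."""
    return commit(revealed_weights, salt) == commitment

# Prover side: the weights stay private; only the digest goes public.
weights = [0.42, -1.3, 0.07]
salt = b"random-nonce"  # blinds the commitment against brute-force guessing
public_commitment = commit(weights, salt)

# Verifier side: if the prover ever opens the commitment, anyone can
# check it matches what was promised; a different value fails.
assert verify_opening(public_commitment, weights, salt)
assert not verify_opening(public_commitment, [0.0, 0.0, 0.0], salt)
```

In actual zkML, a published commitment to the model weights plays this role, while the SNARK proves statements about the committed weights without ever opening them.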
Finally, We Arrive At Zero Knowledge Machine Learning
zkML is the nascent technology everyone is talking about for improving the state of AI/ML.
When we talk about mingling the two technologies of zero knowledge and machine learning, what we mean is that zkML allows an ML model to learn from the data provided without that data ever being exposed, for privacy and security reasons.
How is it done, though? Well, the data is spread across various nodes in a decentralized network, and the ML models are trained on it there. These nodes then generate zero-knowledge proofs related to their data, which lets them validate the outputs/answers without revealing the details of the data itself.
Source: Research paper on “Balancing the Power of AI/ML: The Role of ZK and Blockchain”
This whole zkML process solves the underlying trust issue: it generates a proof showing that the result of a computation is correct without disclosing the hidden inputs. By the way, this also allows users like you and me to verify that a company (like OpenAI) is using the model it claims to use and that it performs the way it's advertised.
Well, SevenX Ventures explains this image above in the following stages:
- Inputs and model parameters (the mathematical formulas the ML model was trained on) can be kept hidden, while a cryptographic commitment to them (i.e. a hash) is published, which allows the data's authenticity to be verified.
- In the arithmetization stage, the ML model is converted into arithmetic circuits, the form over which proofs about the model's computation can be generated.
- Then, as we discussed above, zk-SNARK principles are used to generate a proof that the machine learning model is functioning as it's supposed to, and, more importantly, without revealing the underlying model or the data used to train it.
- Finally, the output and proof are verified by the verifiers.
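The arithmetization stage deserves a tiny illustration. Arithmetic circuits work over integers (field elements), not floating-point numbers, so a typical first step is quantizing a model's weights to fixed-point integers, turning every operation into integer addition and multiplication. Here's a minimal sketch of that idea, with a made-up scale factor and toy model values:

```python
# Fixed-point quantization: the kind of preprocessing that turns a float
# ML computation into the integer-only form arithmetic circuits need.
SCALE = 1000  # fixed-point scale: 3 decimal digits of precision (made up)

def to_fixed(x):
    """Quantize a float to an integer, since circuits use integer wires."""
    return round(x * SCALE)

def linear_layer_fixed(weights_f, inputs_f, bias_f):
    """Integer-only dot product + bias, the kind of op a circuit encodes.
    Multiplying two scaled values leaves a SCALE**2 factor on the result,
    so the bias is scaled up to match before adding."""
    acc = sum(w * x for w, x in zip(weights_f, inputs_f))
    return acc + bias_f * SCALE

# Toy layer: weights, inputs, and bias are purely illustrative.
weights = [0.5, -0.25]
inputs = [2.0, 4.0]
bias = 0.1

w_f = [to_fixed(w) for w in weights]
x_f = [to_fixed(x) for x in inputs]
out_f = linear_layer_fixed(w_f, x_f, to_fixed(bias))

# Undo the scaling and compare with the float computation:
# 0.5*2.0 + (-0.25)*4.0 + 0.1 = 0.1
out_float = out_f / (SCALE * SCALE)
```

Real zkML pipelines handle this rescaling (and the rounding error it introduces) systematically across every layer of the network; the point here is just that the circuit only ever sees integers.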
So some useful applications of zkML could be credit scoring without revealing credit bureau data, decentralized ML networks, and secure off-chain model training with on-chain verification.
Practical Use Cases of zkML in the Web3 World
We have learned that zero-knowledge proofs are a potent tool, and combined with machine learning techniques they solve several challenges related to data privacy. The Web3 space, which is based on blockchain technology, runs on the principles of security and decentralization. Integrating AI/ML techniques with blockchain technology can therefore play a very significant role in the Web3 world, right?
AI and ML offer ways to enhance the user experience in the decentralized space with features like personalization, seamless automation, and much more. However, integrating ML methods while maintaining the security and privacy of data in the Web3 environment can be challenging. This is where zkML steps in as a game changer.
When we talk about the application of zkML models in the Web3 world it gives a two-way benefit:
- Protecting Model Privacy: zkML ensures that the internal parameters and workings of the ML model (think of it as a complex recipe) remain hidden, in turn protecting the model owner's intellectual property.
- Verifying Model Execution: You can be confident that the ML model has processed your data correctly without needing to trust the application itself. This builds transparency and trust.
Now let's drill down into the different Web3 sectors where ZK + ML works. Are you ready for it?
DeFi (Decentralized Finance) Sector
Within the Risk Assessment Area
DeFi platforms can use AI and machine learning models to accurately assess creditworthiness and prevent fraudulent activity. Blockchain technology offers a secure and transparent ledger that lets these ML models access historical data, and with zkML all of this can be done privately.
For instance, platforms like DeFiChain can use zkML techniques to verify the accuracy of AI models used in fraud detection and credit risk assessment without revealing any user data. This abides by the principles of decentralization and enables permissionless credit risk assessment.
DeFiChain can also use zk-SNARKs to generate cryptographic proofs that its AI models produce accurate risk assessments. This can be done without leaking the models' internal workings, such as the coefficients in a logistic regression or the decision trees in a random forest (methods used by them). This assures users that the models are functioning correctly.
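To picture what such a proof would actually attest to, here's a toy credit-scoring model in Python: a logistic regression with made-up coefficients. In a zkML setup, the platform would publish a commitment to `coefs` and `intercept` and prove that a published `score` really came from this computation, while the verifier never sees the model itself. The model and numbers below are purely illustrative, not DeFiChain's:

```python
# Toy logistic-regression risk score: the computation a zk proof would
# attest to without the verifier ever seeing the model parameters.
import math

def logistic_score(coefs, intercept, features):
    """Returns a probability-like score in (0, 1)."""
    z = intercept + sum(c * f for c, f in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (made-up) model and borrower features:
coefs = [0.8, -1.2]        # e.g. income score, debt ratio (illustrative)
intercept = -0.5
score = logistic_score(coefs, intercept, [1.0, 0.5])
```

The point is that `score` alone reveals neither the coefficients nor the intercept; the zk proof is what convinces users the score was honestly computed from the committed model.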
Within the Asset Management Area
Decentralized asset management provides an opportunity for users to invest in AI-powered strategies with blockchain ensuring the secure execution of such strategies. It also provides transparent and immutable records of transactions, which increases user trust.
Modulus Labs' RockyBot (aka the Rockefeller bot), the world's first fully on-chain AI trading bot, uses AI/ML for asset management brilliantly. It uses the zk-SNARK technique to prove that the bot's trading decisions align with the predictions made by its trained RNN/LSTM (Recurrent Neural Network / Long Short-Term Memory) model. This is done without disclosing the specific weights or connections within the neural network, thus assuring users that the bot is operating as intended.
GameFi
Use of AI Agents in Games
Blockchain allows for the development of provably fair and tamper-proof games where AI agents can be used as opponents. This assures players that the AI behaves as intended, creating a more balanced and secure gaming experience.
Now, talking about where zkML is being used in games, I'd say chess, with Modulus Labs' on-chain project built on Leela Chess Zero, where proofs verify that the AI opponent hasn't been manipulated.
Gaming Data Analysis and In-game Personalization
Well, with blockchain, player data can be stored and managed securely, which enables game developers to use AI to create personalized experiences. In fact, in-game rewards and content can be tailored to individual player preferences based on data analysis, all while maintaining user privacy. Isn't it cool?
SocialFi
Proof of Humanity and Identity Verification
Let me tell you, SocialFi platforms can make use of blockchain for secure and decentralized identity verification. Yes, that's right! This involves applying AI-powered solutions like facial or iris recognition while ensuring user privacy through techniques like ZKPs.
A good example is a well-known project like Worldcoin, which powers up the concepts of zkML. How?
Well, Worldcoin uses zk-SNARKs (as we discussed earlier) to prove that its iris recognition model has identified a unique individual, without revealing the specific features extracted from the iris or the model's internal parameters.
Personalized Content Curation and Recommendation Systems
Oh well, it shouldn't be a shocker if I say that SocialFi platforms leverage AI to personalize user feeds and recommendations. Where does decentralization help here, you wonder? Blockchain can ensure the fairness and transparency of these algorithms, helping to prevent manipulation.
What we're really here to learn is that SocialFi + zkML can be combined to verify that recommendation algorithms (like feed ranking) function as intended, without revealing the underlying model details.
It doesn't end here, though. This combination can also be used for storing and transferring sensitive information such as financial or healthcare data.
Conclusion
Bringing it all home, we can surely agree there has been a Cambrian explosion of information on zero-knowledge machine learning. We began by addressing the integrity issues with ML and understanding how it works, so as to motivate why a technology like zero-knowledge proofs even needed to intervene, right?
We have established that zkML ensures outputs/answers can be generated without revealing sensitive training data, which basically lets us end users enjoy the fruits of decentralization and data privacy. But not just that: we also get the assurance of computational integrity and a transparent environment where AI and ML applications coexist. And yes, we supported all this with some real-life Web3 applications.