Privacy-Preserving
AI Technology
Our technology unlocks the full potential of data and AI while preserving privacy, as data is never compromised.
“Federated Learning accelerates model development while protecting privacy.”
“Federated Learning: A managed process for combining models trained separately on separate data sets that can be used for sharing intelligence between devices, systems, or firms to overcome privacy, bandwidth, or computational limits.”
While Federated Learning is a nascent technology, it is highly promising and can enable companies to realize transformative strategic business benefits. “FL is expected to make significant strides forward and transform enterprise business outcomes responsibly.”
“Federated Learning: AI's new weapon to ensure privacy.”
“Federated Learning allows AI algorithms to travel and train on distributed data that is retained by contributors. This technique has been used to train machine-learning algorithms to detect cancer in images that are retained in the databases of various hospital systems without revealing sensitive patient data.”
Federated
learning
Federated Learning is a Machine Learning paradigm aimed at learning models from decentralized data (e.g. data located in users’ smartphones, hospitals, or banks), while ensuring data privacy.
This is achieved by training models locally at each node (e.g. at each hospital, at each bank or on each smartphone) and sharing only the locally updated model parameters, securely aggregating them to build a better global model (data never leaves the node and is therefore never shared).
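The workflow above can be sketched in a few lines of Python. This is a minimal, generic Federated Averaging round, not Sherpa.ai's implementation; the two node datasets, the two-parameter linear model and the learning rate are illustrative assumptions:

```python
def local_update(weights, data, lr=0.1):
    # One pass of gradient descent on a node's private data
    # (least-squares fit of y = w0*x + w1); raw data never leaves the node.
    w = list(weights)
    for x, y in data:
        err = (w[0] * x + w[1]) - y
        w[0] -= lr * err * x
        w[1] -= lr * err
    return w

def federated_average(models):
    # The orchestrator receives only parameter updates, never data,
    # and averages them into a better global model.
    return [sum(ws) / len(ws) for ws in zip(*models)]

# Two nodes holding private datasets, both drawn from y = 2x + 1.
node_a = [(0.0, 1.0), (1.0, 3.0)]
node_b = [(2.0, 5.0), (3.0, 7.0)]
global_w = [0.0, 0.0]
for _ in range(200):
    local_models = [local_update(global_w, d) for d in (node_a, node_b)]
    global_w = federated_average(local_models)
```

Each round, every node refines the global model on its own data and returns only parameters; the orchestrator averages them, so no raw record ever crosses a node boundary.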
After years of research, Sherpa.ai has developed the most advanced Federated Learning Platform for data privacy, which incorporates complementary Privacy-Enhancing Technologies (e.g. Differential Privacy, Homomorphic Encryption, Secure Multiparty Computation) and is making a huge impact on the academic world and industry.
TRADITIONAL SOLUTION

- Higher risk of breaching privacy.
- Not compliant with regulations.
- Data control is lost once it leaves the server.
- Large attack surface.
FEDERATED LEARNING SOLUTION

- Unlocks the potential of collaborative models without sharing private data.
- Data privacy by design.
- Regulatory compliance - data never leaves the server of the parties involved.
- Lower risk of data breaches. The attack surface is reduced.
- Transparency about how models are trained and how the data is used.
WHEN DOES FEDERATED LEARNING BENEFIT MODEL TRAINING?
Federated Learning is disruptive in cases where it is mandatory to guarantee data privacy as data does not need to be shared.
When data contains confidential or sensitive information like Protected Health Information, financial records or any other identifiable information.
When data can’t be used or shared for regulatory reasons. This is common in heavily regulated sectors like Financial Services or Healthcare.
However, better use of the available data could make a huge impact, improving processes and helping to solve major challenges such as rare diseases.
When different organizations want to take advantage of their data without sharing it.
For example, two competitive organizations could solve a common problem through collaborative model training, but they are not willing to share proprietary data with each other for competitive reasons. Federated Learning enables collaborative model training without sharing data.
FEDERATED LEARNING GENERATIONS

THE CHALLENGE OF HETEROGENEOUS DATA TRAINING
In Horizontal Federated Learning the data is homogeneous. This means that the different data sets share the same features but differ in sample size. Therefore, the same model can be shared between the parties, and it is trained collaboratively.
In the majority of real-world scenarios, however, this is not the case: different nodes typically hold heterogeneous data, meaning that the data sets differ in their features. This implies that the same model cannot be used and new techniques have to be developed.
Sherpa.ai allows heterogeneous data training since Vertical Federated Learning and Federated Transfer Learning are integrated in the platform.
Federated learning paradigms
HORIZONTAL FEDERATED LEARNING

Horizontal FL is the approach used when the data sets share the same feature space but differ in sample size. It is used when an organization has consistent data across many locations but cannot, for legal reasons, move or transfer it.
In Horizontal Federated Learning the same model can be used to train with the different data sets.
Use cases: Horizontal FL would be used in the diagnosis of disease, when there isn’t enough training data in one organization and different parties need to collaborate to develop a sufficiently accurate model.
VERTICAL FEDERATED LEARNING

Vertical Federated Learning is applied when two parties have overlapping users but few features in common.
Vertical Federated Learning supports heterogeneous data, where different nodes have data sets with common users but different features. With heterogeneous data, the same model cannot be used and new techniques have to be developed.
Use cases: Two different types of companies in the same area may have the same users; however, the data features held by each one differ. For example, a bank would have credit records, whereas a telco would have browsing history.
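To make the vertical setting concrete, the sketch below shows how a hypothetical bank and telco holding different features for the same users could jointly score a user by exchanging only partial scores, never raw features. All names, feature values and weights are made up, and real vertical training (which also exchanges gradients of these partial scores) is omitted:

```python
import math

# Each party holds different features for the same (aligned) users.
bank_features = {"u1": [0.4], "u2": [0.9]}    # e.g. credit record
telco_features = {"u1": [0.1], "u2": [0.7]}   # e.g. browsing intensity
w_bank, w_telco, bias = [1.5], [2.0], -1.0    # hypothetical model split

def partial_score(w, feats):
    # Computed locally by each party; only this scalar leaves the party.
    return sum(wi * xi for wi, xi in zip(w, feats))

def joint_predict(user):
    # The coordinator combines partial scores without seeing any feature.
    z = (partial_score(w_bank, bank_features[user])
         + partial_score(w_telco, telco_features[user]) + bias)
    return 1.0 / (1.0 + math.exp(-z))   # logistic score
```

Each party contributes its share of the model while its feature columns stay on its own servers.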
FEDERATED TRANSFER LEARNING

Transfer FL is used when two parties want to take advantage of each other’s data but have very few common customers or data entities. Transfer FL learns from the small common sample and then selects additional data entities that fit the common feature space, allowing the model to learn from them. Transfer FL is used in similar application areas to Vertical FL, where the common users are very limited.
Use cases: Two insurance companies could improve fraud detection, training models through federated learning, so that both companies would have a highly accurate predictive algorithm, but they would not share their business data with the other party.

TWO-LAYER PRIVACY AND SECURITY SYSTEM
- Sherpa.ai has developed a two-layer privacy and security system:
- Federated Learning - Data is never shared, only parameter updates.
- Other Privacy-Enhancing Technologies (PETs) integrated - Differential Privacy (DP) at every level (data, parameters, and aggregator), Secure Multi-party Computation, Homomorphic Encryption, and Zero-Knowledge Proofs, among others.
- The combination of these two principles creates a two-layer privacy and security system: data is never shared, and the parameter updates themselves are protected through different Privacy-Enhancing Technologies.

DATA IS NEVER SHARED
FEDERATED LEARNING
Sherpa.ai’s privacy-by-design platform ensures that data is never exposed.
Sherpa.ai’s Federated Learning enables AI model training without sharing data.
Only parameter updates are shared, and neither the orchestrator nor a single node can access data stored in another node.
PARAMETERS ARE PROTECTED
PRIVACY-ENHANCING TECHNOLOGIES (PETs)
Federated Learning alone is not enough. Therefore, Sherpa.ai has developed a platform that incorporates complementary Privacy-Enhancing Technologies (Differential Privacy, Secure Multi-party Computation and Homomorphic Encryption, among others) to ensure the robustness of the platform.
Sherpa.ai's platform has revolutionary potential for heavily regulated sectors like Healthcare or Financial Services, where privacy as well as regulatory compliance are essential. By adding complementary technologies to ensure privacy is maintained, Sherpa.ai unlocks new scenarios of development and collaboration between organizations.
PRIVACY ENHANCING TECHNOLOGIES
(PETs)
DIFFERENTIAL PRIVACY
Differential Privacy is a statistical technique to provide data aggregations, while avoiding the leakage of individual data records. This technique ensures that malicious agents intervening in the communication of local parameters cannot trace this information back to the data sources, adding an additional layer of data privacy.
DIFFERENTIAL PRIVACY ON TOP OF EVERYTHING
Differential Privacy at the data level is the most common and most limiting implementation. It does not provide a good balance between accuracy and privacy, making model training extremely complex, if not impossible.
Sherpa.ai’s Differential Privacy on top of everything approach provides a state-of-the-art empirical trade-off between accuracy and privacy. With Differential Privacy, we ensure that no data can be obtained, by masking the original information with controlled and adaptive noise while maintaining the performance of the predictive algorithm. This prevents malicious agents from obtaining, tracing or deducing client data, even with reverse-engineering techniques.

AT AGGREGATOR LEVEL
Only Sherpa.ai is able to add noise at the aggregation level without decreasing the model’s accuracy.
AT MODEL’S PARAMETERS LEVEL
Noise can also be added at the parameter level, creating a partial cancellation of the noise at the aggregation level.
Sherpa.ai’s advanced sensitivity-calculation mechanism performs a precise local analysis of the data in order to fine-tune the optimal noise level to be applied.
AT DATA LEVEL
The most common and most limiting implementation. It does not provide a good balance between accuracy and privacy, making model training extremely complex or impossible and degrading the nature of the data.
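The underlying idea can be illustrated with a textbook Laplace mechanism for a mean query: each record’s influence is bounded by clipping, and noise calibrated to the query’s sensitivity masks any individual’s contribution. This is a generic sketch, not Sherpa.ai’s adaptive-noise mechanism:

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of the Laplace distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_mean(values, lower, upper, epsilon):
    # Clip each record so one individual's influence is bounded,
    # then add noise scaled to the mean query's sensitivity.
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon)

random.seed(0)
ages = [random.randint(18, 80) for _ in range(1000)]
private_mean = dp_mean(ages, 18, 80, epsilon=1.0)
```

With many records the released mean stays accurate, yet no single individual’s value can be recovered from it.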
BLIND LEARNING (Proprietary Privacy-Enhancing Technology)
A challenge in standard Vertical Federated Learning is reducing the huge number of communications in a distributed scenario.
Blind Learning is a fundamental capability for Federated Learning on heterogeneous data.
With Blind Learning, the number of communications is reduced by over 99%, with the following benefits:
- Lower costs
- Lower risk of data breaches, which massively improves security and privacy
- Lower energy consumption and carbon footprint
HOMOMORPHIC ENCRYPTION
Homomorphic encryption is a specific class of encryption schemes that permits users to run certain operations on data while the data remains in its encrypted state. Homomorphic is a term from abstract algebra that describes the structure-preserving relationship between the plaintext and the encrypted data. Since computations on encrypted data/ciphertext decrypt to the same results as the corresponding computations on unencrypted data/plaintext, these functions may be thought of as homomorphisms.
With Sherpa.ai's Homomorphic Encryption, computation can be performed in the cloud while privacy is preserved. Using HE, you send the encrypted version of your data to the cloud, the computation is performed there, and you get back an encrypted result that you can decrypt later on. None of these steps requires the client to stay connected, which is a paramount benefit of HE. The main ideas about HE are:
- It is used in the aggregation of the parameters.
- Its main advantages are that the aggregation is performed on the encrypted parameters, and that the number of communications is very low in comparison with other defense techniques.
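The additive property used for parameter aggregation can be demonstrated with a toy Paillier cryptosystem, a classic additively homomorphic scheme. The primes below are tiny and offer no real security; this illustrates the principle, not the scheme used in the platform:

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic).
# Tiny primes for illustration only -- completely insecure.
p, q = 293, 433
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)            # valid because the generator is n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Multiplying ciphertexts adds the underlying plaintexts, so an
# aggregator can sum encrypted values without ever decrypting them.
c_sum = (encrypt(20) * encrypt(22)) % n2
```

Because multiplying ciphertexts adds the plaintexts, an aggregator can combine encrypted parameter updates without ever seeing an individual contribution.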

SECURE MULTIPARTY COMPUTATION
Secure multi-party computation (SMPC) is a subfield of cryptography with the goal of creating methods for parties to jointly compute a function over their inputs while keeping those inputs private.
Unlike traditional cryptographic tasks, where cryptography assures the security and integrity of communication or storage against an adversary outside the system of participants (an eavesdropper on the sender and receiver), the cryptography in this model protects participants' privacy from each other, making it much harder to compromise the participants.
Sherpa.ai has developed a cryptographic protocol that distributes the computation of data from different sources to ensure that no one can view others’ data, without the need to trust a third party.
By doing this, it is ensured that your business’s sensitive data is secured, without undercutting your ability to acquire all the necessary information needed from this data.
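A classic SMPC building block is additive secret sharing. The toy “secure sum” below (a generic sketch, not Sherpa.ai’s protocol, with made-up inputs) lets three parties learn the total of their private values while any incomplete set of shares reveals nothing about an individual value:

```python
import random

MOD = 2**61 - 1   # all arithmetic is done modulo a large number

def share(secret, n_parties):
    # Split a private value into additive shares; any subset of
    # fewer than n_parties shares reveals nothing about the secret.
    parts = [random.randrange(MOD) for _ in range(n_parties - 1)]
    parts.append((secret - sum(parts)) % MOD)
    return parts

private_inputs = [52000, 61000, 48000]        # one secret per party
n = len(private_inputs)
shares = [share(v, n) for v in private_inputs]
# Party j locally sums the j-th share received from every party...
partials = [sum(shares[i][j] for i in range(n)) % MOD for j in range(n)]
# ...and publishing only these partial sums reveals just the total.
secure_total = sum(partials) % MOD
```

Only the final total is revealed; no party ever sees another party’s input or even a meaningful fraction of it.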

PRIVATE ENTITY RESOLUTION
When datasets are spread across multiple organizations, the identification of the corresponding entities becomes a problem.
With the use of cutting-edge cryptographic techniques, the synchronization and identification of these datasets is possible while always protecting privacy and maintaining the performance of the trained models.
Private Set Intersection (PSI) determines the intersection of samples from all parties. It aligns them by comparing hashed/encrypted identifiers (for instance, full name, ID card number … or a combination of several identifiers). Our cutting-edge technology, based on n-gram separation, can overcome typos in the identifiers. However, PSI makes the identifiers of the intersection visible to all parties, which may be problematic in some cases.
PSI reveals intersection membership, which is prohibited in most real-world scenarios.
Private Set Union (PSU) allows each party to keep sensitive information to itself.
PSU does not reveal intersection membership.
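A simplified flavor of PSI can be shown with salted hashing of identifiers. This toy version assumes the parties agree on a shared salt, a shortcut that real PSI protocols avoid (they use oblivious PRFs so that no party alone can brute-force identifiers); the user names are made up:

```python
import hashlib

def blind(identifier, salt):
    # Parties exchange only salted hashes, never raw identifiers.
    # Lowercasing is a crude stand-in for typo-tolerant matching.
    return hashlib.sha256((salt + identifier.lower()).encode()).hexdigest()

salt = "jointly-agreed-salt"   # toy shortcut; see lead-in note
bank_users = {"Alice", "Bob", "Carol"}
telco_users = {"bob", "carol", "dave"}

bank_blinded = {blind(u, salt): u for u in bank_users}
telco_blinded = {blind(u, salt) for u in telco_users}
common = {bank_blinded[h] for h in bank_blinded if h in telco_blinded}
```

Only hashes cross organizational boundaries; each party learns which of its own records are in the intersection, and nothing about the other party’s non-matching records.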
ZERO-KNOWLEDGE PROOF
Zero-Knowledge Proof (ZKP) is a cryptographic method that allows one party to prove specific information to another party without disclosing the information itself.
ZKP is applied as a defense against privacy-inference attacks in Private Set Intersection (PSI). With PSI, two organizations can compute the intersection of their encrypted data without sharing it. No content is revealed except for the elements that are part of the intersection.
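The flavor of a zero-knowledge interaction can be conveyed with a toy Schnorr identification protocol (tiny, insecure parameters; a generic sketch, not the platform’s ZKP construction). The prover convinces the verifier that it knows the secret exponent x behind the public key y without ever revealing x:

```python
import random

# Public parameters: g generates a subgroup of prime order q in Z_p*.
p, q, g = 23, 11, 2
x = 7                        # prover's secret
y = pow(g, x, p)             # public key, known to the verifier

k = random.randrange(1, q)   # prover: fresh random nonce
t = pow(g, k, p)             # prover -> verifier: commitment
c = random.randrange(q)      # verifier -> prover: random challenge
s = (k + c * x) % q          # prover -> verifier: response

# The verifier accepts iff g^s == t * y^c (mod p); the response s
# is masked by the random nonce k and so leaks nothing about x.
valid = pow(g, s, p) == (t * pow(y, c, p)) % p
```

Repeating the challenge-response round makes cheating without knowing x overwhelmingly unlikely, while the transcript reveals nothing about the secret.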
OTHER
INTEGRATED TECHNOLOGIES
PERSONALIZATION
Sherpa.ai tackles the problem of skewed data in a customized way, adjusting to the uniqueness of each client using innovative techniques that preserve global learning while adapting the knowledge to each individual. This is achieved by dynamically modifying the device loss functions in each learning round so that the resulting model is unbiased towards any user.

SYNTHETIC DATA GENERATION
Synthetic data serves as a way of protecting data privacy. Real data often contains private and sensitive user information that cannot be freely shared. To preserve this privacy, different approaches are taken, which often result in data omission and thus an overall loss of information and utility.
Sherpa.ai’s technology makes use of advanced synthetic data generation to eliminate security loopholes such as membership inference. With this unconventional solution, standard methods can be set aside, which greatly reduces communication costs without degrading the accuracy of the predictive model. The generated data captures the underlying structure and exhibits the same statistical distribution as the original data, rendering it indistinguishable from the real data.

DEFENSES AGAINST
ADVERSARIAL ATTACKS
Technical solutions have been developed to address AI-specific vulnerabilities to prevent and control attacks trying to manipulate the training dataset, inputs designed to cause the model to make a mistake, or model flaws.
DEFENSE AGAINST PRIVACY/INFERENCE ATTACKS
Federated Learning models, if not protected, can be tricked into giving incorrect predictions and even into producing any desired result. The process of designing an input in a specific way to obtain an incorrect result is an adversarial attack. These attacks are aimed at inferring information from the training data.
DEFENSE AGAINST
DATA ATTACKS
The best way to check if a defense is satisfactory is to test it with different types of attacks. Therefore, a wide range of attacks have been designed in order to verify that the models are completely private.

DEFENSE AGAINST
MEMBERSHIP INFERENCE ATTACKS
Membership Inference attacks create leakages that impair privacy preservation. Thanks to Sherpa.ai's strength in Differential Privacy, defense models capable of protecting the identity of the data have been developed. As a result, inference attacks aiming to reveal who owns the data used to train a learning model have been eliminated, while at all times meeting organizational requirements and guaranteeing data privacy in accordance with current legislation.

DEFENSE AGAINST POISONING ATTACKS
Poisoning attacks seek to compromise the global training model. Here, malicious users inject fake training data with the aim of corrupting the learned model, affecting the model’s performance and accuracy.
DEFENSE AGAINST
BYZANTINE ATTACKS
Byzantine Attacks impair the performance of the overall model and damage it until it becomes faulty. It is therefore crucial to make federated learning models robust to these faults, in which nodes may behave arbitrarily.
Sherpa.ai’s advanced mechanisms ensure the defense of the federated model against malicious attacks aimed at reducing the model's performance. The protection is based on identifying clients with anomalous behavior and preventing them from participating in the aggregation process.
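One standard way of keeping anomalous updates out of the aggregate is robust aggregation, for example a coordinate-wise median instead of a plain average. This is a generic sketch with made-up updates, not necessarily the mechanism used by Sherpa.ai:

```python
from statistics import median

def robust_aggregate(updates):
    # Coordinate-wise median: a minority of Byzantine clients cannot
    # drag the aggregate arbitrarily far, unlike with a plain mean.
    return [median(coord) for coord in zip(*updates)]

honest = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
byzantine = [[1000.0, -1000.0]]          # one malicious update
aggregate = robust_aggregate(honest + byzantine)
```

A plain mean of these updates would be dominated by the malicious client, whereas the median stays close to the honest consensus.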

DEFENSE AGAINST
BACKDOOR ATTACKS
The objective of these attacks is to stealthily inject a secondary task into the global model. Adversarial clients thus pursue two objectives at once, and therefore their updates to the learning model differ from those of non-malicious clients.
Unprecedented algorithms capable of nullifying backdoor attacks have been developed. With this technology, the performance and security of the models are increased.

HOW DOES SHERPA.AI
COMPARE TO OTHER SOLUTIONS?
Sherpa.ai compares favorably with other competing technologies. We have put together a table to help you understand how Sherpa.ai compares to other solutions on the market.

CONTACT SHERPA.AI
Maximize the value of data and AI with Sherpa.ai’s Privacy-Enhancing solutions
Contact us
QUOTES
FROM OUR TEAM
We have reached the highest levels in the implementation of algorithms for Sherpa.ai’s privacy-preserving Artificial Intelligence platform, using the most advanced methodologies of applied mathematics.
Enrique Zuazua, Ph.D.
Senior Associate Researcher in Algorithms at Sherpa.ai
Sherpa is leading the way in how artificial intelligence solutions will be built, preserving user privacy in all its forms.
Tom Gruber
Chief AI Strategy Officer at Sherpa.ai