“Federated Learning accelerates model development while protecting privacy.”

Data Science and Machine Learning Trends You Can’t Ignore, September 2021

“Federated Learning: A managed process for combining models trained separately on separate data sets that can be used for sharing intelligence between devices, systems, or firms to overcome privacy, bandwidth, or computational limits.”

Five Key Advances Will Upgrade AI To Version 2.0 For Enterprises, February 2021

While Federated Learning is a nascent technology, it is highly promising and can enable companies to realize transformative strategic business benefits. “FL is expected to make significant strides forward and transform enterprise business outcomes responsibly.”

Ritu Jyoti, group vice president, Artificial Intelligence Research at IDC.

“Federated Learning: AI's new weapon to ensure privacy.”

A little-known AI method can train on your health data without threatening your privacy, March 2019

“Federated Learning allows AI algorithms to travel and train on distributed data that is retained by contributors. This technique has been used to train machine-learning algorithms to detect cancer in images that are retained in the databases of various hospital systems without revealing sensitive patient data.”

The New Tech Tools in Data Sharing, March 2021

Federated learning

Federated Learning is a Machine Learning paradigm aimed at learning models from decentralized data (e.g. data located in users’ smartphones, hospitals, or banks), while ensuring data privacy.

This is achieved by training models locally at each node (e.g. at each hospital, bank, or smartphone), sharing only the locally updated model parameters, and securely aggregating them to build a better global model. The data never leaves the node and is therefore never shared.
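The train-locally, share-parameters, aggregate loop described above can be sketched in a few lines. This is a minimal illustration of plain federated averaging on a toy linear model, not Sherpa.ai's implementation: the function names, the synthetic data, and the hyperparameters are all assumptions made for the example, and the secure aggregation step is omitted.

```python
import random

def local_update(weights, data, lr=0.1, epochs=20):
    """Run gradient descent for y = w*x + b on one node's private data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Average the nodes' parameters, weighted by their sample counts."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_node_data(n, seed):
    """Hypothetical private data set: noisy samples of y = 3x + 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0, 0.05)))
    return data

# Three nodes, each holding data that never leaves it.
nodes = [make_node_data(30, 1), make_node_data(50, 2), make_node_data(20, 3)]

global_weights = (0.0, 0.0)
for _ in range(30):
    # Each node trains locally, starting from the current global model.
    updates = [local_update(global_weights, data) for data in nodes]
    # Only the parameters are sent to the aggregator, never the raw data.
    global_weights = federated_average(updates, [len(d) for d in nodes])

print(global_weights)  # close to the true parameters (3.0, 1.0)
```

Only the `(w, b)` pairs cross the network in this sketch; the `(x, y)` samples stay inside each node's list.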

After years of research, Sherpa.ai has developed the most advanced Federated Learning Platform for data privacy, which incorporates complementary Privacy-Enhancing Technologies (e.g. Differential Privacy, Homomorphic Encryption, Secure Multiparty Computation) and is making a huge impact on the academic world and industry.

TRADITIONAL SOLUTION

  • Higher risk of breaching privacy.
  • Not compliant with regulations.
  • Data control is lost once it leaves the server.
  • Large attack surface.

SHERPA.AI FEDERATED LEARNING SOLUTION

  • Unlocks the potential of collaborative models without sharing private data.
  • Data privacy by design.
  • Regulatory compliance - data never leaves the server of the parties involved.
  • Lower risk of data breaches. The attack surface is reduced.
  • Transparency about how models are trained and how the data is used.

WHEN DOES FEDERATED LEARNING BENEFIT MODEL TRAINING?

Federated Learning is disruptive in cases where data privacy must be guaranteed, as data does not need to be shared.

When data contains confidential or sensitive information, such as Protected Health Information, financial records, or any other identifiable information.

When data can’t be used or shared due to regulatory compliance as it is restricted in heavily regulated sectors like Financial Services or Healthcare.

However, better use of the available data would have a huge impact on improving processes or solving major challenges such as rare diseases.

When different organizations want to take advantage of their data without sharing it.

For example, two competing organizations could solve a common problem through collaborative model training, but they are not willing to share proprietary data with each other for competitive reasons. Federated Learning enables collaborative model training without sharing data.

FEDERATED LEARNING GENERATIONS

THE CHALLENGE OF HETEROGENEOUS DATA TRAINING

In Horizontal Federated Learning the data is homogeneous. This means that the data sets in the different nodes have the same features (columns) but differ in samples. Therefore, a model can be shared between the parties and it is trained collaboratively.

In the majority of real-world scenarios, this is not the case: different nodes typically hold heterogeneous data, meaning that the data differs in features. This implies that the same model cannot be used and new techniques have to be developed.

Sherpa.ai is able to deal with heterogeneous data training since Vertical Federated Learning and Federated Transfer Learning are integrated within its platform.

Federated learning paradigms

FOR HOMOGENEOUS DATA

HORIZONTAL FEDERATED LEARNING

Horizontal Federated Learning applies in scenarios where data sets share the same feature space (same type of columns) but differ in samples (different rows).

Horizontal Federated Learning only supports homogeneous data, that is, data with the same features.
E.g. the different nodes have the same features (columns) but differ in samples, so the same model can be used for both nodes.

Use cases: Diagnosis of diseases.

FOR HETEROGENEOUS DATA

VERTICAL FEDERATED LEARNING

Vertical Federated Learning is applied when two parties hold samples for overlapping users but have few features in common.

Vertical Federated Learning supports heterogeneous data, where different nodes have data sets with common users but different features. With heterogeneous data, the same model cannot be used and new techniques have to be developed.

Use cases: Two different types of companies in the same area may have the same users; however, the data features held by each one differ. For example, a bank would have credit records whereas a telco would have browsing history.
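The difference between the horizontal and vertical settings is easiest to see by partitioning one small table both ways. The sketch below is only an illustration: the records, the feature names, and the two helper functions are hypothetical and were made up for this example.

```python
# Hypothetical records; every name and value here is made up for illustration.
records = [
    {"user": "u1", "credit_score": 710, "loan_balance": 12000, "monthly_data_gb": 8.5},
    {"user": "u2", "credit_score": 640, "loan_balance": 3000, "monthly_data_gb": 21.0},
    {"user": "u3", "credit_score": 780, "loan_balance": 0, "monthly_data_gb": 4.2},
    {"user": "u4", "credit_score": 590, "loan_balance": 8000, "monthly_data_gb": 15.7},
]

def horizontal_split(rows, cut):
    """Horizontal setting: same columns at every node, different rows (users)."""
    return rows[:cut], rows[cut:]

def vertical_split(rows, features_a, features_b, key="user"):
    """Vertical setting: same rows (users) at every node, different columns."""
    node_a = [{key: r[key], **{f: r[f] for f in features_a}} for r in rows]
    node_b = [{key: r[key], **{f: r[f] for f in features_b}} for r in rows]
    return node_a, node_b

# Homogeneous data: both nodes can train the same model architecture.
node_1, node_2 = horizontal_split(records, 2)

# Heterogeneous data: a bank and a telco know the same users through different features.
bank, telco = vertical_split(records, ["credit_score", "loan_balance"], ["monthly_data_gb"])

print(node_1[0].keys() == node_2[0].keys())  # True: identical feature space
print(set(bank[0]) & set(telco[0]))          # {'user'}: only the identifier is shared
```

In the vertical case the only column the two parties have in common is the user identifier, which is why entity alignment becomes a prerequisite for training.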

FEDERATED TRANSFER LEARNING

Federated Transfer Learning is applied when two or more parties have only a limited number of overlapping users and limited common features.

The system can learn from the common users in one data set and transfer the knowledge to apply it to new clients.

Use cases: Two insurance companies could improve fraud detection, training models through federated learning, so that both companies would have a highly accurate predictive algorithm, but they would not share their business data with the other party.

COMING SOON

PRIVACY ENHANCING TECHNOLOGIES
(PETs)

Federated Learning is not enough. Therefore Sherpa.ai has developed a platform that incorporates complementary Privacy-Enhancing Technologies (Differential Privacy, Secure Multi-party Computation or Homomorphic Encryption among others) to ensure robustness of the platform.

Sherpa.ai's platform has revolutionary potential for heavily regulated sectors like Healthcare or Financial Services, where privacy as well as regulatory compliance are essential. By adding complementary technologies to ensure privacy is maintained, Sherpa.ai unlocks new scenarios of development and collaboration between organizations.

Differential Privacy is a statistical technique to provide data aggregations, while avoiding the leakage of individual data records. This technique ensures that malicious agents intervening in the communication of local parameters cannot trace this information back to the data sources, adding an additional layer of data privacy.
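As a concrete instance of releasing an aggregate without exposing individual records, the sketch below applies the classic Laplace mechanism to a mean. It is a textbook illustration rather than Sherpa.ai's mechanism; the function names, the clipping bounds, and the value of `epsilon` are assumptions chosen for the example.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Release the mean of `values` with epsilon-differential privacy."""
    # Clipping bounds how much any single record can influence the result.
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record changes the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
ages = [rng.randint(18, 90) for _ in range(1000)]  # hypothetical private records

true_mean = sum(ages) / len(ages)
noisy_mean = private_mean(ages, lower=18, upper=90, epsilon=1.0, rng=rng)
print(true_mean, noisy_mean)
```

A smaller `epsilon` means more noise and stronger privacy; with many records the sensitivity of the mean is small, so the released value stays close to the true one.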

DIFFERENTIAL PRIVACY ON TOP OF EVERYTHING

Differential Privacy at data level is the most common and most limiting implementation. It does not provide a good balance between accuracy and privacy, making model training extremely complex, if not impossible.

Sherpa.ai’s “Differential Privacy on top of everything” approach provides an empirical state-of-the-art trade-off between accuracy and privacy. By masking the original information with controlled and adaptive noise, Differential Privacy ensures that no data can be obtained, while maintaining the performance of the predictive algorithm. This prevents malicious agents from obtaining, tracing or deducing data from the clients, even with reverse engineering techniques.

AT AGGREGATOR LEVEL

Only Sherpa.ai is able to add noise at aggregation level without decreasing the model’s accuracy.

AT MODEL’S PARAMETERS LEVEL

Noise can be added at parameter level, which creates a partial cancellation of noise at aggregation level.
Sherpa.ai’s advanced sensitivity calculation mechanism involves a precise local analysis of the data in order to fine-tune the optimal noise level to be applied.
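One way to read the partial cancellation of parameter-level noise is through the variance of an average. The derivation below is a sketch under assumptions that are not stated in the source: each of the n clients adds independent, zero-mean noise with the same variance to its parameters before sending them.

```latex
\begin{align*}
\tilde{\theta}_i &= \theta_i + \eta_i,
  \qquad \mathbb{E}[\eta_i] = 0, \quad \operatorname{Var}(\eta_i) = \sigma^2 \\
\bar{\theta} &= \frac{1}{n}\sum_{i=1}^{n}\tilde{\theta}_i
  = \frac{1}{n}\sum_{i=1}^{n}\theta_i + \frac{1}{n}\sum_{i=1}^{n}\eta_i \\
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n}\eta_i\right)
  &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(\eta_i)
  = \frac{\sigma^2}{n}
\end{align*}
```

Under these assumptions each individual update is perturbed with variance σ², while the aggregated model only carries noise of variance σ²/n, so the noise shrinks as more clients participate.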

AT DATA LEVEL

Most common and limiting implementation. It does not provide a good balance between accuracy and privacy, making model training extremely complex or impossible and damaging the nature of the data.

Homomorphic encryption is a specific class of encryption schemes that permits users to run certain operations on data while the data remains in its encrypted state. Homomorphic is a term from advanced algebra that speaks to the structure-preserving relationship between the plaintext and the encrypted data. Since decrypting the result of a computation on ciphertext gives the same output as running that computation on the plaintext, these functions may be thought of as homomorphisms.

With Sherpa.ai's Homomorphic Encryption, computation can be performed in the cloud on data while still preserving privacy. By using HE, you can send the encrypted version of your data to the cloud, perform the computation there, and get back an encrypted result that you can decrypt later on.

None of these steps requires the client to stay connected, which is a paramount benefit of HE. In summary, HE is used in the aggregation of the parameters. Its main advantages are that the aggregation is performed on the encrypted parameters and that the number of communications is low in comparison with other defense techniques.
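Aggregating encrypted parameters can be illustrated with an additively homomorphic scheme. The sketch below uses a toy version of the Paillier cryptosystem, chosen here only as a well-known example; the source does not say which scheme Sherpa.ai uses. The primes are far too small for real use, and the fixed-point `SCALE`, the helper names, and the client values are assumptions.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key pair from two primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1 (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m under public key n as (n + 1)^m * r^n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

SCALE = 10_000  # fixed-point scale for encoding real-valued parameters

def encode(x, n):
    return round(x * SCALE) % n

def decode(m, n):
    if m > n // 2:  # values in the upper half represent negatives
        m -= n
    return m / SCALE

n, private_key = keygen(104729, 1299709)

# Each client encrypts one model parameter before sending it.
client_params = [0.25, -1.5, 0.75]
ciphertexts = [encrypt(n, encode(x, n)) for x in client_params]

# The aggregator combines ciphertexts without ever decrypting them.
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = add_encrypted(n, encrypted_sum, c)

# Only the key holder can read the aggregate.
total = decode(decrypt(n, private_key, encrypted_sum), n)
print(total, total / len(client_params))
```

The aggregator in this sketch only ever handles ciphertexts, and a single message per client is enough, which matches the low communication cost mentioned above.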

Secure multi-party computation (SMPC) is a subfield of cryptography with the goal of creating methods for parties to jointly compute a function over their inputs while keeping those inputs private.

Unlike traditional cryptographic tasks, where cryptography assures security and integrity of communication or storage and the adversary is outside the system of participants (an eavesdropper on the sender and receiver), the cryptography in this model protects participants' privacy from each other, making it much harder for a corrupted participant to learn the others' inputs.

Sherpa.ai has developed a cryptographic protocol that distributes the computation of data from different sources to ensure that no one can view others’ data, without the need to trust a third party.

This ensures that your business’s sensitive data is secured, without undercutting your ability to extract the information you need from this data.
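A common building block for computing a joint function without a trusted third party is additive secret sharing. The sketch below computes a secure sum with it. This is a generic textbook construction, not the protocol Sherpa.ai has developed; the modulus, the function names, and the input values are assumptions for illustration.

```python
import secrets

PRIME = 2**61 - 1  # public modulus; all arithmetic is done modulo this prime

def share(secret, n_parties):
    """Split `secret` into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def secure_sum(private_inputs):
    """Each party learns only the total, never another party's input."""
    n = len(private_inputs)
    # Party i splits its input and sends share j to party j.
    distributed = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received; a partial sum reveals nothing alone.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)]
    # The partial sums are published and combined into the final result.
    return sum(partial_sums) % PRIME

# Hypothetical private values held by three organizations.
print(secure_sum([120, 45, 300]))  # 465
```

Any single share is uniformly random, so a party has to collude with all the others to recover someone's input.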

When datasets are spread across multiple organizations, the identification of the corresponding entities becomes a problem.

With the use of cutting-edge cryptographic techniques, these datasets can be synchronized and their entities identified while always protecting privacy and maintaining the performance of the trained models.

Private Set Intersection (PSI) determines the intersection of samples from all parties. It aligns them by comparing hashed/encrypted identifiers (for instance, full name, ID card number … or a combination of several identifiers). Our cutting-edge technology, based on n-gram separation, can overcome typos in the identifiers. However, PSI makes the identifiers in the intersection visible to all parties, which may be problematic in some cases.
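The idea of aligning records by comparing hashed identifiers can be sketched with a keyed hash. This is a deliberately simplified stand-in for a real PSI protocol: anyone holding the key can test guesses against the hashes, and the n-gram typo handling mentioned above is not reproduced. The key, the names, and the helper functions are hypothetical.

```python
import hashlib
import hmac

def blind(identifier, shared_key):
    """Normalize an identifier and replace it with a keyed hash."""
    normalized = " ".join(identifier.lower().split())
    return hmac.new(shared_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def intersection_seen_by_a(ids_a, blinded_from_b, shared_key):
    """Party A matches its own blinded identifiers against those received from B."""
    blinded_a = {blind(i, shared_key): i for i in ids_a}
    return sorted(blinded_a[h] for h in blinded_a.keys() & set(blinded_from_b))

shared_key = b"hypothetical-key-agreed-out-of-band"
party_a = ["Ana Lopez", "John Smith", "Mei Chen"]
party_b = ["john  smith", "MEI CHEN", "Omar Haddad"]

# Party B only ever sends hashes, never the identifiers themselves.
blinded_b = [blind(i, shared_key) for i in party_b]

common = intersection_seen_by_a(party_a, blinded_b, shared_key)
print(common)  # ['John Smith', 'Mei Chen']
```

Party A learns which of its own users also appear at party B, which is exactly the intersection-membership leak described above, and learns nothing readable about B's other users.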

PSI reveals intersection membership, which is prohibited in most real-world scenarios. Private Set Union (PSU) allows each party to keep sensitive information to itself, because PSU does not reveal the intersection membership.

Zero-knowledge Proof (ZKP) is a cryptographic method that allows one party to prove specific information to another party without disclosing the information itself.
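A classic small example of proving knowledge without disclosing it is the Schnorr identification protocol, in which a prover shows that it knows a secret exponent without revealing that exponent. The sketch below is an illustration with a toy group and is not taken from Sherpa.ai's platform; the parameters `P`, `Q`, `G` and the function names are assumptions, and real deployments use much larger groups.

```python
import secrets

# Toy group: G generates a subgroup of prime order Q modulo the prime P.
P, Q, G = 23, 11, 2

def commit():
    """Prover picks a random nonce r and sends the commitment G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def respond(r, challenge, secret_x):
    """Prover answers the challenge; the nonce r masks the secret."""
    return (r + challenge * secret_x) % Q

def verify(public_y, commitment, challenge, response):
    """Verifier checks G^s == t * y^c mod P without ever seeing the secret."""
    return pow(G, response, P) == (commitment * pow(public_y, challenge, P)) % P

secret_x = 7                     # known only to the prover
public_y = pow(G, secret_x, P)   # published by the prover

r, t = commit()
c = secrets.randbelow(Q)         # verifier's random challenge
s = respond(r, c, secret_x)

print(verify(public_y, t, c, s))            # True: the honest proof is accepted
print(verify(public_y, t, c, (s + 1) % Q))  # False: a tampered response is rejected
```

The verifier sees only the commitment, the challenge, and the response, none of which reveals `secret_x` on its own.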

ZKP is applied to defend against privacy inference attacks in Private Set Intersection (PSI). With PSI, two organizations can compute the intersection of their encrypted data without sharing it. No content is revealed except for the elements that are part of the intersection.

OTHER INTEGRATED TECHNOLOGIES

Sherpa.ai tackles the problem of skewed data in a customized way and perfectly adjusts to the uniqueness of each client using innovative techniques that preserve global learning and adapt the knowledge to each individual. This is achieved by dynamically modifying the device loss functions in each learning round so that the resulting model is unbiased towards any user.

Synthetic data serves as a way of protecting data privacy. Real data often contains private and sensitive user information that cannot be freely shared. The approaches taken to preserve this privacy often result in data omission, which leads to an overall loss of information and utility.

Sherpa.ai’s technology makes use of advanced synthetic data generation to eliminate security loopholes such as membership inference. This unconventional solution makes it possible to move away from standard methods, which greatly reduces communication costs without degrading the accuracy of the predictive model. The synthetic data captures the underlying structure and follows the same statistical distribution as the original data, rendering it indistinguishable from the real data.

DEFENSES AGAINST
ADVERSARIAL ATTACKS

Technical solutions have been developed to address AI-specific vulnerabilities, preventing and controlling attacks that try to manipulate the training dataset, that use inputs designed to cause the model to make a mistake, or that exploit model flaws.

Federated Learning models, if not protected, can be tricked into giving incorrect predictions or into producing any desired result. The process of designing an input in a specific way to obtain an incorrect result is an adversarial attack. Other attacks aim to infer information from the training data.

The best way to check if a defense is satisfactory is to test it with different types of attacks. Therefore, a wide range of attacks have been designed in order to verify that the models are completely private.

Membership Inference attacks create leakages that impair privacy preservation. Thanks to Sherpa.ai's capabilities in Differential Privacy, defense models capable of protecting the identity of the data have been developed. As a result, inference attacks that aim to reveal who owns the data used to train a learning model are eliminated, while organizational requirements are met at all times and data privacy is guaranteed in accordance with current legislation.

Poisoning attacks seek to compromise the global training model. Here, malicious users inject fake training data with the aim of corrupting the learned model, affecting the model’s performance and accuracy.

Byzantine Attacks impair the performance of the overall model and damage it until it becomes faulty. Therefore, it is crucial to make federated learning models robust to these faults where data behaves capriciously.

With Sherpa.ai’s advanced mechanisms, the federated model is defended against malicious attacks aimed at reducing the model's performance. The protection is based on identifying clients with anomalous performance in order to prevent them from participating in the aggregation process.
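Excluding anomalous clients before aggregation can be sketched with a simple distance-to-median rule. This is one generic robust-aggregation heuristic, not Sherpa.ai's mechanism; the function name, the `tolerance` factor, and the example updates are assumptions for illustration.

```python
import math
from statistics import median

def filter_and_aggregate(updates, tolerance=3.0):
    """Average only the client updates that lie close to the coordinate-wise median."""
    dim = len(updates[0])
    # The coordinate-wise median is a reference point that a few outliers cannot move far.
    center = [median(u[k] for u in updates) for k in range(dim)]
    distances = [math.dist(u, center) for u in updates]
    typical = median(distances)
    # Keep clients whose distance is within `tolerance` times the typical distance.
    accepted = [i for i, d in enumerate(distances) if d <= tolerance * typical + 1e-12]
    aggregate = [sum(updates[i][k] for i in accepted) / len(accepted) for k in range(dim)]
    return aggregate, accepted

# Four honest clients and one client sending a wildly different update.
updates = [
    [1.0, 2.0],
    [1.1, 1.9],
    [0.9, 2.1],
    [1.0, 2.0],
    [50.0, -40.0],
]

aggregate, accepted = filter_and_aggregate(updates)
print(accepted)   # [0, 1, 2, 3]: the last client is excluded
print(aggregate)  # approximately [1.0, 2.0]
```

A rule like this assumes that honest clients form the majority and send broadly similar updates; strongly skewed but honest clients could be rejected as well, so the threshold needs care.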

The objective of backdoor attacks is to stealthily inject a secondary task into the global model. Adversarial clients therefore pursue two objectives at once, so their updates to the learning model differ from the updates of non-malicious clients.

Unprecedented algorithms capable of nullifying backdoor attacks have been established. With this technology, the performance and security of the models are increased.

QUOTES
FROM OUR TEAM

We have reached the highest levels in the implementation of algorithms for the Artificial Intelligence platform with data privacy of Sherpa.ai, with the most advanced methodologies of applied mathematics

Enrique Zuazua, Ph.D.

Senior Associate Researcher in Algorithms of Sherpa.ai

  • Chair Professor at FAU (Germany)
  • Alexander von Humboldt Award.
  • Considered the world's best in applied mathematics

Sherpa is leading the way in how artificial intelligence solutions will be built, preserving user privacy in all its forms

Tom Gruber

Senior Advisor in AI of Sherpa.ai

  • Co-founder and CTO of Siri
  • Head of Siri Advanced Development Group at Apple

HOW DOES SHERPA.AI
COMPARE TO OTHER SOLUTIONS?

Sherpa.ai compares favourably with other competing technologies. We have put together a table to help you understand where Sherpa.ai is compared to other solutions in the market.

CONTACT SHERPA.AI

Be a first mover in AI privacy-enhancing tech.

Contact us