“Federated Learning accelerates model development while protecting privacy.”

Data Science and Machine Learning Trends You Can’t Ignore, September 2021​

“Federated Learning: A managed process for combining models trained separately on separate data sets, which can be used to share intelligence between devices, systems, or firms to overcome privacy, bandwidth, or computational limits.”

Five Key Advances Will Upgrade AI To Version 2.0 For Enterprises, February 2021​

“While Federated Learning is a nascent technology, it is highly promising and can enable companies to realize transformative strategic business benefits. FL is expected to make significant strides forward and transform enterprise business outcomes responsibly.”

Ritu Jyoti, group vice president, Artificial Intelligence Research at IDC.​

“Federated Learning: AI's new weapon to ensure privacy.”

A little-known AI method can train on your health data without threatening your privacy, March 2019​

“Federated Learning allows AI algorithms to travel and train on distributed data that is retained by contributors. This technique has been used to train machine-learning algorithms to detect cancer in images that are retained in the databases of various hospital systems, without revealing sensitive patient data.”

The New Tech Tools in Data Sharing, March 2021​

Federated Learning

Federated Learning is a Machine Learning paradigm aimed at learning models from decentralized data (e.g. data located in users’ smartphones, hospitals, or banks), while ensuring data privacy.

This is achieved by training models locally at each node (e.g. at each hospital, bank, or smartphone) and sharing only the locally updated model parameters, which are securely aggregated to build a better global model (the data never leaves the node and is therefore never shared).
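As a concrete sketch of this loop, the following minimal single-machine simulation of Federated Averaging trains a linear model at two hypothetical nodes and aggregates only the parameters. The data sets, learning rate, and round counts are illustrative assumptions, not Sherpa.ai's implementation:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One node's local training: a few gradient-descent steps on a
    linear regression model. The raw data never leaves this function."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(global_w, clients, rounds=10):
    """Each round, every node trains locally; only the updated
    parameters travel back and are averaged, weighted by sample count."""
    for _ in range(rounds):
        sizes = [len(y) for _, y in clients]
        local_ws = [local_update(global_w, X, y) for X, y in clients]
        global_w = np.average(local_ws, axis=0, weights=sizes)
    return global_w

# Two hypothetical nodes (e.g. two hospitals) holding private data
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.01, size=n)
    clients.append((X, y))

w = federated_averaging(np.zeros(2), clients)  # converges close to true_w
```

Each node's raw data is only ever touched inside `local_update`; the coordinator sees parameter vectors alone.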

After years of research, Sherpa.ai has developed the most advanced Federated Learning Platform for data privacy, which incorporates complementary Privacy-Enhancing Technologies (e.g. Differential Privacy, Homomorphic Encryption, Secure Multiparty Computation) and is making a huge impact on the academic world and industry.

TRADITIONAL SOLUTION

  • Higher risk of breaching privacy.
  • Not compliant with regulations.
  • Data control is lost once it leaves the server.
  • Large attack surface.

SHERPA.AI FEDERATED LEARNING SOLUTION

  • Unlocks the potential of collaborative models without sharing private data.
  • Data privacy by design.
  • Regulatory compliance - data never leaves the servers of the parties involved.
  • Lower risk of data breaches. The attack surface is reduced.
  • Transparency about how models are trained and how the data is used.

WHEN DOES FEDERATED LEARNING BENEFIT MODEL TRAINING?

Federated Learning is disruptive in cases where it is mandatory to guarantee data privacy as data does not need to be shared.

When data contains confidential or sensitive information like Protected Health Information, financial records or any other identifiable information.

When data can’t be used or shared for regulatory reasons. This is common in heavily regulated sectors like Financial Services or Healthcare.

However, better use of the available data could have a huge impact, improving processes and helping solve major challenges such as rare diseases.

When different organizations want to take advantage of their data without sharing it.

For example, two competing organizations could solve a common problem through collaborative model training, but may not be willing to share proprietary data with each other for competitive reasons. Federated Learning enables collaborative model training without sharing data.

FEDERATED LEARNING GENERATIONS

schema of federated learning generations

THE CHALLENGE OF HETEROGENEOUS DATA TRAINING

In Horizontal Federated Learning the data is homogeneous. This means that the different data sets share the same features but differ in sample size. Therefore, the same model can be shared between the parties, and it is trained collaboratively.

In the majority of real-world scenarios, this is not the case: different nodes typically hold heterogeneous data, meaning that the data differs in features. This implies that the same model cannot be used and new techniques have to be developed.

Sherpa.ai supports heterogeneous data training, as Vertical Federated Learning and Federated Transfer Learning are integrated into the platform.

Federated learning paradigms

FOR HOMOGENEOUS DATA
FOR HETEROGENEOUS DATA

HORIZONTAL FEDERATED LEARNING​

schema of horizontal federated learning

Horizontal FL is the approach used when the data sets share the same feature space but differ in sample size. It is used when an organization has consistent data across many locations but cannot, for legal reasons, move or transfer it.

In Horizontal Federated Learning the same model can be used to train with the different data sets.

Use cases: Horizontal FL would be used in the diagnosis of disease, when there isn’t enough training data in one organization and different parties are required to collaborate to develop a sufficiently accurate model.

VERTICAL FEDERATED LEARNING​

schema of vertical federated learning

Vertical FL allows two parties to take advantage of each other's data without sharing it. In this case, both parties leverage the customers, users, or data entities they have in common.

Use cases: A use case could involve a telco and a bank, where an algorithm is trained using data from both organizations, without sharing it, to improve business processes such as fraud detection or default prediction, among others.
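To make the idea concrete, here is a minimal, unencrypted sketch of vertically partitioned training: two hypothetical parties hold different features of the same customers, and only partial predictions and an error signal are exchanged, never raw features. In a real deployment these exchanges would additionally be protected with cryptographic techniques such as Homomorphic Encryption; all names and dimensions here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# The same customers, different features at each party (e.g. bank/telco)
XA = rng.normal(size=(n, 2))   # party A's private features
XB = rng.normal(size=(n, 3))   # party B's private features; B holds labels
y = XA @ [1.0, -2.0] + XB @ [0.5, 0.0, 1.5] + rng.normal(scale=0.01, size=n)

wA, wB = np.zeros(2), np.zeros(3)
lr = 0.1
for _ in range(300):
    pA = XA @ wA                # partial prediction, sent to B
    pB = XB @ wB                # (in practice these exchanges are protected)
    err = (pA + pB) - y         # only B sees the labels; err goes back to A
    wA -= lr * XA.T @ err / n   # each party updates only its own weights
    wB -= lr * XB.T @ err / n
```

Neither party ever sees the other's feature matrix; the joint model lives as two weight fragments, one per party.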

FEDERATED TRANSFER LEARNING​

schema of federated transfer learning

Transfer FL is used when two parties want to take advantage of each other’s data but have very few common customers or data entities. Transfer FL learns from the small common sample and then selects additional data entities that fit the common feature space, allowing the model to learn from them. Transfer FL is used in application areas similar to Vertical FL, where common users are very limited.

Use cases: Two insurance companies could improve fraud detection, training models through federated learning, so that both companies would have a highly accurate predictive algorithm, but they would not share their business data with the other party.

schema of federated coming soon
COMING SOON

TWO-LAYER PRIVACY AND SECURITY SYSTEM

  • Sherpa.ai has developed a two-layer privacy and security system:
    • Data is never shared – Sherpa.ai privacy-by-design platform ensures that data is never exposed. Only parameter updates are shared, and neither the orchestrator nor a single node can access data stored in another node.
    • Other Privacy-Enhancing Technologies (PETs) integrated – In heavily regulated sectors like Financial Services or Healthcare, FL alone is not always enough to meet privacy and security compliance requirements. To meet those requirements, Sherpa.ai’s platform integrates other PETs. Sherpa.ai applies Differential Privacy (DP) at all levels (data, parameter, and aggregator), preserving the model’s accuracy while protecting privacy and security. Other technologies such as Secure Multi-party Computation, Homomorphic Encryption, and Zero-Knowledge Proofs are also integrated to defend against poisoning, data, adversarial, or inference attacks.
  • The combination of these two principles creates a two-layer privacy and security system: data is never shared, and parameter updates are protected through different Privacy-Enhancing Technologies.

    DATA IS NEVER SHARED
    FEDERATED LEARNING

    Federated Learning alone is not enough. Therefore, Sherpa.ai has developed a platform that incorporates complementary Privacy-Enhancing Technologies (Differential Privacy, Secure Multi-party Computation, and Homomorphic Encryption, among others) to ensure the robustness of the platform.

    Sherpa.ai's platform has revolutionary potential for heavily regulated sectors like Healthcare or Financial Services, where privacy as well as regulatory compliance are essential. By adding complementary technologies to ensure privacy is maintained, Sherpa.ai unlocks new scenarios of development and collaboration between organizations.

    PARAMETERS ARE PROTECTED
    PRIVACY-ENHANCING TECHNOLOGIES (PETs)


    Differential Privacy is a statistical technique for sharing data aggregations while avoiding the leakage of individual data records. It ensures that malicious agents intercepting the communication of local parameters cannot trace this information back to the data sources, adding an additional layer of data privacy.

    DIFFERENTIAL PRIVACY ON TOP OF EVERYTHING

    Differential Privacy at the data level is the most common and most limiting implementation. It does not provide a good balance between accuracy and privacy, making model training extremely complex, if not impossible.

    Sherpa.ai’s “Differential Privacy on top of everything” approach provides a state-of-the-art empirical trade-off between accuracy and privacy. Differential Privacy masks the original information with controlled, adaptive noise, ensuring that no data can be recovered while maintaining the performance of the predictive algorithm. This prevents malicious agents from obtaining, tracing, or deducing client data, even with reverse-engineering techniques.

    AT AGGREGATOR LEVEL

    Only Sherpa.ai is able to add noise at the aggregation level without decreasing the model’s accuracy.

    AT MODEL’S PARAMETERS LEVEL

    Noise can be added at the parameter level, creating a partial cancellation of the noise at the aggregation level.
    Sherpa.ai’s advanced sensitivity-calculation mechanism performs a precise local analysis of the data in order to fine-tune the optimal noise level to apply.

    AT DATA LEVEL

    The most common and most limiting implementation. It does not provide a good balance between accuracy and privacy, making model training extremely complex or impossible and distorting the nature of the data.
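As an illustration of parameter-level Differential Privacy, this sketch clips each client's update to bound its sensitivity and then applies the Gaussian mechanism. The clipping norm and noise multiplier are illustrative values, not Sherpa.ai's calibrated settings:

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Gaussian-mechanism sketch: clip a client's parameter update to a
    fixed L2 norm (bounding its sensitivity), then add calibrated noise
    so that no individual contribution can be reverse-engineered."""
    rng = rng or np.random.default_rng()
    clipped = update * min(1.0, clip_norm / np.linalg.norm(update))
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# The aggregator averages sanitized updates; the noise partially
# cancels in the mean while each individual update stays private
updates = [np.array([0.5, -0.3]), np.array([0.7, -0.1]), np.array([0.4, -0.4])]
avg = np.mean([dp_sanitize(u) for u in updates], axis=0)
```

Clipping first is what makes the noise scale meaningful: it fixes an upper bound on how much any single client can move the average.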

    A challenge in standard Vertical Federated Learning is reducing the huge number of communications required in a distributed scenario.

    Blind Learning is a fundamental functionality of Federated Learning for heterogeneous data.

    With Blind Learning, the number of communications is reduced by over 99%, with the following benefits:

    • Lower costs
    • Lower risk of data breaches, which massively improves security and privacy
    • Lower energy consumption and carbon footprint

      Homomorphic encryption is a class of encryption schemes that permits users to run certain operations on data while the data remains encrypted. “Homomorphic” is a term from abstract algebra that refers to the structure-preserving relationship between the plaintext and the encrypted data: since computations on ciphertext produce, after decryption, the same results as the corresponding computations on plaintext, these functions may be thought of as homomorphisms.

      With Sherpa.ai's Homomorphic Encryption, one can perform computations in the cloud while still preserving privacy: you send the encrypted version of your data to the cloud, the computation is performed there, and you get back an encrypted result that you can decrypt later.

      None of these steps require the client to stay connected, which is a paramount benefit of HE. The main ideas about HE are:

      • It is used in the aggregation of the parameters.
      • The main advantages are that the aggregation is performed on the encrypted parameters and that the number of communications is much lower in comparison with other defense techniques.
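The additive property that makes encrypted aggregation possible can be demonstrated with a toy Paillier cryptosystem. The small primes are chosen only for readability; production systems use 2048-bit moduli and vetted cryptographic libraries, and this sketch is not Sherpa.ai's implementation:

```python
import math
import random

# Toy Paillier keypair. Small primes keep the demo readable; real
# deployments use 2048-bit moduli and a vetted cryptographic library.
p, q = 104723, 104729
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)            # valid because the generator g is n + 1

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts,
# so an aggregator can sum parameters it cannot read
c = (encrypt(5) * encrypt(7)) % n2
assert decrypt(c) == 12
```

The aggregator only ever multiplies ciphertexts; the decryption key stays with the data owners.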


      Secure multi-party computation (SMPC) is a subfield of cryptography with the goal of creating methods for parties to jointly compute a function over their inputs while keeping those inputs private.

      Unlike traditional cryptographic tasks, where cryptography assures the security and integrity of communication or storage and the adversary is outside the system of participants (an eavesdropper on the sender and receiver), the cryptography in this model protects participants' privacy from each other, making it much harder to corrupt the participants.

      Sherpa.ai has developed a cryptographic protocol that distributes the computation of data from different sources to ensure that no one can view others’ data, without the need to trust a third party.

      By doing this, it is ensured that your business’s sensitive data is secured, without undercutting your ability to acquire all the necessary information needed from this data.
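One classic building block for this kind of joint computation is secure aggregation with pairwise additive masks: parties reveal only masked values, the masks cancel in the sum, and the aggregator learns nothing but the total. A minimal sketch, in which a shared seed stands in for a pairwise key agreement (this is a generic textbook construction, not Sherpa.ai's protocol):

```python
import random

def pairwise_masks(num_parties, modulus, seed=0):
    """For every pair (i, j) with i < j, draw a shared random mask:
    party i adds it, party j subtracts it, so all masks cancel when
    the masked values are summed."""
    rng = random.Random(seed)   # stands in for a pairwise key agreement
    masks = [0] * num_parties
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            m = rng.randrange(modulus)
            masks[i] = (masks[i] + m) % modulus
            masks[j] = (masks[j] - m) % modulus
    return masks

MOD = 2**32
values = [12, 7, 30]            # the three parties' private inputs
masks = pairwise_masks(3, MOD)
masked = [(v + m) % MOD for v, m in zip(values, masks)]  # what is revealed
total = sum(masked) % MOD       # the aggregator learns only the sum
assert total == sum(values)
```

Each individual masked value is statistically uniform, so no single party's input can be read off the wire.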

      When datasets are spread across multiple organizations, the identification of the corresponding entities becomes a problem.

      With the use of cutting edge cryptographic techniques, the synchronization and identification of these datasets is possible while always protecting privacy and maintaining the performance of the trained models.

      Private Set Intersection (PSI) determines the intersection of samples from all parties. It aligns them by comparing hashed/encrypted identifiers (for instance, full name, ID card number, or a combination of several identifiers). Our cutting-edge technology, based on n-gram separation, can overcome typos in the identifiers. However, PSI makes the identifiers in the intersection visible to all parties, which may be problematic in some cases.
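A common way to realize PSI is a Diffie-Hellman-style double blinding of hashed identifiers: an element matches if and only if both parties' secret exponents have been applied to the same hash. The following toy sketch simulates both sides locally and uses a multiplicative group modulo a Mersenne prime; the parameters and e-mail addresses are illustrative only, and real protocols use elliptic-curve groups and vetted implementations:

```python
import hashlib
import random

P = 2**127 - 1   # a Mersenne prime serving as the group modulus

def h(element):
    """Hash an identifier into the multiplicative group mod P."""
    digest = hashlib.sha256(element.encode()).digest()
    return int.from_bytes(digest, "big") % P or 1

a = random.randrange(2, P - 1)  # Alice's secret exponent
b = random.randrange(2, P - 1)  # Bob's secret exponent

alice = {"ana@example.com", "bo@example.com", "cy@example.com"}
bob = {"bo@example.com", "cy@example.com", "di@example.com"}

# Each side blinds its hashed identifiers with its own exponent, then
# applies its exponent to the other side's blinded values:
# H(x)^(a*b) matches if and only if the underlying identifiers match.
alice_double = {pow(pow(h(x), a, P), b, P) for x in alice}
bob_double = {pow(pow(h(x), b, P), a, P): x for x in bob}
common = {x for key, x in bob_double.items() if key in alice_double}
assert common == {"bo@example.com", "cy@example.com"}
```

Neither side ever sees the other's raw identifiers, only blinded group elements.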

      PSI reveals intersection membership, which is prohibited in most real-world scenarios.
      Private Set Union (PSU) allows each party to keep sensitive information to itself:
      PSU does not reveal intersection membership.

      Zero-Knowledge Proof (ZKP) is a cryptographic method that allows one party to prove specific information to another party without disclosing the information itself.

      ZKP is applied to defend against privacy inference attacks in Private Set Intersection (PSI). With PSI, two organizations can compute the intersection of their encrypted data without sharing it. No content is revealed except for the elements that are part of the intersection.
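As a minimal illustration of the ZKP idea, here is a toy non-interactive Schnorr proof: the prover demonstrates knowledge of a secret exponent x with y = g^x mod p without revealing x. The tiny group parameters are for demonstration only; real systems use elliptic curves and audited libraries:

```python
import hashlib
import random

# Toy Schnorr proof of knowledge of a discrete logarithm.
# Demo group: p = 2q + 1 with q prime, and g of prime order q.
p, q, g = 2579, 1289, 4

def challenge(y, t):
    """Fiat-Shamir: derive the challenge by hashing the transcript."""
    data = f"{g}:{y}:{t}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove(x):
    r = random.randrange(q)
    t = pow(g, r, p)               # commitment
    c = challenge(pow(g, x, p), t)
    return t, (r + c * x) % q      # commitment and response

def verify(y, t, s):
    return pow(g, s, p) == (t * pow(y, challenge(y, t), p)) % p

x = 777                            # the prover's secret
y = pow(g, x, p)                   # the public value
assert verify(y, *prove(x))        # the verifier accepts without learning x
```

The verifier checks one modular equation; nothing about x beyond its existence ever crosses the wire.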

      OTHER INTEGRATED TECHNOLOGIES

      Sherpa.ai tackles the problem of skewed data in a customized way, adjusting to the uniqueness of each client with innovative techniques that preserve global learning while adapting the knowledge to each individual. This is achieved by dynamically modifying the device loss functions in each learning round so that the resulting model is unbiased towards any user.

      two silhouettes of men faced; the man on the left has thumbs up and the man of the right has thumbs down

      Synthetic data serves as a way of protecting data privacy. Real data often contain private and sensitive user information that cannot be freely shared. To preserve this privacy, different approaches are taken that often result in data omission, leading to an overall loss of information and utility.

      Sherpa.ai’s technology makes use of advanced synthetic data generation to eliminate security loopholes such as membership inference. With this unconventional solution, standard methods can be set aside, which greatly reduces communication costs without degrading the accuracy of the predictive model. The generated data captures the underlying structure and exhibits the same statistical distribution as the original data, rendering it indistinguishable from the real data.
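The "same distribution, different records" idea can be illustrated with the simplest possible generator: fit the mean and covariance of the private data and sample fresh records from them. Real synthetic-data systems use far richer generative models, and the data here is simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a private data set (two correlated numeric features)
real = rng.multivariate_normal([10.0, 3.0], [[2.0, 0.8], [0.8, 1.0]], size=5000)

# Fit first- and second-order statistics, then sample fresh records
mu, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mu, cov, size=5000)

# The synthetic set mirrors the real distribution without containing
# any actual record
assert np.allclose(synthetic.mean(axis=0), mu, atol=0.1)
```

A downstream model can be fit on `synthetic` while the real records never leave their owner.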

      schema of synthetic data generation;  on the left side, a representation of the data of two different parties and its synthetic data created and on the right side, a three dimensional representation of the variables client, features and samples

      DEFENSES AGAINST
      ADVERSARIAL ATTACKS

      Technical solutions have been developed to address AI-specific vulnerabilities to prevent and control attacks trying to manipulate the training dataset, inputs designed to cause the model to make a mistake, or model flaws.

      Federated Learning models, if not protected, can be tricked into giving incorrect predictions or producing any result an attacker desires. The process of designing an input in a specific way to obtain an incorrect result is an adversarial attack. Such attacks can also aim at inferring information from the training data.


      The best way to check if a defense is satisfactory is to test it with different types of attacks. Therefore, a wide range of attacks have been designed in order to verify that the models are completely private.

      schema of defense against data attacks

      Membership inference attacks create leakages that impair privacy preservation. Thanks to Sherpa.ai's strength in Differential Privacy, defense models capable of protecting the identity of the data have been developed. As a result, inference attacks aiming to reveal who owns the data used to train a learning model have been eliminated.

      All of this is achieved while meeting organizational requirements and guaranteeing data privacy, in accordance with current legislation.

      schema of defense against membership inference attacks

      Poisoning attacks seek to compromise the global training model: malicious users inject fake training data with the aim of corrupting the learned model, affecting the model’s performance and accuracy.

      Byzantine attacks impair the performance of the overall model and damage it until it becomes faulty. Therefore, it is crucial to make Federated Learning models robust to these faults, in which clients may behave arbitrarily.

      Sherpa.ai’s advanced mechanisms ensure the defense of the federated model against malicious attacks aimed at reducing the model's performance. The protection is based on identifying clients with anomalous behavior in order to prevent them from participating in the aggregation process.
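A simple instance of this idea is distance-based filtering before averaging: rank client updates by their distance to the coordinate-wise median and aggregate only the closest ones. The threshold and data are illustrative, and this is a generic robust-aggregation sketch, not Sherpa.ai's mechanism:

```python
import numpy as np

def robust_aggregate(updates, keep_fraction=0.6):
    """Rank client updates by distance to the coordinate-wise median
    and average only the closest ones, so outlier (potentially
    poisoned) updates are excluded from aggregation."""
    updates = np.asarray(updates)
    median = np.median(updates, axis=0)
    dists = np.linalg.norm(updates - median, axis=1)
    k = max(1, int(len(updates) * keep_fraction))
    keep = np.argsort(dists)[:k]
    return updates[keep].mean(axis=0)

honest = [np.array([1.0, 1.1]), np.array([0.9, 1.0]), np.array([1.1, 0.9])]
poisoned = [np.array([50.0, -50.0])]       # a malicious client's update
agg = robust_aggregate(honest + poisoned)  # stays near the honest updates
```

The median is itself robust to a minority of extreme updates, which is what makes it a safe reference point for the filtering step.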

      schema of defense against byzantine attacks

      The objective of these attacks is to stealthily inject a secondary task into the global model. Adversarial clients therefore pursue two objectives at once, so their updates to the learning model differ from the updates of non-malicious clients.

      Unprecedented algorithms capable of nullifying backdoor attacks have been developed. With this technology, the performance and security of the models are increased.

      schema of defense against backdoor attacks

      QUOTES
      FROM OUR TEAM

      “We have reached the highest level in the implementation of algorithms for Sherpa.ai’s privacy-preserving Artificial Intelligence platform, using the most advanced methodologies of applied mathematics.”

      profile picture of enrique zuazua

      Enrique Zuazua, Ph.D.

      Senior Associate Researcher in Algorithms at Sherpa.ai

      • Chair Professor at FAU (Germany)
      • Alexander von Humboldt Award.
      • Considered one of the world's leading researchers in applied mathematics

      “Sherpa is leading the way in how artificial intelligence solutions will be built, preserving user privacy in all its forms.”

      profile picture of tom gruber

      Tom Gruber

      Chief AI Strategy Officer at Sherpa.ai

      • Co-founder and CTO of Siri
      • Head of Siri Advanced Development Group at Apple

      HOW DOES SHERPA.AI
      COMPARE TO OTHER SOLUTIONS?

      Sherpa.ai compares favourably with competing technologies. We have put together a table to help you understand how Sherpa.ai compares to other solutions in the market.


      CONTACT SHERPA.AI

      Maximize the value of data and AI with Sherpa.ai’s Privacy-Enhancing solutions

      Contact us