Practical Secure Aggregation for Federated Learning on User-Held Data

This paper is relatively intuitive to understand: there are no extremely long equations or proofs that took me a while to get. I found it while looking through the Google research page. It proposes a secure aggregation protocol for federated learning that operates on high-dimensional vectors, is communication-efficient, is robust to users dropping out, and provides the strongest possible security under those constraints.

The Main Ideas

If someone is trying to create a model that will predict the next word a user will text, the first idea would be to take in all of the users' text messages and train a neural network to provide the predictions. The catch is that text messages often contain sensitive information. The idea proposed by the researchers is to take a federated learning approach to the problem, “wherein each user maintains a private database of her text messages securely on her own mobile device, and a shared global model is trained under the coordination of a central server based upon highly processed, minimally scoped, ephemeral updates from users.”
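To make that setup concrete, here is a toy sketch of one federated round, written by me rather than taken from the paper: each user computes an update on data that never leaves their device, and the server only sees and averages the updates. The linear-regression "model", the function names, and all parameters are my own illustrative choices.

```python
import numpy as np

def local_update(global_weights, user_data, lr=0.1):
    """One step of local training on a user's private data.
    The 'model' here is just linear regression; a real deployment
    would run several epochs of SGD on-device."""
    X, y = user_data
    grad = X.T @ (X @ global_weights - y) / len(y)
    return global_weights - lr * grad

def federated_round(global_weights, all_user_data):
    """Server coordinates one round: each user trains locally, and
    only the resulting updates (never the raw data) are averaged."""
    updates = [local_update(global_weights, d) for d in all_user_data]
    return np.mean(updates, axis=0)

# Simulate 5 users whose private data comes from the same true model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
users = []
for _ in range(5):
    X = rng.normal(size=(20, 2))
    users.append((X, X @ true_w + 0.01 * rng.normal(size=20)))

w = np.zeros(2)
for _ in range(200):
    w = federated_round(w, users)
# After enough rounds, w approaches true_w without the server ever
# touching any user's (X, y).
```

Note that in this plain version the server still sees each individual update, which can itself leak information about the user's data; that is exactly the gap the paper's secure aggregation protocol closes.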

There are five protocols that are studied, and each one builds off of the previous. One thing I noticed from the cost table they provided is how computationally expensive Protocol 4 is: several steps, including the user's computation and the server's communication, carry a term quadratic in the number of users. I don't know how I feel about that, but this paper is also almost 10 years old.
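The core trick shared by these protocols is pairwise masking: each pair of users derives a shared random mask, one adds it and the other subtracts it, so every individual upload looks random but the masks cancel in the server's sum. Below is my own minimal demo of that cancellation (roughly the simplest protocol in the paper's progression, with no dropout handling); the seed scheme, modulus, and function names are my illustrative choices, not the paper's.

```python
import random

MOD = 2**32  # arithmetic is done modulo a fixed ring size

def pairwise_masks(user_ids, dim, seed_base="demo"):
    """For each pair (u, v) with u < v, both derive the same mask from
    a shared seed; u adds it, v subtracts it, so masks cancel in sums."""
    masks = {u: [0] * dim for u in user_ids}
    for i, u in enumerate(user_ids):
        for v in user_ids[i + 1:]:
            prg = random.Random(f"{seed_base}-{u}-{v}")  # stand-in for a shared seed s_uv
            m = [prg.randrange(MOD) for _ in range(dim)]
            for k in range(dim):
                masks[u][k] = (masks[u][k] + m[k]) % MOD
                masks[v][k] = (masks[v][k] - m[k]) % MOD
    return masks

users = [1, 2, 3, 4]
dim = 3
inputs = {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9], 4: [10, 11, 12]}
masks = pairwise_masks(users, dim)

# Each user uploads only x_u + mask_u (mod 2^32): individually these
# look uniformly random, but the server's sum equals the true sum.
masked = {u: [(inputs[u][k] + masks[u][k]) % MOD for k in range(dim)]
          for u in users}
total = [sum(masked[u][k] for u in users) % MOD for k in range(dim)]
# total recovers the element-wise sum of the users' private vectors.
```

The quadratic costs I mentioned fall out of this structure: every user shares a mask with every other user, and the later protocols add secret-sharing of those seeds so the sum still works when users drop out.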

I decided to read this paper because, although it is old, there are many things in this area I still cannot wrap my head around. I was not familiar with secure aggregation or federated learning, and I need a good grounding in the basic practices before jumping into the most recent topics.

Also, I don’t know if this is correct, but from reading about how federated learning works, I think it is the process Apple uses to determine which emojis to suggest. When I was reading about their differential privacy implementation (not the best at all), I noticed that the idea they were focusing on almost mirrors the definition of federated learning given here.

References & Sources

- K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth. “Practical Secure Aggregation for Federated Learning on User-Held Data.” NIPS 2016 Workshop on Private Multi-Party Machine Learning.