Practical Secure Aggregation for Federated Learning on User-Held Data
This paper is relatively intuitive to understand. There are no extremely long equations or proofs that took me a while to get. I found this paper while I was looking through the Google research page. It proposes a Secure Aggregation protocol for Federated Learning that operates on high-dimensional vectors, is communication efficient, is robust to users dropping out, and provides the strongest possible security under the paper's constraints.
The Main Ideas
If someone is trying to create a model that predicts the next word a user will text, the first idea would be to collect all of the users’ text messages and train a neural network on them to provide the predictions. The caveat is that text messages often contain sensitive information. The idea proposed by the researchers is to take a federated learning approach to the problem, “wherein each user maintains a private database of her text messages securely on her own mobile device, and a shared global model is trained under the coordination of a central server based upon highly processed, minimally scoped, ephemeral updates from users.”
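To make the federated idea concrete for myself, here is a minimal sketch in Python. It is not the paper's training procedure. The one-parameter model `y ≈ w * x`, the toy data, and the names `local_update` and `server_round` are all my own assumptions for illustration. The point is only that the server sees each device's update and never the raw data.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y) pair that stays on one device


def local_update(w: float, data: List[Example], lr: float = 0.1) -> float:
    """Runs on the device: one gradient step for y ~ w * x, returns only the delta."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return -lr * grad


def server_round(w: float, updates: List[float]) -> float:
    """Runs on the server: it receives updates, not data, and applies their average."""
    return w + sum(updates) / len(updates)


# Hypothetical private datasets, one per device (all follow y = 2x).
devices = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(1.0, 2.0), (4.0, 8.0)],
]

w = 0.0
for _ in range(50):
    w = server_round(w, [local_update(w, d) for d in devices])
```

Even in this toy version the server still sees each individual update in the clear, which is the gap that secure aggregation is meant to close.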
Five protocols are studied, and each one builds on the previous one. One thing I noticed from the table they provided was how computationally expensive Protocol 4 is. Many steps of the process, including the user’s computation and the server communication, have a cost containing . I don’t know how I feel about that, but this paper is also almost 10 years old.
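As I understand it, the building block these protocols share is pairwise masking: every pair of users agrees on a random vector, one adds it to their input and the other subtracts it, so the masks cancel when the server sums everything. Below is a rough sketch of just that cancellation. The modulus `R`, the function names, and the single seeded random generator (a stand-in for real pairwise key agreement) are my assumptions. It does not handle dropouts, secret sharing, or anything else the later protocols add.

```python
import random
from typing import Dict, List

R = 2**16  # assumed modulus; all arithmetic is done mod R


def masked_inputs(inputs: Dict[int, List[int]], seed: int = 0) -> Dict[int, List[int]]:
    """Return each user's vector with pairwise masks applied."""
    rng = random.Random(seed)  # simulation only: real users would agree on masks pairwise
    users = sorted(inputs)
    dim = len(inputs[users[0]])
    masked = {u: list(vec) for u, vec in inputs.items()}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            s = [rng.randrange(R) for _ in range(dim)]  # mask shared by the pair (u, v)
            masked[u] = [(a + b) % R for a, b in zip(masked[u], s)]  # u adds the mask
            masked[v] = [(a - b) % R for a, b in zip(masked[v], s)]  # v subtracts it
    return masked


def aggregate(masked: Dict[int, List[int]]) -> List[int]:
    """Server side: sum the masked vectors; the pairwise masks cancel mod R."""
    dim = len(next(iter(masked.values())))
    return [sum(vec[i] for vec in masked.values()) % R for i in range(dim)]


inputs = {1: [1, 2, 3], 2: [10, 20, 30], 3: [100, 200, 300]}
total = aggregate(masked_inputs(inputs))
```

Here `total` equals the element-wise sum of the three original vectors, even though the server only ever handled the masked versions. If one user dropped out after masking, their masks would no longer cancel, which is the kind of problem the more expensive later protocols have to deal with.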
I decided to read this paper because, although it is old, there are many things that I still cannot wrap my head around. I was not familiar with secure aggregation or federated learning, and I need a good background in the basic practices before jumping into the most recent topics.
Also, I don’t know if this is correct, but from reading about how federated learning works, I think it is the process Apple uses to determine which emojis to suggest. When I was reading about their differential privacy implementation (not the best at all), I noticed that the idea they were focusing on almost mirrors the definition of federated learning given here.
References & Sources
- Source: [https://arxiv.org/pdf/1611.04482]
- Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., Ramage, D., Segal, A., & Seth, K. (2016, November 14). Practical secure aggregation for federated learning on user-held data. arXiv.org. https://arxiv.org/abs/1611.04482