Abdulrahman Diaa

I am a Ph.D. student under the supervision of Florian Kerschbaum. I'm also a member of the Cryptography, Security, and Privacy (CrySP) lab at the University of Waterloo. I hold a Master's degree in Computer Science from the University of Waterloo and a Bachelor of Science in Computer Engineering and Mathematics from The American University in Cairo.

My research addresses privacy and traceability in machine learning. I design efficient privacy-preserving protocols that enable collaborative analytics and secure inference without exposing sensitive data, using secure multi-party computation and differential privacy. I also develop robust watermarking for AI-generated content, exposing vulnerabilities in existing schemes through adaptive attacks and building defenses that withstand adversarial removal.


Publications

Privacy-Preserving Machine Learning

ZipPIR: High-throughput Single-server PIR without Client-side Storage

35th USENIX Security Symposium
Code

Private Information Retrieval (PIR) allows a client to privately access a database without revealing which element is accessed. Initial PIR protocols based on Ring Learning with Errors (RLWE) demonstrated the practicality of PIR, but achieve limited throughput. Alternatively, high-throughput protocols leverage an offline phase that requires substantial client-side storage (e.g., hints in SimplePIR) or involve prohibitive communication costs during the offline phase (e.g., Piano). These limitations conflict with the practical constraints of resource-limited clients and are further exacerbated by dynamic databases, where updates necessitate costly regeneration and retransmission of hints. To address these challenges, we propose ZipPIR, a high-throughput PIR protocol that compresses LWE ciphertexts into significantly smaller Paillier ciphertexts. ZipPIR leverages the offline phase to obtain this size reduction without incurring the associated computational cost in the online phase. Moreover, under computational assumptions, ZipPIR features an almost silent offline phase, requiring no communication beyond an initial public key, enabling the server to independently generate and update hints during idle times without client interaction. ZipPIR achieves over 2 GB/s of throughput (comparable to state-of-the-art protocols such as SimplePIR) without the need for a large client-stored hint. For PIR over a 1 GB database, ZipPIR has up to 10x higher throughput than existing protocols with no client-side storage, while requiring less than 200 KB of server-side storage per client, significantly enhancing scalability for practical deployments. While prior PIR protocols using Paillier are very inefficient, ZipPIR is the first Paillier-based PIR protocol whose throughput is competitive with state-of-the-art PIR protocols. We discuss the use of ZipPIR in the context of certificate transparency using a new solution architecture. Our proposed solution eliminates the need for client-side storage, while enabling PIR over a more recent version of the database.
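The abstract takes the PIR functionality as given; for readers unfamiliar with it, the classic two-server XOR scheme (shown here only to illustrate the functionality; it is not ZipPIR's single-server protocol) lets a client fetch a record without either server learning the index:

```python
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_queries(db_size, index):
    """Client: two bit-vector queries that each look uniformly random on
    their own, differing only at the wanted index."""
    q1 = [secrets.randbits(1) for _ in range(db_size)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2

def answer(db, q):
    """Server: XOR together the records its query bits select."""
    acc = bytes(len(db[0]))  # records assumed equal length
    for rec, bit in zip(db, q):
        if bit:
            acc = xor_bytes(acc, rec)
    return acc

db = [b"rec0", b"rec1", b"rec2", b"rec3"]
q1, q2 = make_queries(len(db), 2)
# every record except db[2] appears an even number of times and cancels out
rec = xor_bytes(answer(db, q1), answer(db, q2))
```

Single-server protocols like ZipPIR replace the second server with homomorphic encryption, which is where throughput and ciphertext size become the central concerns.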

Privacy-Preserving Machine Learning

FastLloyd: Federated, Accurate, Secure, and Tunable k-Means Clustering with Differential Privacy

34th USENIX Security Symposium
Code

We study the problem of privacy-preserving k-Means clustering in the horizontally federated setting. Existing federated approaches using secure computation suffer from substantial overheads and do not offer output privacy. At the same time, differentially private (DP) k-Means algorithms either assume a trusted central curator or significantly degrade utility by adding noise in the local DP model. Naively combining the secure and central DP solutions results in a protocol with impractical overhead. Instead, our work provides enhancements to both the DP and secure computation components, resulting in a design that is faster, more private, and more accurate than previous work. By utilizing the computational DP model, we design a lightweight, secure aggregation-based approach that achieves five orders of magnitude speed-up over state-of-the-art related work. Furthermore, we not only maintain the utility of the state-of-the-art in the central model of DP, but we improve the utility further by designing a new DP clustering mechanism.
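The central-DP component of the design can be pictured as a Lloyd iteration whose per-cluster aggregates are noised before centroids are recomputed. The sketch below is a simplified stand-in (plain Gaussian noise on sums and counts, with a hypothetical `sigma`), not FastLloyd's mechanism: in the actual protocol these aggregates are computed under secure aggregation so no party sees them in the clear.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_lloyd_step(points, centroids, sigma):
    """One Lloyd iteration where the per-cluster sums and counts are
    perturbed with Gaussian noise before recomputing the centroids."""
    k, d = centroids.shape
    # assign each point to its nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new = np.empty_like(centroids)
    for j in range(k):
        mask = labels == j
        noisy_sum = points[mask].sum(axis=0) + rng.normal(0.0, sigma, d)
        noisy_cnt = mask.sum() + rng.normal(0.0, sigma)
        new[j] = noisy_sum / max(noisy_cnt, 1.0)  # guard against tiny counts
    return new

# two well-separated blobs; one noisy update already lands near their means
points = rng.normal(0, 1, (200, 2)) + np.repeat([[0, 0], [5, 5]], 100, axis=0)
c = dp_lloyd_step(points, np.array([[0.0, 0.0], [4.0, 4.0]]), sigma=0.5)
```

Because the noise is added once to each aggregate rather than to every party's local data, utility degrades far less than in the local-DP model.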

AI-content Watermarking

Optimizing Adaptive Attacks against Watermarks for Language Models

42nd International Conference on Machine Learning (ICML)
Best Poster Award, CPI Annual Conference 2024; Oral Presentation (3.9%), WMARK@ICLR 2025; Spotlight Poster (2.6%), ICML 2025
Models

Large Language Models (LLMs) can be misused to spread online spam and misinformation. Content watermarking deters misuse by hiding a message in model-generated outputs, enabling their detection using a secret watermarking key. Robustness is a core security property, stating that evading detection requires (significant) degradation of the content's quality. Many LLM watermarking methods have been proposed, but robustness is tested only against non-adaptive attackers who lack knowledge of the watermarking method and can find only suboptimal attacks. We formulate the robustness of LLM watermarking as an objective function and propose preference-based optimization to tune adaptive attacks against the specific watermarking method. Our evaluation shows that (i) adaptive attacks substantially outperform non-adaptive baselines, (ii) even in a non-adaptive setting, adaptive attacks optimized against a few known watermarks remain highly effective when tested against other, unseen watermarks, and (iii) optimization-based attacks are practical, requiring less than seven GPU hours. Our findings underscore the need to test robustness against adaptive attackers.
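To make "detection using a secret watermarking key" concrete, here is a toy version of a common family of LLM watermarks (red/green-list schemes): the key pseudorandomly partitions the vocabulary at each step, generation is biased toward "green" tokens, and detection computes a z-score on the green-token count. This illustrates the detectors such attacks target, not the paper's attack itself; all parameters are illustrative.

```python
import hashlib, math, random

GAMMA = 0.5  # fraction of the vocabulary that is "green" at each step

def green_set(key, prev_token, vocab_size):
    """Pseudorandomly derive this position's green tokens from the secret
    key and the previous token."""
    seed = int.from_bytes(
        hashlib.sha256(f"{key}:{prev_token}".encode()).digest()[:8], "big")
    return set(random.Random(seed).sample(range(vocab_size),
                                          int(GAMMA * vocab_size)))

def z_score(tokens, key, vocab_size):
    """Detection: count green tokens and compare to the unwatermarked rate."""
    hits = sum(tokens[i] in green_set(key, tokens[i - 1], vocab_size)
               for i in range(1, len(tokens)))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# a toy "generator" that always emits green tokens scores very high,
# while the same text checked with the wrong key looks unwatermarked
vocab, key = 1000, "secret"
gen = random.Random(1)
tokens = [0]
for _ in range(100):
    tokens.append(gen.choice(sorted(green_set(key, tokens[-1], vocab))))
```

An adaptive attacker who knows this scheme can tune a paraphraser to break token-to-green-set correlations while preserving text quality, which is exactly the kind of optimization the paper formalizes.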

AI-content Watermarking

Leveraging Optimization for Adaptive Attacks on Image Watermarks

12th International Conference on Learning Representations (ICLR)
Code

Untrustworthy users can misuse image generators to synthesize high-quality deepfakes and engage in unethical activities. Watermarking deters misuse by marking generated content with a hidden message, enabling its detection using a secret watermarking key. A core security property of watermarking is robustness, which states that an attacker can only evade detection by substantially degrading image quality. Assessing robustness requires designing an adaptive attack for the specific watermarking algorithm. When evaluating watermarking algorithms and their (adaptive) attacks, it is challenging to determine whether an adaptive attack is optimal, i.e., the best possible attack. We solve this problem by defining an objective function and then approach adaptive attacks as an optimization problem. The core idea of our adaptive attacks is to replicate secret watermarking keys locally by creating surrogate keys that are differentiable and can be used to optimize the attack's parameters. We demonstrate for Stable Diffusion models that such an attacker can break all five surveyed watermarking methods at no visible degradation in image quality. Optimizing our attacks is efficient and requires less than 1 GPU hour to reduce the detection accuracy to 6.3% or less. Our findings emphasize the need for more rigorous robustness testing against adaptive, learnable attackers.
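The "surrogate key" idea, replicating the secret detector locally so the attack becomes differentiable, can be sketched with a hypothetical linear detector (everything here is a toy stand-in for the paper's image-space attacks): the attacker descends the surrogate score under a perturbation budget, and the drop transfers to the true detector because the keys are correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024

def unit(v):
    return v / np.linalg.norm(v)

# hypothetical linear detector: "watermarked" means w_true @ x is large
w_true = unit(rng.normal(size=d))                       # secret key
w_surr = unit(w_true + 0.3 * unit(rng.normal(size=d)))  # attacker's surrogate

x = rng.normal(size=d) + 4.0 * w_true                   # toy watermarked image
before = w_true @ x

# adaptive attack: gradient descent on the surrogate score, with an
# L2 budget on the perturbation standing in for image quality
delta = np.zeros(d)
budget = 3.5
for _ in range(200):
    delta -= 0.05 * w_surr     # gradient of w_surr @ (x + delta) w.r.t. delta
    norm = np.linalg.norm(delta)
    if norm > budget:          # project back onto the quality budget
        delta *= budget / norm

after = w_true @ (x + delta)   # true detector score drops sharply
```

Real watermark detectors are nonlinear, but the same recipe applies once the surrogate is differentiable, which is why the paper frames attack design as an optimization problem.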

Privacy-Preserving Machine Learning

Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions

33rd USENIX Security Symposium
Code

Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which give state-of-the-art inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows between 3 and 110x speedups in inference time on large models with up to 23 million parameters while maintaining competitive inference accuracy.
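The core substitution can be shown in a few lines. The quadratic below uses illustrative coefficients, not the paper's (the paper instead trains the network to tolerate its approximation); the point is that a polynomial needs only additions and multiplications, so it avoids the expensive secure comparisons that make ReLU the MPC bottleneck.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def poly_relu(x):
    """Quadratic stand-in for ReLU (illustrative coefficients). Additions
    and multiplications are cheap under MPC, so this evaluates in a single
    round instead of a costly secure comparison."""
    return 0.125 * x * x + 0.5 * x + 0.25

# the fit is rough in absolute terms, which is why naive substitution
# hurts accuracy and a co-designed training procedure is needed
x = np.linspace(-2.0, 2.0, 9)
max_err = np.max(np.abs(poly_relu(x) - relu(x)))  # 0.25 on this grid
```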

Privacy-Preserving Machine Learning

Privacy-Preserving Federated Recurrent Neural Networks

23rd Proceedings on Privacy Enhancing Technologies (PoPETs)

We present RHODE, a novel system that enables privacy-preserving training of and prediction on Recurrent Neural Networks (RNNs) in a cross-silo federated learning setting by relying on multiparty homomorphic encryption. RHODE preserves the confidentiality of the training data, the model, and the prediction data; and it mitigates federated learning attacks that target the gradients under a passive-adversary threat model. We propose a packing scheme, multi-dimensional packing, for a better utilization of Single Instruction, Multiple Data (SIMD) operations under encryption. With multi-dimensional packing, RHODE enables the efficient processing, in parallel, of a batch of samples. To avoid the exploding gradients problem, RHODE provides several clipping approximations for performing gradient clipping under encryption. We experimentally show that the model performance with RHODE remains similar to non-secure solutions both for homogeneous and heterogeneous data distribution among the data holders. Our experimental evaluation shows that RHODE scales linearly with the number of data holders and the number of timesteps, sub-linearly and sub-quadratically with the number of features and the number of hidden units of RNNs, respectively. To the best of our knowledge, RHODE is the first system that provides the building blocks for the training of RNNs and their variants under encryption in a federated learning setting.
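Why does gradient clipping need an "approximation" under encryption? Exact clipping requires comparing the gradient norm to a threshold, and comparisons are expensive on encrypted data. A minimal sketch of the idea, using one possible smooth surrogate (an illustrative choice; RHODE's actual approximations differ):

```python
import numpy as np

def clip_exact(g, c):
    """Standard gradient clipping: rescale g if its norm exceeds c."""
    n = np.linalg.norm(g)
    return g if n <= c else g * (c / n)

def clip_smooth(g, c):
    """Smooth surrogate: scale by c / sqrt(c^2 + ||g||^2). No comparison is
    needed, at the cost of slightly shrinking every gradient."""
    return g * (c / np.sqrt(c * c + g @ g))

g = np.array([3.0, 4.0])                      # norm 5, well over the bound
exact = np.linalg.norm(clip_exact(g, 1.0))    # exactly 1.0
smooth = np.linalg.norm(clip_smooth(g, 1.0))  # about 0.98
```

The surrogate is built from multiplications and an inverse square root, operations far more amenable to evaluation under homomorphic encryption than a branch on a secret value.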

Awards

Interop Labs PhD Fellowship

CA$110K

University of Waterloo

2026-2027

David R. Cheriton Graduate Scholarship

CA$40K

University of Waterloo

2021-2025

International Masters Award of Excellence

CA$12.5K

University of Waterloo

2021-2023

Johnston International Entrance Scholarship

CA$5K

University of Waterloo

2021-2022

Tarek Nour AUC Scholarship

US$100K

Tarek Nour Communications

2016-2021

Preprints

Privacy-Preserving Machine Learning

HE is all you need: Compressing FHE Ciphertexts using Additive HE

Presented at FHE.org Conference 2023

Homomorphic Encryption (HE) is a commonly used tool for building privacy-preserving applications. However, in scenarios with many clients and high-latency networks, communication costs due to large ciphertext sizes are the bottleneck. In this paper, we present a new compression technique that uses an additive homomorphic encryption scheme with small ciphertexts to compress large homomorphic ciphertexts based on Learning with Errors (LWE). Our technique exploits the linear step in the decryption of such ciphertexts to delegate part of the decryption to the server. We achieve compression ratios of up to 90% while requiring only a small compression key. By compressing multiple ciphertexts simultaneously, we achieve over 99% compression. Our compression technique can be readily applied to applications that transmit LWE ciphertexts from the server to the client as the response to a query. Furthermore, we apply our technique to private information retrieval (PIR), where a client accesses a database without revealing its query. Using our compression technique, we propose ZipPIR, a PIR protocol which achieves the lowest overall communication cost among all protocols in the literature. ZipPIR does not require any communication with the client in the preprocessing phase, making it a great solution for use cases of PIR with ephemeral clients or high-latency networks.
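The delegation of the linear decryption step can be sketched end to end with textbook Paillier and a toy LWE instance. All parameters below are tiny and insecure, chosen purely for illustration: the client uploads Paillier encryptions of its LWE secret once, the server folds the n-element vector part of an LWE ciphertext into a single additive-HE ciphertext, and the client finishes decryption from that one ciphertext plus the scalar b.

```python
import math, random

# ---- textbook Paillier (additively homomorphic, small ciphertexts) ----
P, Q = 10007, 10009                  # toy primes, insecure
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                 # L((1+N)^LAM mod N^2) = LAM, so mu = LAM^-1
prng = random.Random(0)

def penc(m):
    r = prng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = prng.randrange(1, N)
    return pow(1 + N, m, N2) * pow(r, N, N2) % N2

def pdec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

# ---- toy LWE ciphertexts and delegated (compressed) decryption ----
n, q, delta = 8, 257, 64
lrng = random.Random(1)
s = [lrng.randrange(q) for _ in range(n)]

def lwe_enc(m):
    a = [lrng.randrange(q) for _ in range(n)]
    e = lrng.randrange(-2, 3)        # small noise
    b = (sum(x * y for x, y in zip(a, s)) + delta * m + e) % q
    return a, b

# offline: client sends a one-time compression key Enc(s_i)
comp_key = [penc(si) for si in s]

# server: collapse the n-element vector `a` into ONE Paillier ciphertext
# holding the linear part of LWE decryption, sum(a_i * s_i)
a, b = lwe_enc(3)
c_lin = 1
for ai, ci in zip(a, comp_key):
    c_lin = c_lin * pow(ci, ai, N2) % N2

# client: recover the message from (b, c_lin) alone
m = (((b - pdec(c_lin)) % q) + delta // 2) // delta % (q // delta)
```

At real parameter sizes, one small Paillier ciphertext replaces the long LWE vector, which is the source of the compression ratios quoted above.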

Academic Service

Workshop Reviewer

1st Workshop on GenAI Watermarking (WMark@ICLR)

2025

Artifact Reviewer (Ninja Award)

34th USENIX Security Symposium

2025

Journal Reviewer

Transactions on Knowledge and Data Engineering (TKDE)

2024