Gradient Inversion of Multimodal Models

Omri Ben Hemo*1, Alon Zolfi*1, Oryan Yehezkel1, Omer Hofman2, Roman Vainshtein2, Hisashi Kojima3, Yuval Elovici1, Asaf Shabtai1
1Ben-Gurion University of the Negev, Israel | 2Fujitsu Research of Europe | 3Fujitsu Limited, Japan

Abstract

Federated learning (FL) enables collaborative model training without exposing raw data, but it remains vulnerable to gradient inversion attacks. We introduce GI-DQA, a novel attack that reconstructs private Document Question Answering (DQA) inputs from leaked gradients, exposing a critical privacy vulnerability in multimodal models.

Key Contributions

Novel Attack

First multimodal gradient inversion attack targeting Document QA models.

Hybrid Methodology

Combines gradient-based optimization of the document's visual (pixel) content with analytical reconstruction of the question-and-answer tokens.

Thorough Evaluation

Validated against Donut and LayoutLMv3 using the privacy-aware PFL-DocVQA dataset.

Methodology

GI-DQA starts from a public document template, analytically reconstructs the discrete QA tokens from the leaked gradients, and then optimizes the template's visual content so that the gradients it induces align with the client's, guided by visual priors.

Methodology Overview

Benign Client

  • Private document xD + question xQ → DQA model
  • The model predicts answer y; the client sends gradients ∇θclient
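The client's role can be sketched as follows; a tiny linear model with an MSE loss stands in for the real DQA model (the model, shapes, and loss here are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))            # shared model weights

def client_gradient(W, x_private, y_private):
    """Forward pass, task loss, and the gradient w.r.t. W that the client sends."""
    pred = x_private @ W                   # model output
    residual = pred - y_private
    loss = 0.5 * np.mean(residual ** 2)    # MSE task loss
    grad_W = x_private.T @ residual / x_private.shape[0]
    return loss, grad_W

x_private = rng.standard_normal((8, 4))    # stands in for document + question features
y_private = rng.standard_normal((8, 3))    # stands in for answer targets
loss, grad_client = client_gradient(W, x_private, y_private)
```

Only `grad_client` leaves the client; the attack below shows why that is already enough to leak the private input.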

Attacker Setup

  • Starts from public template D
  • Analytically reconstructs discrete question xQ
  • Computes gradients ∇θadv
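One common analytical trick for recovering discrete tokens from leaked gradients: in an embedding layer, only the rows that were actually looked up receive a nonzero gradient, so the set of input token IDs can be read off directly. This is a generic illustration of such analytical reconstruction; the paper's exact procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, dim = 50, 8
token_ids = np.array([3, 17, 42])           # the client's private question tokens

# Simulate the embedding-layer gradient from a backward pass: only the
# looked-up rows accumulate any gradient.
grad_E = np.zeros((vocab_size, dim))
np.add.at(grad_E, token_ids, rng.standard_normal((len(token_ids), dim)))

def recover_token_ids(grad_E, tol=1e-12):
    """Token IDs whose embedding rows carry nonzero gradient were in the input."""
    return np.flatnonzero(np.abs(grad_E).sum(axis=1) > tol)

recovered = recover_token_ids(grad_E)
```

Note that this recovers the set of tokens, not their order; ordering requires additional constraints (e.g., positional-embedding gradients).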

Optimization Loop

  • Gradient-matching loss Lgrad(∇θadv, ∇θclient)
  • Adds image priors for smoothness, layout preservation, and sharp text edges
  • Iteratively updates template pixels until gradients align
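The loop above can be sketched end to end: match the gradients induced by a candidate input against the leaked client gradients, regularized by a smoothness (total-variation) prior. A tiny linear model, a squared-L2 matching distance, and a finite-difference optimizer stand in for the real DQA model, the paper's matching loss, and autodiff (all three are assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 3))            # shared model weights
x_true = rng.standard_normal((2, 4))       # private document content
y = rng.standard_normal((2, 3))            # targets, assumed recovered analytically

def model_grad(x):
    """Gradient of the task (MSE) loss w.r.t. W for input x."""
    r = x @ W - y
    return x.T @ r / x.shape[0]

grad_client = model_grad(x_true)           # the leaked client gradient

def total_variation(x):
    """Smoothness prior: penalize differences between neighboring pixels."""
    return np.abs(np.diff(x, axis=1)).sum()

def attack_loss(x, tv_weight=1e-3):
    """Gradient-matching term plus image prior."""
    d = model_grad(x) - grad_client
    return (d ** 2).sum() + tv_weight * total_variation(x)

def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of f at x (autodiff stand-in)."""
    g = np.zeros_like(x)
    for i in np.ndindex(*x.shape):
        xp = x.copy(); xp[i] += eps
        xm = x.copy(); xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

x = rng.standard_normal(x_true.shape)      # start from a "public template"
lr, loss = 0.05, attack_loss(x)
init_loss = loss
for _ in range(200):                       # iteratively update template pixels
    step = x - lr * numerical_grad(attack_loss, x)
    new_loss = attack_loss(step)
    if new_loss < loss:
        x, loss = step, new_loss
    else:
        lr *= 0.5                          # backtrack if the step overshoots
```

As the matching loss falls, the candidate input induces gradients increasingly close to the client's, which is the mechanism by which the template drifts toward the private document.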

Outcome

  • The template converges to a high-fidelity replica of the private document, exposing its content

Results

GI-DQA significantly outperforms existing gradient inversion baselines, recovering fine-grained personal information from the reconstructed documents.


Resources

  • GitHub Repository
  • Read Paper
  • Watch Video
  • View Poster