Omri Ben Hemo*1, Alon Zolfi*1, Oryan Yehezkel1, Omer Hofman2, Roman Vainshtein2, Hisashi Kojima3, Yuval Elovici1, Asaf Shabtai1
1Ben-Gurion University of the Negev, Israel | 2Fujitsu Research of Europe | 3Fujitsu Limited, Japan
Federated Learning enables training without exposing raw data but remains vulnerable to gradient inversion attacks. We introduce GI-DQA, a novel attack that reconstructs private Document QA inputs from leaked gradients, exposing critical privacy vulnerabilities in multimodal models.
First multimodal gradient inversion attack targeting Document QA models.
Combines visual pixel optimization and analytical QA reconstruction.
Validated against Donut and LayoutLMv3 using the privacy-aware PFL-DocVQA dataset.
GI-DQA leverages leaked gradients and public document templates: it first reconstructs the QA tokens analytically, then optimizes the document's visual content so that its gradients align with the client's leaked gradients and with visual priors.
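The gradient-matching step described above can be illustrated with a minimal sketch. This is a hypothetical toy stand-in, not the paper's method: a linear layer replaces the Donut/LayoutLMv3 victim model, and the analytical QA-token reconstruction, template initialization, and visual priors are omitted. Only the core idea is shown: optimize a dummy input until its gradients match the leaked client gradients.

```python
# Toy gradient-inversion sketch (hypothetical model; GI-DQA's actual
# document pipeline, QA-token reconstruction, and priors are not shown).
import torch

torch.manual_seed(0)

# Stand-in for the victim model: a single linear layer.
model = torch.nn.Linear(16, 4)
loss_fn = torch.nn.CrossEntropyLoss()

# Client side: gradients leaked from one private sample.
private_x = torch.randn(1, 16)
private_y = torch.tensor([2])
leaked_grads = torch.autograd.grad(
    loss_fn(model(private_x), private_y), model.parameters()
)

# Attacker side: optimize a dummy input so its gradients match the leak.
dummy_x = torch.randn(1, 16, requires_grad=True)
opt = torch.optim.Adam([dummy_x], lr=0.1)

history = []
for step in range(200):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        loss_fn(model(dummy_x), private_y), model.parameters(),
        create_graph=True,  # needed to backprop through the gradients
    )
    # L2 gradient-matching objective (visual priors omitted for brevity).
    match = sum(((dg - lg) ** 2).sum()
                for dg, lg in zip(dummy_grads, leaked_grads))
    match.backward()
    opt.step()
    history.append(match.item())

print(f"gradient-matching loss: {history[0]:.4f} -> {history[-1]:.6f}")
```

In the full attack, the dummy variable is the document image initialized from a public template rather than noise, and the matching objective is combined with visual prior terms.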
GI-DQA significantly outperforms existing inversion methods, recovering detailed personal data from the reconstructed documents.