Publishing Neural Networks in Drug Discovery Might Compromise Training Data Privacy

Krüger, Fabian P.; Östman, Johan; Mervin, Lewis; Tetko, Igor V.; Engkvist, Ola

arXiv:2410.16975 (cs)

[Submitted on 22 Oct 2024]

Abstract: This study investigates the risks of exposing confidential chemical structures when machine learning models trained on these structures are made publicly available. We use membership inference attacks, a common method to assess privacy that is largely unexplored in the context of drug discovery, to examine neural networks for molecular property prediction in a black-box setting. Our results reveal significant privacy risks across all evaluated datasets and neural network architectures. Combining multiple attacks increases these risks. Molecules from minority classes, often the most valuable in drug discovery, are particularly vulnerable. We also found that representing molecules as graphs and using message-passing neural networks may mitigate these risks. We provide a framework to assess privacy risks of classification models and molecular representations. Our findings highlight the need for careful consideration when sharing neural networks trained on proprietary chemical structures, informing organisations and researchers about the trade-offs between data confidentiality and model openness.
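For readers unfamiliar with the term, the following is a minimal sketch of one generic black-box membership inference attack: a loss-threshold test that uses only the model's predicted probability for the true label. It is not taken from the paper and is not claimed to be one of the attacks the authors evaluate. The function names (`log_prob_scores`, `tpr_at_fpr`) and all probability values are hypothetical and chosen purely for illustration.

```python
import math

def log_prob_scores(probs_true_label):
    """Membership score = log-probability of the true label (negative loss).
    A higher score means the model fits the example better, which is
    treated here as weak evidence that it was in the training set."""
    eps = 1e-12  # guard against log(0)
    return [math.log(max(p, eps)) for p in probs_true_label]

def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """Choose a threshold so that at most max_fpr of known non-members are
    flagged, then report (true-positive rate, false-positive rate)."""
    ranked = sorted(nonmember_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # non-members we may wrongly flag
    threshold = ranked[allowed] if allowed < len(ranked) else -math.inf
    # Flag only scores strictly above the threshold.
    tp = sum(s > threshold for s in member_scores)
    fp = sum(s > threshold for s in nonmember_scores)
    return tp / len(member_scores), fp / len(nonmember_scores)

# Hypothetical model outputs (invented numbers, not results from the paper):
# probability assigned to the correct class for each molecule.
member_probs = [0.99, 0.98, 0.95, 0.60]      # molecules used in training
nonmember_probs = [0.97, 0.70, 0.55, 0.40]   # held-out molecules

tpr, fpr = tpr_at_fpr(
    log_prob_scores(member_probs),
    log_prob_scores(nonmember_probs),
    max_fpr=0.25,
)
print(tpr, fpr)  # prints 0.75 0.25
```

Reporting the true-positive rate at a fixed, low false-positive rate is one common way to summarise such an attack, because it reflects how many training molecules an adversary could identify with few false alarms.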

Subjects:	Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2410.16975 [cs.CR]
	(or arXiv:2410.16975v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2410.16975

Submission history

From: Fabian Krüger
[v1] Tue, 22 Oct 2024 12:55:02 UTC (919 KB)

Ancillary files: Paper_Privacy_risks_Supplementary_Information.pdf