Abstract
We present a stealthy privacy attack that exposes links in Graph Neural Networks (GNNs). Focusing on dynamic GNNs, we propose to inject new nodes and attach them to a particular target node to infer its private edge information. Our approach significantly enhances the \(F_1\) score of the attack compared to the current state-of-the-art benchmarks. Specifically, for the Twitch dataset, our method improves the \(F_1\) score by 23.75%, and for the Flickr dataset, remarkably, it is more than three times better than the state-of-the-art. We also propose and evaluate defense strategies based on differentially private (DP) mechanisms relying on a newly defined DP notion. These solutions, on average, reduce the effectiveness of the attack by 71.9% while only incurring a minimal utility loss of about 3.2%.
Notes
1. The post-processing property allows arbitrary data-independent transformations of DP outputs without affecting their privacy guarantee [27].
References
Atwood, J., Towsley, D.: Diffusion-convolutional neural networks. In: 30th International Conference on Neural Information Processing Systems (NIPS) (2016)
Brunet, S., Canard, S., Gambs, S., Olivier, B.: Novel differentially private mechanisms for graphs. IACR Cryptology ePrint Archive, p. 745 (2016)
Cadwalladr, C., Graham-Harrison, E.: Revealed: 50 million Facebook profiles harvested for Cambridge analytica in major data breach. The Guardian (2018)
Duddu, V., Boutet, A., Shejwalkar, V.: Quantifying privacy leakage in graph embedding. In: 17th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous) (2021)
Dwork, C.: Differential privacy. In: 33rd International Colloquium on Automata, Languages and Programming, ICALP (2006)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography (2006)
Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)
Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: IEEE International Conference on Data Mining (2009)
He, X., Jia, J., Backes, M., Gong, N.Z., Zhang, Y.: Stealing links from graph neural networks. In: USENIX Security Symposium (2021)
Karwa, V., Slavkovic, A.B.: Differentially private graphical degree sequences and synthetic graphs. In: Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD (2012)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations (ICLR) (2017)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations (ICLR) (2017)
Kossinets, G., Watts, D.J.: Empirical analysis of an evolving social network. Science 311(5757), 88–90 (2006)
Li, K., et al.: Towards practical edge inference attacks against graph neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)
Mueller, T.T., Usynin, D., Paetzold, J.C., Rueckert, D., Kaissis, G.: SoK: differential privacy on graph-structured data. CoRR abs/2203.09205 (2022)
Rozemberczki, B., Sarkar, R.: Twitch gamers: a dataset for evaluating proximity preserving and structural role-based node embeddings. arXiv preprint arXiv:2101.03091 (2021)
Sajadmanesh, S., Shamsabadi, A.S., Bellet, A., Gatica-Perez, D.: GAP: differentially private graph neural networks with aggregation perturbation. CoRR abs/2203.00949 (2022)
Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61–80 (2009)
Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., Eliassi-Rad, T.: Collective classification in network data. AI Mag. 29(3), 93 (2008)
Sun, L., et al.: Adversarial attack and defense on graph data: a survey. arXiv preprint arXiv:1812.10528 (2018)
Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: 6th International Conference on Learning Representations (ICLR) (2018)
Wu, B., Yang, X., Pan, S., Yuan, X.: Adapting membership inference attacks to GNN for graph classification: approaches and implications. In: IEEE International Conference on Data Mining (ICDM) (2021)
Wu, F., Long, Y., Zhang, C., Li, B.: LinkTeller: recovering private edges from graph neural networks via influence analysis. In: IEEE Symposium on Security and Privacy (SP) (2022)
Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? In: 7th International Conference on Learning Representations (ICLR) (2019)
Zeng, H., Zhou, H., Srivastava, A., Kannan, R., Prasanna, V.K.: GraphSAINT: graph sampling based inductive learning method. In: 8th International Conference on Learning Representations (ICLR) (2020)
Zhang, M., Chen, Y.: Link prediction based on graph neural networks. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
Zhu, K., Fioretto, F., Van Hentenryck, P.: Post-processing of differentially private data: a fairness perspective. In: 31st International Joint Conference on Artificial Intelligence, IJCAI (2022)
A Appendix
1.1 A.1 Differential Privacy
The original definition of Differential Privacy (DP) [5, 7] was introduced in the context of microdata, i.e., databases containing individual records. A central aspect of DP is the concept of neighborhood, originally defined for that data structure.
Definition 3
(Neighboring Databases). Let \(\mathcal {D}\) be the class of possible databases. Any two databases \(D, D' \in \mathcal {D}\) that differ in one record are called neighbors. For two neighboring databases, \(d(D, D') = 1\), where d denotes the Hamming distance.
Definition 4
(\((\varepsilon , \delta )\)-Differential Privacy [5, 7]). A randomized mechanism \(\mathcal {M}\) satisfies \((\varepsilon ,\delta )\)-DP with \(\varepsilon , \delta \ge 0\) if, for all pairs of neighboring databases \(D, D' \in \mathcal {D}\) and for all measurable \(\mathcal {O} \subseteq \text {Range}(\mathcal {M})\),
$$\Pr [\mathcal {M}(D) \in \mathcal {O}] \le e^{\varepsilon } \Pr [\mathcal {M}(D') \in \mathcal {O}] + \delta .$$
In simple terms, the output of a DP mechanism should not reveal the presence or absence of any specific record in the database, up to a multiplicative factor \(e^{\varepsilon }\) and an additive slack \(\delta \). When each record corresponds to an individual, DP ensures that their information remains confidential. A lower \(\varepsilon \), known as the privacy budget, provides stronger protection.
The most popular DP mechanism is the Laplace mechanism, which relies on global sensitivity, defined as follows:
Definition 5
(Global Sensitivity [7]). The \(L_p\)-global sensitivity of a query function \(f: \mathcal {D} \rightarrow \mathbb {R}^d\) is defined as
$$\Delta _p(f) = \max _{D, D' :\, d(D, D') = 1} \Vert f(D) - f(D')\Vert _p,$$
where \(D, D'\) are any two neighboring databases.
Definition 6
(Laplace Mechanism [7]). Given any function \(f: \mathcal {D} \rightarrow \mathbb {R}^d\), the Laplace mechanism is defined as
$$\mathcal {M}_L(D) = f(D) + (Y_1, \ldots , Y_d),$$
where \(Y_i\) are i.i.d. random variables drawn from a Laplace distribution with zero mean and scale \(\Delta _1(f)/\varepsilon \).
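To make Definition 6 concrete, the following minimal Python sketch (not taken from the paper's implementation; function and variable names are illustrative) releases a query answer under \(\varepsilon \)-DP by adding Laplace noise with scale \(\Delta _1(f)/\varepsilon \).

```python
import numpy as np

def laplace_mechanism(f_value, l1_sensitivity, epsilon, rng=None):
    """Release f(D) under epsilon-DP by adding Laplace noise.

    f_value:        the true query answer f(D), a scalar or 1-D array.
    l1_sensitivity: the L1-global sensitivity Delta_1(f) of the query.
    epsilon:        the privacy budget (> 0); smaller means more noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    f_value = np.atleast_1d(np.asarray(f_value, dtype=float))
    scale = l1_sensitivity / epsilon          # b = Delta_1(f) / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=f_value.shape)
    return f_value + noise

# Illustrative example: privately release the degree of a node whose true
# degree is 7. Under an edge-level neighborhood notion, adding or removing
# one edge changes a degree by at most 1, so Delta_1 = 1.
print(laplace_mechanism(7, l1_sensitivity=1.0, epsilon=0.5))
```

In the graph setting, the same mechanism can be applied to edge- or degree-related quantities once a neighborhood notion, and hence a sensitivity, has been fixed.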
1.2 A.2 Experimental Setup
Datasets. We evaluated our attack on several real-world datasets used in related research. We include the Flickr dataset [25], where nodes represent images and edges connect nodes with shared properties; node features consist of word representations. We also use two Twitch datasets (Twitch-FR and Twitch-RU) [16] to evaluate NILS, and Twitch-ES to train the GNNs in the inductive setting, as done in [23]. The Twitch datasets capture follow connections between users, and the task is to classify whether a streamer uses explicit language, based on features such as preferred games and location. For the transductive setting, where training and testing occur on the same graph, we use three citation network datasets: Cora, Citeseer, and Pubmed [19]. These involve predicting the topic of publications based on textual features and citation relationships.
Models. We follow the approach in [23] for training models and selecting hyperparameters. The authors trained Graph Convolutional Networks (GCNs) using various configurations, including normalization techniques, the number of hidden layers, input and output units, and dropout rates. A grid search strategy identified optimal hyperparameters, evaluating performance on a validation set. By using the same training procedures and hyperparameter tuning strategies, we ensure consistency across studies.
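For readers unfamiliar with the architecture, the sketch below shows the basic two-layer GCN forward pass of Kipf and Welling that such configurations instantiate; the layer widths and dropout rate are illustrative placeholders, not the tuned values from the grid search.

```python
import torch
import torch.nn.functional as F

class TwoLayerGCN(torch.nn.Module):
    """Minimal GCN: H = ReLU(A_hat X W1), logits = A_hat H W2."""
    def __init__(self, in_dim, hidden_dim, out_dim, dropout=0.5):
        super().__init__()
        self.lin1 = torch.nn.Linear(in_dim, hidden_dim, bias=False)
        self.lin2 = torch.nn.Linear(hidden_dim, out_dim, bias=False)
        self.dropout = dropout

    def forward(self, x, a_hat):
        # a_hat is the symmetrically normalized adjacency with self-loops:
        # A_hat = D^{-1/2} (A + I) D^{-1/2}
        h = F.relu(a_hat @ self.lin1(x))
        h = F.dropout(h, p=self.dropout, training=self.training)
        return a_hat @ self.lin2(h)

def normalize_adjacency(adj):
    """Compute A_hat = D^{-1/2}(A + I)D^{-1/2} for a dense adjacency matrix."""
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

# Toy usage: 10 nodes, 5 features, 3 classes, random features and edges.
x = torch.randn(10, 5)
adj = (torch.rand(10, 10) < 0.2).float()
adj = ((adj + adj.t()) > 0).float().fill_diagonal_(0)
model = TwoLayerGCN(in_dim=5, hidden_dim=16, out_dim=3)
logits = model(x, normalize_adjacency(adj))
```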
Evaluation Metrics. We employ precision, recall, and the \(F_1\) score as our primary evaluation metrics, following [23]. These metrics are suitable for addressing the imbalanced binary classification problem, where the minority class (connected nodes) is of central interest. We select target nodes \(V_{\mathcal {A}}\) such that \(|V_{\mathcal {A}}|=500\), using uniform random sampling. We explore scenarios where target nodes exhibit either low or high degrees, as discussed in [23, Section V.D.]. Results are averaged over three runs with different random seeds, along with the corresponding standard deviation.
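As a small illustration of how these metrics are computed for a single target node (the arrays below are placeholders, not data from the paper), precision, recall, and \(F_1\) compare the attack's predicted edges against the ground-truth neighbors among the candidate nodes:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# y_true[i] = 1 if candidate node i is actually connected to the target node;
# y_pred[i] = 1 if the attack claims an edge. Both vectors are illustrative.
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In the experiments these per-node scores are aggregated over the 500 sampled target nodes and averaged over the three random seeds.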
1.3 A.3 Limitations: Depth of the GNN
We examine the impact of increasing the GNN depth on the success rate of the attack for the Twitch-FR dataset. Our findings in Fig. 4 show that the attack's success rate decreases as GNN depth increases. With more layers, the model aggregates information from a larger neighborhood, which dilutes the influence of the injected malicious node within the target node's neighborhood and diminishes the attack's effectiveness.
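This dilution can be illustrated with a toy computation (ours, not the paper's experiment): after k GCN layers, the influence of the injected node u on the target node v is governed by the \((v, u)\) entry of \(\hat{A}^k\), where \(\hat{A}\) is the normalized adjacency with self-loops, and this entry shrinks toward the graph-wide average as k grows.

```python
import numpy as np

def normalized_adjacency(adj):
    """A_hat = D^{-1/2}(A + I)D^{-1/2} for a small dense adjacency matrix."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Toy graph: node 0 is the target, node 5 is the injected node attached to it,
# and nodes 1-4 form the target's existing neighborhood.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4), (0, 5)]:
    adj[i, j] = adj[j, i] = 1.0

a_hat = normalized_adjacency(adj)
for k in (1, 2, 3, 4):
    influence = np.linalg.matrix_power(a_hat, k)[0, 5]
    print(f"depth {k}: influence of injected node on target = {influence:.3f}")
```

On this toy graph the printed influence decreases with depth, mirroring the trend observed in Fig. 4.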
Compared to LinkTeller [23], as shown in Table 4, NILS outperforms LinkTeller across various GCN depths. For the Twitch-FR dataset, NILS achieves higher precision and recall at GCN depth 3 (precision: \(85.1 \pm \scriptstyle 1.2\), recall: \(81.6 \pm \scriptstyle 1.2\)) than LinkTeller at depth 2 (precision: \(84.1 \pm \scriptstyle 3.7\), recall: \(78.2 \pm \scriptstyle 1.9\)). These results highlight the effectiveness of our node injection strategy.