Node Injection Link Stealing Attack | SpringerLink
Skip to main content

Node Injection Link Stealing Attack

  • Conference paper
  • First Online:
Privacy in Statistical Databases (PSD 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14915))

Included in the following conference series:

  • 318 Accesses

Abstract

We present a stealthy privacy attack that exposes links in Graph Neural Networks (GNNs). Focusing on dynamic GNNs, we propose to inject new nodes and attach them to a particular target node to infer its private edge information. Our approach significantly enhances the \(F_1\) score of the attack compared to the current state-of-the-art benchmarks. Specifically, for the Twitch dataset, our method improves the \(F_1\) score by 23.75%, and for the Flickr dataset, remarkably, it is more than three times better than the state-of-the-art. We also propose and evaluate defense strategies based on differentially private (DP) mechanisms relying on a newly defined DP notion. These solutions, on average, reduce the effectiveness of the attack by 71.9% while only incurring a minimal utility loss of about 3.2%.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 8465
Price includes VAT (Japan)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 10581
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    The post-processing property allows arbitrary data-independent transformations to DP outputs without affecting their privacy guarantee [27].

References

  1. Atwood, J., Towsley, D.: Diffusion-convolutional neural networks. In: 30th International Conference on Neural Information Processing Systems (NIPS) (2016)

    Google Scholar 

  2. Brunet, S., Canard, S., Gambs, S., Olivier, B.: Novel differentially private mechanisms for graphs. IACR Cryptology ePrint Archive, p. 745 (2016)

    Google Scholar 

  3. Cadwalladr, C., Graham-Harrison, E.: Revealed: 50 million Facebook profiles harvested for Cambridge analytica in major data breach. The Guardian (2018)

    Google Scholar 

  4. Duddu, V., Boutet, A., Shejwalkar, V.: Quantifying privacy leakage in graph embedding. In: 17th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous) (2021)

    Google Scholar 

  5. Dwork, C.: Differential privacy. In: 33rd International Colloquium on Automata, Languages and Programming, ICALP (2006)

    Google Scholar 

  6. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography (2006)

    Google Scholar 

  7. Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)

    MathSciNet  Google Scholar 

  8. Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: IEEE International Conference on Data Mining (2009)

    Google Scholar 

  9. He, X., Jia, J., Backes, M., Gong, N.Z., Zhang, Y.: Stealing links from graph neural networks. In: USENIX Security Symposium (2021)

    Google Scholar 

  10. Karwa, V., Slavkovic, A.B.: Differentially private graphical degree sequences and synthetic graphs. In: Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD (2012)

    Google Scholar 

  11. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations (ICLR) (2017)

    Google Scholar 

  12. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations (ICLR) (2017)

    Google Scholar 

  13. Kossinets, G., Watts, D.J.: Empirical analysis of an evolving social network. Science 311(5757), 88–90 (2006)

    Article  MathSciNet  Google Scholar 

  14. Li, K., et al.: Towards practical edge inference attacks against graph neural networks. In: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 (2023)

    Google Scholar 

  15. Mueller, T.T., Usynin, D., Paetzold, J.C., Rueckert, D., Kaissis, G.: SoK: differential privacy on graph-structured data. CoRR abs/2203.09205 (2022)

    Google Scholar 

  16. Rozemberczki, B., Sarkar, R.: Twitch gamers: a dataset for evaluating proximity preserving and structural role-based node embeddings. arXiv preprint arXiv:2101.03091 (2021)

  17. Sajadmanesh, S., Shamsabadi, A.S., Bellet, A., Gatica-Perez, D.: GAP: differentially private graph neural networks with aggregation perturbation. CoRR abs/2203.00949 (2022)

    Google Scholar 

  18. Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61–80 (2009)

    Article  Google Scholar 

  19. Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., Eliassi-Rad, T.: Collective classification in network data. AI Mag. 29(3), 93 (2008)

    Google Scholar 

  20. Sun, L., et al.: Adversarial attack and defense on graph data: a survey. arXiv preprint arXiv:1812.10528 (2018)

  21. Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: 6th International Conference on Learning Representations (ICLR) (2018)

    Google Scholar 

  22. Wu, B., Yang, X., Pan, S., Yuan, X.: Adapting membership inference attacks to GNN for graph classification: approaches and implications. In: IEEE International Conference on Data Mining (ICDM) (2021)

    Google Scholar 

  23. Wu, F., Long, Y., Zhang, C., Li, B.: Linkteller: recovering private edges from graph neural networks via influence analysis. In: 2022 IEEE Symposium on Security and Privacy (SP) (2022)

    Google Scholar 

  24. Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? In: 7th International Conference on Learning Representations (ICLR) (2019)

    Google Scholar 

  25. Zeng, H., Zhou, H., Srivastava, A., Kannan, R., Prasanna, V.K.: Graphsaint: graph sampling based inductive learning method. In: 8th International Conference on Learning Representations (ICLR) (2020)

    Google Scholar 

  26. Zhang, M., Chen, Y.: Link prediction based on graph neural networks. In: Advances in Neural Information Processing Systems, vol. 31 (2018)

    Google Scholar 

  27. Zhu, K., Fioretto, F., Van Hentenryck, P.: Post-processing of differentially private data: a fairness perspective. In: 31st International Joint Conference on Artificial Intelligence, IJCAI (2022)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Oualid Zari .

Editor information

Editors and Affiliations

A Appendix

A Appendix

1.1 A.1 Differential Privacy

The original definition of Differential Privacy (DP) [5, 7] was introduced in the context of microdata, i.e., databases containing individual records. A central aspect of DP is the concept of neighborhood, originally defined for that data structure.

Definition 3

(Neighboring Databases). Let \(\mathcal {D}\) be the class of possible databases. Any two databases \(D, D' \in \mathcal {D}\) that differ in one record are called neighbors. For two neighboring databases, \(d(D, D') = 1\), where d denotes the Hamming distance.

Definition 4

(\((\varepsilon , \delta )\)-Differential Privacy [5, 7]). A randomized mechanism \(\mathcal {M}\) satisfies \((\varepsilon ,\delta )\)-DP with \(\varepsilon , \delta \ge 0\) if, for all pairs of neighboring databases \(D, D' \in \mathcal {D}\) and for all measurable \(\mathcal {O} \subseteq \text {Range}(\mathcal {M})\),

$$ \mathbb {P}\{\mathcal {M}(D) \in \mathcal {O}\} \le e^{\varepsilon } \mathbb {P}\{\mathcal {M}(D') \in \mathcal {O}\} + \delta . $$

In simple terms, the output of a DP mechanism should not reveal the presence or absence of any specific record in the database, up to an exponential factor \(\varepsilon \) and additional \(\delta \). When each record corresponds to an individual, DP ensures their information remains confidential. A lower \(\varepsilon \), known as the privacy budget, provides stronger protection.

The most popular DP mechanism is the Laplace mechanism, which relies on global sensitivity, defined as follows:

Definition 5

(Global Sensitivity [7]). The \(L_p\)-global sensitivity of a query function \(f: \mathcal {D} \rightarrow \mathbb {R}^d\) is defined as

$$ \Delta _p(f) = \max _{\begin{array}{c} D, D' \in \mathcal {D} \end{array}} \Vert f(D) - f(D') \Vert _p, $$

where \(D, D'\) are any two neighboring databases.

Definition 6

(Laplace Mechanism [7]). Given any function \(f: \mathcal {D} \rightarrow \mathbb {R}^d\), the Laplace mechanism is defined as:

$$ \mathcal {M}_L(D, f(\cdot ), \varepsilon ) = f(D) + (Y_1, \ldots , Y_d), $$

where \(Y_i\) are i.i.d. random variables drawn from a Laplace distribution with zero mean and scale \(\Delta _1(f)/\varepsilon \).

1.2 A.2 Experimental Setup

Datasets. We evaluated our attack on several real-world datasets used in related research. We include the Flickr dataset [25], where nodes represent images and edges connect nodes with shared properties. Node features contain word representations. We also use two Twitch datasets (TWITCH-FR and TWITCH-RU) [16] to evaluate NILS and Twitch-ES for training GNNs in an inductive setting, as done in [23]. Twitch datasets map follow connections between users and aim to classify if a streamer uses explicit language, using features like preferred games and location. For the transductive setting, where training and testing occur on the same graph, we use three citation network datasets: Cora, Citeseer, and Pubmed [19]. These involve predicting the topic of publications based on textual features and citation relationships.

Models. We follow the approach in [23] for training models and selecting hyperparameters. The authors trained Graph Convolutional Networks (GCNs) using various configurations, including normalization techniques, the number of hidden layers, input and output units, and dropout rates. A grid search strategy identified optimal hyperparameters, evaluating performance on a validation set. By using the same training procedures and hyperparameter tuning strategies, we ensure consistency across studies.

Evaluation Metrics. We employ precision, recall, and the \(F_1\) score as our primary evaluation metrics, following [23]. These metrics are suitable for addressing the imbalanced binary classification problem, where the minority class (connected nodes) is of central interest. We select target nodes \(V_{\mathcal {A}}\) such that \(|V_{\mathcal {A}}|=500\), using uniform random sampling. We explore scenarios where target nodes exhibit either low or high degrees, as discussed in [23, Section V.D.]. Results are averaged over three runs with different random seeds, along with the corresponding standard deviation.

1.3 A.3 Limitations: Depth of the GNN

We examine the impact of increasing the GNN depth on the success rate of the attack for the Twitch-FR dataset. Our findings in Fig. 4 show that as GNN depth increases, the attack’s success rate decreases. This reduction is due to the dilution of the injected node’s influence within the target node’s neighborhood. As GNN depth increases, the model aggregates information from a larger neighborhood, diluting the influence of the injected malicious node and diminishing the attack’s effectiveness.

Compared to LinkTeller [23], as shown in Table 4, NILS outperforms LinkTeller across various GCN depths. For the Twitch-FR dataset, NILS demonstrates higher precision and recall values at GCN depth 3 (precision: \(85.1 \pm \scriptstyle 1.2\), recall: \(81.6 \pm \scriptstyle 1.2\)) compared to LinkTeller at depth 2 (precision: \(84.1 \pm \scriptstyle 3.7\), recall: \(78.2 \pm \scriptstyle 1.9\)). These results highlight the effectiveness of our node injection strategy, consistently outperforming LinkTeller across different GCN depths.

Fig. 4.
figure 4

Success rates of the attack for different depths and malicious features generation strategies for Twitch-FR dataset

Table 4. Success rates of the attack for different depths in comparison with LinkTeller [23]. We use the all-ones strategy and Twitch-FR dataset.

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zari, O., Parra-Arnau, J., Ünsal, A., Önen, M. (2024). Node Injection Link Stealing Attack. In: Domingo-Ferrer, J., Önen, M. (eds) Privacy in Statistical Databases. PSD 2024. Lecture Notes in Computer Science, vol 14915. Springer, Cham. https://doi.org/10.1007/978-3-031-69651-0_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-69651-0_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69650-3

  • Online ISBN: 978-3-031-69651-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics