Traffic Dynamics-Aware Probe Selection for Fault Detection in Networks | Journal of Network and Systems Management Skip to main content
Log in

Traffic Dynamics-Aware Probe Selection for Fault Detection in Networks

  • Published:
Journal of Network and Systems Management Aims and scope Submit manuscript

A Correction to this article was published on 26 March 2020

This article has been updated

Abstract

Fault detection in modern networks is done with a set of specially instrumented nodes which send probes to find faults. These probes generate additional traffic in network and compete with other regular traffic for bandwidth. In this paper we consider the problem of dynamically adapting the probes based on traffic dynamics experienced by nodes. We propose to profile the links and nodes to get aggregate I/O statistics in a time window and use it as an instantaneous measure of congestion. We consider the network with I/O statistics to generate a weighted graph and formulate an optimization problem to find a set of probes covering whole network with minimum weight. By showing that finding minimum weight probes maps to a known NP complete problem, we propose three greedy algorithms for selecting probes. With both simulation and real graphs of Internet Service Provider (ISP) networks, we perform five sets of experiments and show that proposed algorithms can dynamically adapt to changes in traffic dynamics and also can select probes in large networks in reasonable time.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15

Similar content being viewed by others

Change history

  • 26 March 2020

    The original version of the article unfortunately contained a typo error in the first author name in the author group.

Notes

  1. A preliminary version of this paper appeared in COMSNETS 2018 [2].

  2. Congestion can be measured with other parameters also.

  3. This inherently limit usage of our probe selection algorithms within an autonomous system.

  4. Assuming the graph is connected.

  5. We further assume that the situation where K needs to be \(n-1\) does not occur in practical networks with atleast two probe-stations.

  6. Although Dijkstra’s algorithm is used with link state routing algorithms, in practice some policy based decisions may also be involved. Our model can be tweaked to work with other routing algorithms to identify Candidate Probe Set.

  7. This is a random choice among the two equal weight probes \(CP_6= \{(5,1,4,7),399\}\) and \(CP_{12}= \{(7,4,1,5),399\}\)

  8. The threshold for our experiments is set to 50% which is a heuristic selection, however a system administrator can change this to any value.

  9. We can use other metrics like number of bytes transmitted, etc.

  10. Dikjstra’s algorithm is used here to reduce the number of candidate probes available for selection and to be consistent with other two approaches.

References

  1. Dusia, A., Sethi, A.S.: Probe generation for active probing. Int. J. Netw. Manag. 20, 1–21 (2018)

    Google Scholar 

  2. Tayal, A., Hubballi, N., Natu, M., Sadaphal, V.: Congestion-aware probe selection for fault detection in networks. In: COMSNETS’18: Proceedings of the 10th international conference on communication systems and networks, pp. 407–409 (2018)

  3. Guan, X., Qin, T., Li, W., Wang, P.: Dynamic feature analysis and measurement for large-scale network traffic monitoring. IEEE Trans. Inform. Forens. Secur. 5, 905–919 (2010)

    Article  Google Scholar 

  4. Li, K.Y., Chen, C.W., Lee, S.S.W.: Dynamic load balanced routing in IP networks. In: ICM’16: Proceedings of the 6th international conference on information communication and management, pp. 216–221 (2016)

  5. Zhang, J., Yu, F.R., Wang, S., Huang, T., Liu, Z., Liu, Y.: Load balancing in data center networks: a survey. IEEE Commun. Surv. Tutor. 20, 2324–2352 (2018)

    Article  Google Scholar 

  6. Steinder, M., Sethi, A.S.: A survey of fault localization techniques in computer networks. Sci. Comput. Program. 53, 165–194 (2004)

    Article  MathSciNet  Google Scholar 

  7. Natu, M., Sethi, A.S.: Active probing approach for fault localization in computer networks. In: E2EMON ’06: Proceedings of 4th IEEE/IFIP workshop on end-to-end monitoring techniques and services, pp. 1–9 (2006)

  8. Kavulya, S.P., Joshi, K., Giandomenico, F.D., Narasimhan, P.: Failure diagnosis of complex systems. Springer, New York (2012)

    Book  Google Scholar 

  9. Coates, A., Hero III, A.O., Nowak, R., Yu, B.: Internet Tomography. IEEE Signal Processing Magazine 19, 47–65 (2002)

    Article  Google Scholar 

  10. Lawrence, E., Michailidis, G., Nair, V.N., Xi, B.: Network tomography: a review and recent developments. In: Frontiers in statistics, pp. 345–364 (2006)

  11. Remote Network Monitoring MIB Protocol Identifier Macros. https://tools.ietf.org/html/rfc2896

  12. Cisco IOS NetFlow. https://www.cisco.com/c/en/us/products/ios-nx-os-software/ios-netflow

  13. Tivoli Business Systems Manager. http://www.tivoli.com

  14. Groenendijk, J., Huang, Y., Fallon, L.: Adaptive terminal reporting for Scalable Service Quality Monitoring in large networks. In: CNSM’11: Proceedings of the 7th international conference on network and service management, pp. 1–5 (2011)

  15. Ahmed, T., Oreshkin, B., Coates, M.: Machine learning approaches to network anomaly detection. In: USENIX ’07: Proceedings of the 2nd USENIX workshop on tackling computer systems problems with machine learning techniques, pp. 1–6 (2007)

  16. Hu, Z., Zhu, L., Ardi, C., Katz-Bassett, E., Madhyastha, H.V., Heidemann, J., Yu, M.: The need for end-to-end evaluation of cloud availability. In: Passive and active measurement, pp. 119–130 (2014)

  17. Jaggard, A.D., Kopparty, S., Ramachandran, V., Wright, R.N.: The design space of probing algorithms for network-performance measurement. In: SIGMETRICS ’13: Proceedings of the 40th international conference on measurement and modeling of computer systems, pp. 105–116 (2013)

  18. Quan, L., Heidemann, J., Pradkin, Y.: Detecting internet outages with precise active probing (extended). Tech. rep. USC/Information Sciences Institute, Sacramento (2012)

    Google Scholar 

  19. Quan, L., Heidemann, J., Pradkin, Y.: Trinocular: understanding internet reliability through adaptive probing. In: SIGCOMM ’13: Proceedings of the ACM SIGCOMM 2013 conference, pp. 255–266 (2013)

  20. Jeswani, D., Korde, N., Patil, D., Natu, M., Augustine, J.: Probe station selection algorithms for fault management in computer networks. In: COMSNETS ’10: Proceedings of 2nd international conference on communication systems and networks, pp. 1–9 (2010)

  21. Jeswani, D., Natu, M., Ghosh, R.K.: Adaptive monitoring: application of probing to adapt passive monitoring. J. Netw. Syst. Manag. 23, 950–977 (2015)

    Article  Google Scholar 

  22. Jeswani, D., Natu, M., Ghosh, R.K.: Adaptive monitoring: a framework to adapt passive monitoring using probing. In: CNSM ’12: Proceedings of the 8th international conference on network and service management, pp. 350–356 (2012)

  23. Brodie, M., Rish, I., Ma, S., Odintsova, N.: Active probing strategies for problem diagnosis in distributed systems. In: IJCAI’03: Proceedings of the 18th international joint conference on artificial intelligence, pp. 1337–1338 (2003)

  24. Guan, L., Wang, Y., Li, W., Yan, C.: Efficient probing method for active diagnosis in large scale network. In: CNSM ’13: Proceedings of the 9th international conference on network and service management, pp. 198–202 (2013)

  25. Cheng, L., Qiu, X., Meng, L., Qiao, Y., Boutaba, R.: Efficient active probing for fault diagnosis in large scale and noisy networks. In: INFOCOM’ 10: Proceedings of the 30th IEEE international conference on computer communications, pp. 1–9 (2010)

  26. Lu, L., Xu, Z., Wang, W., Sun, Y.: A new fault detection method for computer networks. Reliab. Eng. Syst. Saf. 114, 45–51 (2013)

    Article  Google Scholar 

  27. Ma, L., He, T., Leung, K.K., Towsley, D., Swami, A.: Efficient identification of additive link metrics via network tomography. In: ICDCS ’13: Proceedings of the 33rd international conference on distributed computing systems, pp. 581–590 (2013)

  28. Li, H., Gao, Y., Dong, W., Chen, C.: Taming both predictable and unpredictable link failures for network tomography. IEEE/ACM transactions on networking pp. 1–14 (2018)

  29. Duffield, N.: Network tomography of binary network performance characteristics. IEEE Trans. Inform. Theory 52, 5373–5388 (2006)

    Article  MathSciNet  Google Scholar 

  30. Sommers, J., Barford, P., Duffield, N., Ron, A.: Accurate and efficient SLA compliance monitoring. ACM SIGCOMM: Comput Commun Rev 37, 109–120 (2007)

    Article  Google Scholar 

  31. Mardani, M., Giannakis, G.B.: Estimating traffic and anomaly maps via network tomography. IEEE/ACM Trans. Netw 24, 1533–1547 (2016)

    Article  Google Scholar 

  32. Dong, W., Gao, Y., Wu, W., Bu, J., Chen, C., Li, X.Y.: Optimal monitor assignment for preferential link tomography in communication networks. IEEE/ACM Trans Netw 25, 210–223 (2017)

    Article  Google Scholar 

  33. Wang, P., Xu, H., Niu, Z., Han, D., Xiong, Y.: Expeditus: Congestion-aware load balancing in clos data center networks. In: SoCC ’16: Proceedings of the seventh ACM symposium on cloud computing, pp. 442–455. ACM (2016)

  34. Battre, D., Hovestadt, M., Lohrmann, B., Stanik, A., Warneke, D.: Detecting bottlenecks in parallel DAG-based data flow programs. In: 3rd Workshop on many-task computing on grids and supercomputers, pp. 1–10 (2010)

  35. Barroso, L.A., Clidaras, J., Hoelzle, U.: The datacenter as a computer:An introduction to the design of warehouse-scale machines, pp. 1–154. Morgan & Claypool (2013)

  36. Hammadi, A., Mhamdi, L.: A survey on architectures and energy efficiency in data center networks. Comput. Commun. 40, 1–21 (2014)

    Article  Google Scholar 

  37. Gill, P., Jain, N., Nagappan, N.: Understanding network failures in data centers: measurement, analysis, and implications. ACM SIGCOMM: Comput. Commun. Rev. 41, 350–361 (2011)

    Article  Google Scholar 

  38. Liu, Y., Lin, D., Muppala, J., Hamdi, M.: A study of fault-tolerance characteristics of data center networks. In: DSN 12: Proceedings of the 42nd annual IEEE/IFIP international conference on dependable systems and networks, pp. 1–6 (2012)

  39. Liu, Y., Muppala, J.: Fault-tolerance characteristics of data center network topologies using fault regions. In: DSN 13: Proceedings of the 43rd annual IEEE/IFIP international conference on dependable systems and networks, pp. 1–6 (2013)

  40. NS-3. https://www.nsnam.org/

  41. Rocketfuel: An ISP topology mapping engine. http://research.cs.washington.edu/networking/rocketfuel/

  42. Mahajan, R., Spring, N., Wetherall, D., Anderson, T.: Inferring link weights using end-to-end measurements. In: IMW ’02: Proceedings of the 2nd ACM SIGCOMM workshop on internet measurement, pp. 231–236 (2002)

  43. Cygan, M., Kowalik, A., Wykurz, M.: Exponential-time approximation of weighted set cover. Inform. Process. Lett. 109, 957–961 (2009)

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgements

First author would like to thank Vaishali Sadaphal for the discussions she had when she was interning at TRDDC Pune. She would also like to thank for the financial assistance given to her by Department of IT under Visvesvaraya Ph.D. Scheme.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Neminath Hubballi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: The first author name was incorrectly published as ‘Anjua Tayal’ and the corrected name is ‘Anuja Tayal’.

Probe Selection is a NP–Complete Problem

Probe Selection is a NP–Complete Problem

In this Appendix, we show that probe-selection optimization problem of selecting minimum weight probes from the set of candidate probes is a known NP-Complete problem as it can be mapped to weighted set cover problem. Weighted set cover problem is defined as follows [43]

Definition 2

Given a finite universe \(U = \{e_{1,} e_{2}, ..., e_{n}\}\) of n members, \(S = \{s_{1}, s_{2}, ..., s_{m}\} \bigwedge , \forall i \; s_{i} \subseteq U\) a collection of subsets of U and a weight function \(w : s \rightarrow {\mathbb {R}}^{+}\) that assigns a positive real weight to each subset of U, the goal is to find the minimum weight subcollection of S whose union is U or a minimum weight set cover.

We define the problem of selecting probes from given set of candidate probes as follows

Definition 3

Given set of nodes to be covered in a graph \(V = \{v_{1}, v_{2}, ..., v_{n}\}\) of n nodes and Candidate Probe Set \(CP = \{CP_{1}, CP_{2}, ..., CP_{cp}\} \bigwedge , \forall i \; CP_{i} \subseteq V\) a collection of subsets of V and a weight function \(W : CP \rightarrow {\mathbb {R}}^{+}\) that assigns a positive real weight to each subset of V, the goal is to find the minimum weight subcollection of CP whose union is V or a minimum weight probes.

The set of finite universe U in weighted set cover problem is analogous to the set of nodes V to be covered in a graph G, while set S is equivalent to set of candidate probes. Each candidate probe \(CP_{i}\) consists of set of nodes \(\subseteq V\) covered by that probe. There exists a weight function W which assigns weight to each candidate probe which is the cost of selecting that probe. Our goal is to find minimum weight probes \(PS \subseteq CP\) which covers all the nodes of the graph or the union is V.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tayal, A., Sharma, N., Hubballi, N. et al. Traffic Dynamics-Aware Probe Selection for Fault Detection in Networks. J Netw Syst Manage 28, 1055–1084 (2020). https://doi.org/10.1007/s10922-020-09514-3

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10922-020-09514-3

Keywords

Navigation