Description:
Modern telecommunication systems typically use a three-layer hierarchy to achieve high performance and reliability. The three layers, namely core, distribution, and access, play different roles in service fulfillment. The core layer, also referred to as the network backbone, is responsible for transferring large volumes of traffic reliably and in a timely manner. Network devices in the core layer, such as routers, are vulnerable to errors that are hard to detect and hard to recover from. For example, the cards that constitute a core router system, and the components that constitute a card, can encounter hardware failures. Moreover, connectors between cards and interconnects between components inside a card are also subject to hard faults. In addition, as the performance requirements of core-layer network devices approach Tbps levels, failures caused by subtle interactions between parallel threads or applications have become more frequent. All of these faults can incapacitate a core router, which necessitates the design and implementation of fault-tolerant mechanisms in the core layer. Proactive fault tolerance is a promising solution because it takes preventive action before a failure occurs. The state of the system is monitored in real time. When anomalies are detected, proactive repair actions such as job migration are executed to avoid errors, thereby maintaining non-stop utilization of the entire system. The effectiveness of proactive fault-tolerance solutions depends on whether abnormal behaviors of core routers can be pinpointed accurately and in a timely manner. This dissertation first presents an anomaly detector for core router systems using correlation-based time series analysis. The proposed technique monitors a set of features obtained from a system deployed in the field. Various types of correlations among extracted features are identified.
A set of features with minimum redundancy and maximum relevance are then grouped into ...
Contributors:
Chakrabarty, Krishnendu
Year of Publication:
2018
Document Type:
Dissertation ; [Doctoral and postdoctoral thesis]
Subjects:
Computer engineering ; Anomaly detection ; Changepoint detection ; Core router systems ; Health-status analysis ; Machine-learning techniques ; Time-series analysis
Content Provider:
Duke University Libraries: DukeSpace