Abstract
A spatial anomaly is a phenomenon in a region whose behavior deviates sharply from that of the other, normal observations. In practice, such an anomaly may affect other phenomena in the region across multiple domains. For example, crime is often linked to sociopolitical factors such as poverty and education, and accidents in a region may be linked to environmental factors such as weather and surface condition. Finding anomalies across multiple domains is therefore important in many applications. In this paper, we propose an approach for finding an anomalous window across multiple domains, where a window is a set of contiguous points in space; because the window is multi-domain, several windows from different domains overlap in the same space. Our approach comprises the following steps: (1) single-domain anomaly detection: discovering the anomalous window in each domain; (2) association rule mining: discovering relationships between the anomalous windows across domains; and (3) validation: validating the results using (a) Monte Carlo simulation, (b) correlation measured by lift, and (c) ground truth evaluation. As a postprocessing step, we also provide a probabilistic framework for evaluating the relationships between the spatial nodes. Finally, we provide a visualization technique for viewing the multi-domain anomalous window and the probabilistic relationships between the nodes. We report detailed experimental results and comparisons with other approaches using real-world health ranking [51] and transportation [50] datasets with known ground truth windows. The results show that our approach is effective in finding anomalies across multiple domains compared with other approaches.
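As a rough illustration of validation steps (a) and (b), the sketch below computes the lift between two domains' anomalous windows and a Monte Carlo p-value for it. It is not the paper's implementation: the function names, the boolean per-node encoding, the shuffling null model, and the toy data are all assumptions made for this example.

```python
import random


def lift(flags_a, flags_b):
    """Lift of the rule A -> B over paired boolean flags (one pair per spatial node)."""
    n = len(flags_a)
    if n == 0 or n != len(flags_b):
        raise ValueError("flag lists must be non-empty and of equal length")
    n_a = sum(1 for a in flags_a if a)
    n_b = sum(1 for b in flags_b if b)
    n_ab = sum(1 for a, b in zip(flags_a, flags_b) if a and b)
    if n_a == 0 or n_b == 0:
        return 0.0
    # P(A and B) / (P(A) * P(B)), written with counts to avoid rounding noise.
    return n_ab * n / (n_a * n_b)


def monte_carlo_p_value(flags_a, flags_b, replicas=999, seed=0):
    """Share of random relabelings of B whose lift is at least the observed lift."""
    rng = random.Random(seed)
    observed = lift(flags_a, flags_b)
    shuffled = list(flags_b)
    hits = 0
    for _ in range(replicas):
        rng.shuffle(shuffled)  # assumed null model: B is unrelated to A
        if lift(flags_a, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (replicas + 1)


# Hypothetical toy data: True means the node lies in that domain's anomalous window.
crime = [True, True, True, False, False, False, False, False, False, False]
poverty = [True, True, True, True, False, False, False, False, False, False]

print(lift(crime, poverty))  # 2.5
print(monte_carlo_p_value(crime, poverty))
```

A lift above 1 suggests the two windows co-occur more often than independence would predict, and a small p-value suggests the overlap is unlikely to arise by chance under the assumed null model.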

References
Agarwal D, McGregor A, Phillips JM, Venkatasubramanian S, Zhu Z (2006) Spatial scan statistics: approximations and performance study. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (Philadelphia, PA, USA, August 20–23, 2006), KDD ’06. ACM, New York, NY, pp 24–33. doi:10.1145/1150402.1150410
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: SIGMOD conference, pp 207–216
Barnett V, Lewis T (1994) Outliers in statistical data. Wiley, New York
Bonnie DR, Sorensen J, Guest Column (2011) Where you live matters to your health. http://www.news-journalonline.com/opinion/editorials/guest-columns/2010/07/12/where-you-live-matters-to-your-health.html. Last accessed March 2011
Breiger RL (1974) The duality of persons and groups. University of North Carolina Press, Social Forces, Chapel Hill
Chawla S, Sun P (2006) SLOM: a new measure for local spatial outliers. Knowl Inf Syst 9(4):412–429
Computer science-advanced web and network technologies, and applications. Lecture Notes in Computer Science, 2008, vol 4977/2008, pp 99–109. doi:10.1007/978-3-540-89376-9
Das K, Schneider J, Neill DB (2008) Anomaly pattern detection in categorical datasets. In: Proceedings of 14th ACM SIGKDD 2008. ACM, New York, pp 169–176
de Vries T, Chawla S, Houle ME (2011) Density-preserving projections for large-scale local anomaly detection. Knowl Inf Syst. doi:10.1007/s10115-011-0430-4. Last accessed 9 Dec
Dillard B, Shmueli G (2004) Simultaneous analysis of multiple time series using two-dimensional wavelets. Manuscript 1:1
Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, pp 44–49, USA. AAAI Press, Menlo Park
Everett MG, Borgatti SP (1998) Analyzing clique overlap. Connections 21(1):49–61
Han D, Rogerson PA, Nie J, Bonner MR, Vena JE, Vito D, Muti P, Trevisan M, Edge SB, Freudenheim JL (2004) Geographic clustering of residence in early life and subsequent risk of breast cancer (United States). Cancer Causes Control 15(9):921–929
Harel D, Koren Y (2001) Clustering spatial data using random walks. In: Proceedings of the seventh international conference on knowledge discovery and data mining, pp 281–286, ACM Press, New York
Health Statistics, Obesity (most recent) by country. http://www.nationmaster.com/graph/hea_obe-health-obesity. Last accessed March 2011
Hido S, Tsuboi Y, Kashima H, Sugiyama M, Kanamori T (2011) Statistical outlier detection using direct density ratio estimation. Knowl Inf Syst 26(2):309–336
Howe HL, Wingo PA, Thun MJ, Ries LA, Rosenberg HM, Feigal EG, Edwards BK (2001) Annual report to the nation on the status of cancer (1973 through 1998), featuring cancers with recent increasing trends. J Natl Cancer Inst 93(11):824–842
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2732272/. Cancer outlier detection based on likelihood ratio test
Hu J Cancer outlier detection based on likelihood ratio test. http://bioinformatics.oxfordjournals.org/content/24/19/2193.short
Janeja VP, Adam N, Atluri V, Vaidya JS (March 2010) Spatial neighborhood based anomaly detection in sensor datasets. In: Special issue on outlier detection data mining and knowledge discovery, vol 20(2). Springer, Berlin, pp 221–258
Janeja VP, Adam NR, Atluri V, Vaidya J (2010) Spatial neighborhood based anomaly detection in sensor datasets. Data Min Knowl Discov 2:221–258. doi:10.1007/s10618-009-0147-0
Janeja VP, Atluri V (2008) Random walks to identify anomalous free-form spatial scan windows. In: IEEE TKDE 20(10):1378–1392
Janeja VP, Atluri V, Vaidya JS, Adam N (2005) Collusion set detection through outlier discovery. In: IEEE intelligence and security informatics (IEEE ISI). Atlanta, Georgia
Gray J (ed) (2008) State of the evidence: the connection between breast cancer and the environment, 5th edn. Breast Cancer Fund
JGraph (2011) Java graph component for the visualization and layout of graphs. http://www.jgraph.com/. Last accessed 9 Dec 2011
Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann
Jung I, Kulldorff M, Klassen A (2007) A spatial scan statistic for ordinal data. Stat Med 26:1594–1607
Knorr EM, Ng RT (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253
Kulldorff M (1997) A spatial scan statistic. Commun Stat Theory Methods 26:1481–1496
Kulldorff M (1999) Spatial scan statistics: models, calculations and applications. In: Glaz J, Balakrishnan N (eds) Scan statistics and applications, statistics for industry and technology, pp 303–322
Kulldorff M, Athas W, Feuer E, Miller B, Key C (1998) Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos. Am J Public Health 88(9):1377–1380
Kulldorff M, Nagarwalla N (1995) Spatial disease clusters: detection and inference. Stat Med 14:799–810
Lu C, Chen D, Kou Y (2003) Detecting spatial outliers with multiple attributes. In: Proceedings of ICTAI’03, 15th IEEE international conference on tools with artificial intelligence, p 122
Multivariate scan statistics for disease surveillance. http://www.dbmi.pitt.edu/panda/papers/Kulldorff/k-M2005.pdf
Naus J (1965) The distribution of the size of the maximum cluster of points on the line. J Am Stat Assoc 60:532–538
Neill DB, Cooper GF, Das K, Jiang X, Schneider J (2009) Bayesian network scan statistics for multivariate pattern detection. In: Scan statistics: statistics for industry and technology, pp 221–249
Neill DB, Moore AW, Cooper GF A multivariate Bayesian scan statistic
New Jersey accident data for state routes. http://www.state.nj.us/transportation/refdata/accident/ (1999)
Newman MEJ (2008) The mathematics of networks, The New Palgrave encyclopedia of economics
NodeXL (2011) An excel 2007/2010 template for viewing network graphs. http://nodexl.codeplex.com/. Last accessed 9 Dec 2011
Park Y, Priebe CE, Marchette DJ, Youssef A (2009) Anomaly detection using scan statistics on time series hypergraphs, workshop on link analysis, SDM 2009
Patcha A, Park J-M (2007) An overview of anomaly detection techniques: existing solutions and latest technological trends. Comput Netw 51(12):3448–3470
Rivers RW (2006) Evidence in traffic crash investigation and reconstruction: identification, interpretation and analysis of evidence, and the traffic crash investigation and reconstruction process
Basu S, Meckesheimer M (2007) Automatic outlier detection for time series: an application to sensor data. Knowl Inf Syst 11(2):137–154
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30:107–117
Shi L, Janeja VP (2009) Anomalous window discovery through scan statistics for linear intersecting paths (SSLIP). In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (Paris, France, June 28–July 01, 2009), KDD ’09. ACM, New York, NY, pp 767–776. doi:10.1145/1557019.1557104
Shmueli G, Fienberg SE (2004) Current and potential statistical methods for monitoring multiple data streams for bio-surveillance. In: Wilson A, Olwell D (eds) Statistical methods in counter-terrorism
Snyder D (2001) Online intrusion detection using sequences of system calls. M.S. thesis, Department of Computer Science, Florida State University
SSLIP: code, datasets and known window reports. http://userpages.umbc.edu/~leishi1/sslip/sslip.htm (2009)
State of New Jersey, Department of Transportation, Crash records, http://www.state.nj.us/transportation/refdata/accident/. Last accessed March 2011
The County Health Rankings, a key component of the mobilizing action toward community health (MATCH) project. http://www.countyhealthrankings.org/. Last accessed March 2011
Wasserman S, Faust K (1994) Social network analysis. Cambridge University Press, Cambridge
WEKA (2011) Weka 3: data mining software in Java. http://www.cs.waikato.ac.nz/ml/weka/. Last accessed March 2011
World Road Association, PIARC Road accident investigation guidelines for road engineers. http://www.who.int/roadsafety/news/piarc_manual.pdf
Cite this article
Janeja, V.P., Palanisamy, R. Multi-domain anomaly detection in spatial datasets. Knowl Inf Syst 36, 749–788 (2013). https://doi.org/10.1007/s10115-012-0534-5
DOI: https://doi.org/10.1007/s10115-012-0534-5