Abstract
The widespread use of GPS devices and the ubiquity of remotely sensed geospatial images, along with cheap storage devices, have resulted in vast amounts of digital data. More recently, with the advent of wireless technology, a large number of sensor networks have been deployed to monitor many human, biological and natural processes. This poses a challenge in many data-rich application domains: how best to choose the datasets needed to solve a specific problem. In particular, some of the datasets may be redundant, and including them in an analysis may not only be time-consuming but also lead to erroneous conclusions. On the other hand, hastily excluding some of the datasets might skew the observations drawn. We propose the concept of data support as the basis for efficient, cost-effective and intelligent use of geospatial data in order to reduce uncertainty in the analysis and consequently in the results. Data support is defined as the process of determining the information utility of a data source to help decide which sources to include or exclude, so as to improve the cost-effectiveness of existing data analysis. In this paper we use mutual information, a concept popular in information theory as a measure of the information gain or loss between two datasets, as the basis for computing data support. The flexibility and effectiveness of the approach are demonstrated using an application in the hydrological analysis domain, specifically watersheds in the state of Nebraska.
Acknowledgments
This material is based upon work supported by the National Science Foundation under Grants No. 0219970 and 0535255 and IGERT Grant No. DGE-0903469. We would like to thank Joshi Deepti and Dr. David Marx for their suggestions and help.
Cite this article
Hong, T., Hart, K., Soh, LK. et al. Using spatial data support for reducing uncertainty in geospatial applications. Geoinformatica 18, 63–92 (2014). https://doi.org/10.1007/s10707-013-0177-z
DOI: https://doi.org/10.1007/s10707-013-0177-z