Abstract
The effort around EXTIRP 2004 focused on the heterogeneity of XML document collections. The subcollections of the heterogeneous track (het-track) did not offer us a suitable testbed, but we successfully applied methods independent of any document type to the original INEX test collection. By closing our eyes to the element names defined in the DTD, we created comparable runs and discovered an improvement in the results. This was the anticipated evidence for our hypothesis that we do not need to know the element names when indexing the collection or when returning full-text answers to Content-Only queries. Some problematic areas were also identified. One of them is score combination, which lets us combine elements of any size into one ranked list of results, given that we have the relevance scores of the leaf-level elements. However, finding a suitable score combination method remains part of our future work.
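To make the idea of score combination concrete, the following is a minimal sketch and not the method used in EXTIRP, since the abstract leaves the choice of combination method open. It assumes a toy leaf scorer (the fraction of tokens that match a query term) and a size-weighted average as the combination rule; both are illustrative assumptions, as are the function names `score_tree` and `rank_elements`. Elements are identified by child position only, so element names never influence the ranking.

```python
import xml.etree.ElementTree as ET


def score_tree(element, query_terms, path, results):
    """Return (score, size) for `element` and record (path, score) in `results`.

    Assumptions for illustration only: a leaf is an element without child
    elements, its score is the fraction of its tokens found in `query_terms`,
    and an inner element gets the size-weighted mean of its children's scores.
    Text placed directly inside an inner element is ignored for simplicity.
    """
    children = list(element)
    if not children:
        tokens = "".join(element.itertext()).lower().split()
        size = len(tokens)
        score = sum(t in query_terms for t in tokens) / size if size else 0.0
    else:
        parts = [
            score_tree(child, query_terms, f"{path}/{i}", results)
            for i, child in enumerate(children, 1)
        ]
        size = sum(s for _, s in parts)
        score = sum(sc * s for sc, s in parts) / size if size else 0.0
    # The path is built from child positions, not from element names.
    results.append((path or "/", score))
    return score, size


def rank_elements(xml_text, query_terms):
    """Rank all elements of a document, whatever their size, in one list."""
    results = []
    score_tree(ET.fromstring(xml_text), set(query_terms), "", results)
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    doc = "<a><b>xml retrieval</b><c>other words here now</c></a>"
    for path, score in rank_elements(doc, ["xml", "retrieval"]):
        print(path, round(score, 3))
```

Renaming every element in the sample document leaves the ranking unchanged, which is the property the abstract argues for. A size-weighted mean tends to favour small, focused elements over their ancestors, so a different combination rule may well suit a real collection better.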
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lehtonen, M. (2005). EXTIRP 2004: Towards Heterogeneity. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds) Advances in XML Information Retrieval. INEX 2004. Lecture Notes in Computer Science, vol 3493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424550_30
DOI: https://doi.org/10.1007/11424550_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26166-7
Online ISBN: 978-3-540-32053-1
eBook Packages: Computer Science, Computer Science (R0)