Abstract
Transferring a structure from the visual modality to the aural one is a difficult challenge. In this work we experiment with prosody modeling for the synthesized speech representation of tabulated structures. This is achieved by analyzing naturally spoken descriptions of data tables and the subsequent feedback from blind and sighted users. The derived placement and values of prosodic phrase accents and pause breaks are examined in terms of how successfully they convey semantically important visual information through prosody control in Table-to-Speech synthesis. Finally, the quality of information provision of synthesized tables rendered with the proposed prosody specification is compared against plain synthesis.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Spiliotopoulos, D., Xydas, G., Kouroupetroglou, G. (2005). Diction Based Prosody Modeling in Table-to-Speech Synthesis. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_38
DOI: https://doi.org/10.1007/11551874_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)