Abstract
The information source cited in a voice interaction system’s response may influence users’ affective experience. This research aims to better understand whether an information source should be provided in the responses of voice interaction systems and, if so, which type of information source offers users a better affective experience. In this study, we explored the effect of three information source types (no information source, information source from professional organizations, information source from internet users) in voice interaction system responses on users’ affective experience across five application scenarios (music query, news query, health query, travel query, and restaurant query). Three items, covering affection, acceptance and satisfaction, were used to measure users’ affective experience. All quantitative data were collected on the E-Prime experimental platform from 21 participants. The results showed that, overall, participants preferred responses with information sources from professional organizations, while there was no significant difference between responses with information sources from internet users and responses with no information source. The preferred type of information source differed across application scenarios. In the music, news, and health query scenarios, information sources from professional organizations are recommended; in the travel and restaurant query scenarios, information sources from internet users are suggested.
1 Introduction
Users’ affective experience comprises their attitudinal experience of, and cognitive processes toward, objective things, and users’ affection and cognition influence their behavior [1, 2]. Norman put forward the concept of emotional design, pointing out that an individual’s emotional state depends on whether objective things meet his or her needs [3]. According to Norman, the emotional experience provided by products can build stronger connections with users. Satisfaction can change users’ emotions, and it is also the final and most intuitive manifestation of users’ experience with a product. Users’ affective experience is also shaped by various factors, such as the usability of the product [4, 5]. Improving users’ affective experience can in turn improve their acceptance of and satisfaction with products [6]. When users’ affective experience is poor, they would rather put up with the temporary trouble of adapting to an alternative product than give up a better experience [4].
Voice interaction has gradually become a common form of human-computer interaction. As the functions of voice interaction systems mature, users pay more attention to the affective experience these systems provide [7]. People want to communicate with a voice system in their own way of speaking, expecting it to understand their intentions easily and give a satisfactory response. Thus, when interacting with a voice system, users are concerned with whether its response helps them solve the problem directly. However, users have made many negative comments on the responses of current voice interaction products, such as “Its response was not what I wanted, I doubted the reliability of its reply”, “I just listen to it and don’t take it too seriously. If the question is important, I will look for more professional responses from other sources” or “I use it more as an entertainment tool than an assistant”.
Evidently, current voice interaction systems cannot fully meet users’ needs, let alone deliver a good affective experience. In many cases, such as health inquiries, users would rather search online on their own than accept the answer provided by a voice system, even though voice systems and search engines draw on the same web resources and provide the same answers. Why? One possible reason is that voice system responses are not expressed in a way users find trustworthy. In interpersonal communication, different ways of wording generate different affective experiences for listeners [8, 9]. How, then, can users’ affective experience be improved on top of current voice interaction systems? One promising approach is to change how the voice system’s responses are expressed so that an otherwise rigid system interacts more naturally, which poses a new challenge for existing natural language processing technology. However, which means of expression actually improve users’ affective experience remains unknown. In the presentation of web information retrieval results, an effective practice is to add an information source flag to the search result page so that users know the source of each result [10,11,12]; users can then filter the system’s recommendations according to their own information source preferences and make a final choice. This practice effectively improves the user experience of web search [13]. At present, however, the responses of voice interaction systems do not provide the relevant information source.
Based on the above, this study explored how voice system responses carrying three types of information source (i.e. no information source, information source from professional organizations, information source from internet users) influence users’ affective experience in each of five commonly used scenarios (i.e. music query, news query, health query, travel query, and restaurant query). Quantitative data were collected experimentally on the E-Prime platform. Based on previous research [14,15,16], a three-item questionnaire covering affection, acceptance and satisfaction was designed to measure users’ affective experience with the different voice system responses. This research contributes to the emotional design of voice interaction system responses and offers insight into Chinese users’ experience with voice interaction systems.
2 Method
2.1 Questionnaire Design
Many scholars have tried to measure users’ affective experience through different methods, such as extracting affective variables from online reviews [17], recording physiological responses to observe affective experience [18], and collecting self-reported emotional data through measurement questionnaires [19,20,21]. For example, the PAD scale, a commonly used emotion measurement scale, uses three dimensions (pleasure, arousal and dominance) to distinguish and explain specific human emotions [21], while the AttrakDiff scale is usually used to measure users’ specific affective responses rather than their affective experience [19, 20]. At present, there is no general scale for measuring users’ affective experience with voice system responses. Therefore, this study used direct questions to ask users how they felt about each voice response.
The questionnaire in this study included three factors: affection, acceptance, and satisfaction. The affection item was used to evaluate users’ liking of the voice system responses [15, 16], while the acceptance and satisfaction items were used to measure users’ overall satisfaction with how the content of a specific voice system response was presented. The three items read: “In this scenario, I think the voice system’s response was pleasing.” “In this scenario, I think the voice system’s response was acceptable.” “In this scenario, I think the voice system’s response was satisfactory.” Each item was rated on a seven-point scale ranging from 1 (strongly disagree) to 7 (totally agree).
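Each voice response therefore receives three integer ratings, one per item, and Sect. 3 analyzes each item as a separate dependent variable. As a minimal sketch (assuming Python; the class and field names are hypothetical and not taken from the study’s E-Prime procedure), one such rating record could be stored and range-checked like this:

```python
from dataclasses import dataclass

# Hypothetical labels for the three questionnaire items.
ITEMS = ("affection", "acceptance", "satisfaction")
SCALE_MIN, SCALE_MAX = 1, 7  # 1 = strongly disagree, 7 = totally agree


@dataclass(frozen=True)
class Rating:
    """One participant's answers to the three items for one voice response."""

    affection: int
    acceptance: int
    satisfaction: int

    def __post_init__(self) -> None:
        # Reject non-integers (including bools) and values off the 7-point scale.
        for item in ITEMS:
            value = getattr(self, item)
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{item} must be an integer, got {value!r}")
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{item} must be in [1, 7], got {value}")

    def as_tuple(self) -> tuple[int, int, int]:
        """Return the ratings in questionnaire order."""
        return (self.affection, self.acceptance, self.satisfaction)


r = Rating(affection=6, acceptance=5, satisfaction=6)
print(r.as_tuple())  # (6, 5, 6)
```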
2.2 Design and Synthesis of Voice System’s Responses
In line with the purpose of this study, we wrote a preliminary script of voice interaction system responses for the experiment. Based on a comprehensive survey of the brand, sales volume and usability of smart speakers in the Chinese domestic market, Baidu’s products were selected as the basis of the script design. After three rounds of review and revision, the final experimental script was completed.
In the music query scenario, when a user inputs the query “Xiaodu Xiaodu, play the best pop vocal album of the year for me”, the voice system answers “Best pop vocal album for you” (no information source), “Best pop vocal album from the Grammys for you” (information from professional organizations), or “Best pop vocal albums from internet users’ recommendation for you” (information from internet users).
In the news query scenario, when a user inputs the query “Xiaodu Xiaodu, play the latest social news”, the voice system answers “Beijing will implement a comprehensive garbage classification policy next year” (no information source), “According to state media, Beijing will implement a comprehensive garbage classification policy next year” (information from professional organizations), or “As revealed by internet users, Beijing will implement a comprehensive garbage classification policy next year” (information from internet users).
In the health query scenario, when a user inputs the query “Xiaodu Xiaodu, how can I remove dampness”, the voice system answers “Dampness is a theoretical concept of traditional Chinese medicine, and it can be treated by taking proprietary Chinese medicines, such as Xiangsha Liujunzi pills” (no information source), “According to professional medical institutions, dampness is a theoretical concept of traditional Chinese medicine, and it can be treated by taking proprietary Chinese medicines, such as Xiangsha Liujunzi pills” (information from professional organizations), or “According to the recommendation by internet users, dampness is a theoretical concept of traditional Chinese medicine, and it can be treated by taking proprietary Chinese medicines, such as Xiangsha Liujunzi pills” (information from internet users).
In the travel query scenario, when a user inputs the query “Xiaodu Xiaodu, what famous tourist attractions are there in Yunnan”, the voice system answers “Famous tourist attractions in Yunnan include: Shangri-La, Lijiang Old Town, and Stone Forest Scenic Spot” (no information source), “According to the recommendation by professional scenic spot rating agencies, famous tourist attractions in Yunnan include: Shangri-La, Lijiang Old Town, and Stone Forest Scenic Spot” (information from professional organizations), or “According to the recommendation by internet users, famous tourist attractions in Yunnan include: Shangri-La, Lijiang Old Town, and Stone Forest Scenic Spot” (information from internet users).
In the restaurant query scenario, when a user inputs the query “Xiaodu Xiaodu, what good restaurants are there in Beijing”, the voice system answers “Delicious restaurants in Beijing include: Jubaoyuan, Dadong Roast Duck Restaurant, and Donglaishun Restaurant” (no information source), “According to recommendations by professional food review agencies, delicious restaurants in Beijing include: Jubaoyuan, Dadong Roast Duck Restaurant, and Donglaishun Restaurant” (information from professional organizations), or “According to recommendations by internet users, delicious restaurants in Beijing include: Jubaoyuan, Dadong Roast Duck Restaurant, and Donglaishun Restaurant” (information from internet users).
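In the news, health, travel and restaurant scripts, the three variants share the same answer and differ only in an attribution phrase placed before it, while the music script embeds the source mid-sentence. The sketch below assumes that prefix pattern, uses hypothetical template wording modelled on the travel script, and enumerates the 5 × 3 design; it illustrates the structure of the stimuli rather than reproducing the authors’ script:

```python
from itertools import product

SCENARIOS = ("music", "news", "health", "travel", "restaurant")

# Hypothetical attribution templates; the study's exact wording varied by scenario.
SOURCE_TEMPLATES = {
    "none": "{base}",
    "professional": "According to the recommendation by {org}, {base}",
    "internet_users": "According to the recommendation by internet users, {base}",
}


def build_variants(base: str, org: str) -> dict[str, str]:
    """Return the three response texts for one scenario.

    `base` is the shared answer written with a lowercase first letter so it can
    follow an attribution prefix; the first character of every variant is
    upper-cased, which capitalizes the bare answer in the "none" condition.
    """
    variants = {}
    for condition, template in SOURCE_TEMPLATES.items():
        text = template.format(base=base, org=org)
        variants[condition] = text[0].upper() + text[1:]
    return variants


# One experimental material per (scenario, information source) cell: 5 x 3 = 15.
MATERIALS = list(product(SCENARIOS, SOURCE_TEMPLATES))

travel = build_variants(
    base="famous tourist attractions in Yunnan include: Shangri-La, "
    "Lijiang Old Town, and Stone Forest Scenic Spot",
    org="professional scenic spot rating agencies",
)
for condition, text in travel.items():
    print(condition, "->", text)
```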
Based on the experimental script, the audio materials were synthesized with the Text to Speech service of the Baidu AI Open Platform. A standard male voice was used for the users’ questions and a standard female voice for the voice interaction system’s responses; each question audio was then spliced with its corresponding response audio into one complete conversation using audio editing software. Each complete conversation served as one experimental material, giving 15 experimental materials in total (5 scenarios × 3 information source types).
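In the study the splicing was done with audio editing software. As a hedged sketch of the same operation, the snippet below uses Python’s standard `wave` module and assumes both clips are uncompressed PCM WAV files with identical channel count, sample width and sample rate; the function name `splice_wav` is hypothetical:

```python
import wave
from typing import BinaryIO, Union

Source = Union[str, BinaryIO]  # a file path or a binary file-like object


def splice_wav(question: Source, response: Source, out: Source) -> int:
    """Concatenate a question clip and a response clip into one conversation.

    Assumes both inputs are PCM WAV files with the same number of channels,
    sample width and frame rate. Returns the number of frames written.
    """
    with wave.open(question, "rb") as q, wave.open(response, "rb") as r:
        q_params, r_params = q.getparams(), r.getparams()
        # (nchannels, sampwidth, framerate) must match for raw concatenation.
        if q_params[:3] != r_params[:3]:
            raise ValueError("question and response audio formats differ")
        frames = q.readframes(q.getnframes()) + r.readframes(r.getnframes())

    with wave.open(out, "wb") as w:
        w.setnchannels(q_params.nchannels)
        w.setsampwidth(q_params.sampwidth)
        w.setframerate(q_params.framerate)
        w.writeframes(frames)  # the header's frame count is patched on close
    return len(frames) // (q_params.nchannels * q_params.sampwidth)
```

A production pipeline would likely also insert a short pause between the question and the response; that step is left out here.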
2.3 Building of Experimental Procedures
E-Prime is a software platform on which psychology researchers design and analyze their own experiments. It has been expected to become a standard tool for building psychology experiments, and it can serve as the delivery engine for a Web-based resource offering a wide variety of instructional materials [22]. Using E-Prime, we built a 7-point scoring procedure for data collection. In this procedure, participants first listened to a conversation (an experimental audio material) and then rated the voice system’s response on each item of the affective experience questionnaire. Before the procedure started, participants were reminded to ignore their preferences regarding physical parameters of the voice, such as tone and timbre, because these parameters were identical across all experimental audio materials, and to focus instead on the content of the voice interaction system’s response.
The experimental procedure consisted of two parts: a practice session, which could be repeated, and the formal experiment. Every participant completed the practice session first and began the formal experiment once familiar with the procedure.
2.4 Design of Personal Information Questionnaire
The Personal Information Questionnaire was designed to collect the participants’ personal information, including their gender, age, educational background, employment status, marital status and usage of intelligent voice interaction devices.
2.5 Data Collection
This study was approved by the ethics review committee of Beihang University in China. Before the experiment, each participant signed a consent form and was informed of the purpose of the experiment and the categories of data to be collected. Participants were told that the data they generated would be used for experimental research, that they had the right to withdraw, and that they were obliged to keep the experiment confidential. The researchers promised to keep each participant’s private information strictly confidential, and a cash reward of RMB 60 was paid to each participant at the end of the experiment. A total of 21 participants were recruited. All participants had experience using smart speakers or smart voice assistants on mobile phones and were interested in voice interaction systems.
In the experimental procedure, the five application scenarios were presented in random order, and the three audio materials with different information source types within each scenario were also presented in random order. The test data were collected on the E-Prime experimental platform.
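Read literally, this describes a blocked design: the scenario order is shuffled, and the three source conditions are shuffled within each scenario block. A minimal Python sketch of that ordering, under that reading (E-Prime’s own randomization settings are not modelled here, and the labels are hypothetical):

```python
import random

SCENARIOS = ("music", "news", "health", "travel", "restaurant")
CONDITIONS = ("none", "professional", "internet_users")


def trial_order(rng: random.Random) -> list[tuple[str, str]]:
    """Shuffle scenario order, then shuffle the three conditions within each scenario.

    Returns 15 (scenario, condition) pairs; the three trials of a scenario
    are always consecutive.
    """
    scenarios = list(SCENARIOS)
    rng.shuffle(scenarios)
    order = []
    for scenario in scenarios:
        conditions = list(CONDITIONS)
        rng.shuffle(conditions)
        order.extend((scenario, condition) for condition in conditions)
    return order


# A per-participant seed makes each participant's order reproducible.
order = trial_order(random.Random(7))
```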
3 Results
3.1 Demographic Statistics
The experimental data of all 21 participants were valid. Of these participants, 10 were male and 11 were female. By age, 38.1% were aged 18–25 years, 57.1% were aged 26–30 years, and 4.8% were aged 31–40 years. By educational level, 14.3% of the participants had a bachelor’s degree or below, 42.9% had a master’s degree or were postgraduate students, and 42.9% had a doctoral degree or were PhD students. By employment status, 47.6% were students and 52.4% were employed.
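The reported percentages correspond to whole-person counts out of 21, which makes them easy to check. The sketch below back-calculates those counts (an inference from the percentages, not data reported by the authors) and recomputes the shares; note that 3 of 21 rounds to 14.3%:

```python
N_PARTICIPANTS = 21

# Counts inferred from the reported percentages (assumption: every
# percentage is a share of all 21 participants).
AGE = {"18-25": 8, "26-30": 12, "31-40": 1}
EDUCATION = {"bachelor or below": 3, "master's / postgraduate": 9, "doctoral / PhD student": 9}
EMPLOYMENT = {"student": 10, "employed": 11}


def percentages(counts: dict[str, int], n: int = N_PARTICIPANTS) -> dict[str, float]:
    """Convert group counts to percentages rounded to one decimal place."""
    if sum(counts.values()) != n:
        raise ValueError("group counts must add up to the sample size")
    return {group: round(100 * c / n, 1) for group, c in counts.items()}


for name, counts in (("age", AGE), ("education", EDUCATION), ("employment", EMPLOYMENT)):
    print(name, percentages(counts))
```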
3.2 Reliability and Validity Test of the Affective Experience Questionnaire
The reliability and validity of the affective experience questionnaire were tested with the data collected from the scoring procedure. In total, 315 sample points were collected (21 participants × 15 materials), and SPSS 22.0 was used for reliability analysis and exploratory factor analysis. The Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett’s test of sphericity both indicated that the scale was appropriate for factor analysis (KMO = 0.743, χ² = 866.33, df = 3, p < 0.001). Table 1 presents the validity test results of the scale. Only one principal component was extracted, which explained 88.97% of the variance, and the factor loadings of all three items were greater than 0.80, indicating good validity. The reliability analysis showed that the questionnaire was reliable (Cronbach’s α = 0.94). Therefore, the questionnaire met the requirements of reliability and validity.
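For readers who want to reproduce these statistics outside SPSS, the following sketch implements the textbook formulas for Cronbach’s α, Bartlett’s test of sphericity (χ² = −(n − 1 − (2p + 5)/6)·ln|R| with p(p − 1)/2 degrees of freedom, which gives df = 3 for three items), the overall KMO measure, and the share of variance explained by the first principal component. It assumes NumPy and an (observations × items) score matrix; SPSS may differ in details such as missing-value handling, and the function names are our own:

```python
import numpy as np


def cronbach_alpha(x: np.ndarray) -> float:
    """Cronbach's alpha for an (n_observations, k_items) score matrix."""
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_var / total_var))


def bartlett_sphericity(x: np.ndarray) -> tuple[float, int]:
    """Chi-square statistic and degrees of freedom of Bartlett's test."""
    n, p = x.shape
    r = np.corrcoef(x, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    return float(chi2), p * (p - 1) // 2


def kmo(x: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(x, rowvar=False)
    inv = np.linalg.inv(r)
    # Partial correlations from the inverse correlation matrix.
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off_diag = ~np.eye(r.shape[0], dtype=bool)
    r2 = (r[off_diag] ** 2).sum()
    a2 = (partial[off_diag] ** 2).sum()
    return float(r2 / (r2 + a2))


def first_component_share(x: np.ndarray) -> float:
    """Share of total variance explained by the first principal component."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))  # ascending order
    return float(eigvals[-1] / eigvals.sum())
```

If SciPy is available, the p-value of Bartlett’s test can be obtained with `scipy.stats.chi2.sf(chi2, df)`.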
3.3 Data Analysis of 7-Point Scoring Procedure
We used one-way analysis of variance (ANOVA) to test differences in the average scores of participants’ affective experience across the three information source types provided in voice system responses (i.e. no information source, information sources from professional organizations, information sources from internet users). Response type was the factor, and each of the three items of the affective experience questionnaire was a dependent variable.
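A one-way ANOVA compares between-group and within-group variance. The pure-Python sketch below computes the F statistic and its degrees of freedom for any number of groups; it is a generic illustration, not the authors’ SPSS analysis. If each item’s 315 pooled ratings were split evenly across the three response types (105 per group), the test would have (2, 312) degrees of freedom:

```python
from statistics import fmean


def one_way_anova(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when within-group variance is zero")

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

`scipy.stats.f_oneway` returns the same F statistic together with its p-value, which is the more practical choice when SciPy is installed.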
The results pooled across all five experimental scenarios are shown in Fig. 1. For affection, the average score of responses with information sources from professional organizations differed significantly from that of responses with information sources from internet users, and participants preferred the former. For acceptance and satisfaction, responses with information sources from professional organizations differed significantly both from responses with no information source and from responses with information sources from internet users, and their average scores were the highest. Therefore, on the whole, participants preferred responses with information sources from professional organizations the most, followed by responses with no information source and then responses with information sources from internet users, although the difference between the last two was not significant.
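The paper reports pairwise multiple comparisons without naming the correction used. As one common option, assumed here purely for illustration, a Bonferroni adjustment over the three pairwise comparisons among response types can be sketched as follows; the p-values in the example are made up and are not the study’s results:

```python
from itertools import combinations

CONDITIONS = ("none", "professional", "internet_users")
PAIRS = list(combinations(CONDITIONS, 2))  # the three pairwise comparisons


def bonferroni(raw_p: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Multiply each raw p-value by the number of comparisons, capping at 1."""
    m = len(raw_p)
    return {pair: min(1.0, p * m) for pair, p in raw_p.items()}


# Made-up raw p-values purely for illustration; not the study's results.
example = bonferroni(dict(zip(PAIRS, (0.004, 0.3, 0.0002))))
for pair, p in example.items():
    print(pair, round(p, 4))
```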
Next, with the same factor and dependent variables, one-way ANOVAs were conducted separately for each of the five application scenarios (music query, news query, health query, travel query, and restaurant query) to test differences among the average scores of the three response types.
In the music query scenario (see Fig. 2), there were no significant differences among the average scores of the three response types. In terms of scoring trends, participants preferred responses with information sources from professional organizations the most, followed by responses with no information source and then responses with information sources from internet users.
In the news query scenario (see Fig. 3), for affection, the average score of responses with information sources from professional organizations differed significantly from that of responses with information sources from internet users, and participants preferred the former. For acceptance and satisfaction, multiple comparisons showed that responses with information sources from professional organizations scored significantly higher than both responses with no information source (p < 0.01) and responses with information sources from internet users (p < 0.001), so participants again preferred responses with information sources from professional organizations the most. In terms of scoring trends, participants preferred responses with no information source over responses with information sources from internet users.
In the health query scenario (see Fig. 4), for all three variables (affection, acceptance and satisfaction), multiple comparisons between responses with information sources from professional organizations and responses with information sources from internet users were significant, and participants preferred the former. For acceptance and satisfaction, the differences between the average scores of responses with no information source and responses with information sources from internet users were also significant, and participants preferred responses with no information source. This indicates that participants liked responses with information sources from internet users the least. There were no significant differences between the average scores of responses with no information source and responses with information sources from professional organizations.
In the travel query scenario (see Fig. 5), there were no significant differences among the average scores of the three response types. In terms of scoring trends, participants preferred responses with information sources from internet users the most, followed by responses with information sources from professional organizations and then responses with no information source.
In the restaurant query scenario (see Fig. 6), for affection, there were no significant differences among the average scores of the three response types. For acceptance, the difference between the average scores of responses with no information source and responses with information sources from internet users was significant, and participants preferred responses with information sources from internet users. For satisfaction, the average score of responses with no information source differed significantly both from that of responses with information sources from professional organizations and from that of responses with information sources from internet users. Participants liked responses with no information source the least.
In conclusion, in the music, news and health query scenarios, participants preferred responses with information sources from professional organizations the most, followed by responses with no information source and then responses with information sources from internet users. In the travel and restaurant query scenarios, participants preferred responses with information sources from internet users the most, followed by responses with information sources from professional organizations and then responses with no information source. In the music and travel query scenarios, where no difference was significant, these orderings reflect scoring trends only.
4 Discussion
This study examined participants’ affective experience scores for voice system responses that provided different types of information source in five application scenarios. The three items of the affective experience questionnaire were used to explore participants’ affective experience with the different response types.
In human-computer interaction, users’ perception of credibility can enhance their experience with a product [23,24,25], and the same holds for voice systems. The experimental results show that users preferred responses with different information source types in the five experimental scenarios. In relaxing and entertaining scenarios, such as travel query and restaurant query, users preferred responses with information sources from internet users. In scenarios with specialized requirements, such as news query and health query, users preferred responses with information sources from professional organizations. Music query is usually an entertainment scenario, but the user’s request in this study was framed in professional terms (“Xiaodu Xiaodu, play the best pop vocal album of the year for me”), so in this case participants preferred responses with information sources from professional organizations. Further, if the voice system cannot attach a satisfactory information source to a response, it is better to provide the response with no information source at all; otherwise the effort may backfire.
These findings are consistent with the results of qualitative interviews conducted at an earlier stage. In those interviews, we found that participants wanted to know the information sources of voice system responses, but that not every response in every scenario needed to provide one. For example, P6 commented, “I hope answers to professional questions can be supported by more professional and accurate evidence. Providing the information source is one form of support.” But how should information sources be provided in voice system responses? P3 told us, “If it is a restaurant query scenario, I prefer responses with information sources from internet users rather than from professional organizations. If it is a travel query scenario, I prefer recommendations from net friends. If it is a medical query, I am more inclined toward responses with information sources from professional organizations.” P10 said, “I have professional requirements for answers to medical questions; a voice system response attributed to a particular doctor is more convincing than internet users’ recommendations.”
5 Conclusions and Prospects
In this paper, we found that in application scenarios with professional requirements, users prefer voice system responses with information sources from professional organizations, whereas in relaxing and entertaining application scenarios, users prefer voice system responses with information sources from internet users. Responses that provide different types of information source thus affect users’ affective experience differently across application scenarios. Our work sheds light on the role of the information source in users’ affective experience with voice interaction system responses in five experimental scenarios, and it provides implications for the emotional design of future voice system responses to improve the experience of voice interaction systems.
This study explored how information sources affect users’ affective experience of voice system responses in only five experimental scenarios. Further research could verify the results in real application scenarios and explore how voice system responses with other information source types affect users’ experience in other application scenarios.
References
Ji, Z., Li, B., Zhu, J., Chen, W.: Mechanism of social media users’ fatigue behavior from the dual-perspective of emotional experience and perceived control. Inf. Stud.: Theory Appl. 42(4), 129–135 (2019)
Kostov, V., Fukuda, S.: Emotion in user interface, voice interaction system. In: 2000 IEEE International Conference on Systems, Man and Cybernetics, pp. 798–803 (2000)
Norman, D.A.: Emotional Design: Why We Love (or Hate) Everyday Things. Basic Books, New York (2004)
Desmet, P.M.A., Hekkert, P.: Framework of product experience. Int. J. Design 1(1), 13–23 (2007)
Li, X., Xiao, Z., Cao, B.: Effects of usability problems on user emotions in human–computer interaction. In: Long, S., Dhillon, B.S. (eds.) MMESE 2017. LNEE, vol. 456, pp. 543–552. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-6232-2_63
Prastawa, H., Ciptomulyono, U., Laksono-Singgih, M., Hartono, M.: The effect of cognitive and affective aspects on usability. Theor. Issues Ergon. Sci. 20(4), 507–531 (2019)
Yang, X., Aurisicchio, M., Baxter, W.: Understanding affective experiences with conversational agents. In: CHI Conference on Human Factors in Computing Systems, pp. 1–12 (2019)
Newberg, A., Waldman, M.R.: Words Can Change Your Brain: 12 Conversation Strategies to Build Trust, Resolve Conflict, and Increase Intimacy. Hudson Street Press, The Penguin Group, New York (2012)
Scherer, K.R., Fontaine, J.R.J.: The semantic structure of emotion words across languages is consistent with componential appraisal models of emotion. Cogn. Emot. 33(4), 673–682 (2019)
Aladhadh, S., Zhang, X., Sanderson, M.: Location impact on source and linguistic features for information credibility of social media. Online Inf. Rev. 43(1), 89–112 (2019)
Ushigome, R., et al.: Establishing trusted and timely information source using social media services. In: 16th IEEE Annual Consumer Communications and Networking Conference (2019)
Westerman, D., Spence, P.R., Van Der Heide, B.: Social media as information source: recency of updates and credibility of information. J. Comput.-Mediat. Commun. 19(2), 171–183 (2014)
Hussain, S., Ahmed, W., Jafar, R.M.S., Rabnawaz, A., Jianzhou, Y.: eWOM source credibility, perceived risk and food product customer’s information adoption. Comput. Hum. Behav. 66, 96–102 (2017)
Adiga, N., Prasanna, S.R.M.: Acoustic features modelling for statistical parametric speech synthesis: a review. IETE Tech. Rev. 36(2), 130–149 (2019)
Lee, E.J., Nass, C., Brave, S.: Can computer-generated speech have gender? An experimental test of gender stereotype. In: CHI 2000 Extended Abstracts on Human Factors in Computing Systems, pp. 289–290 (2000)
Street, R.L.: Evaluation of noncontent speech accommodation. Lang. Commun. 2(1), 13–31 (1982)
Kim, W., Ko, T., Rhiu, I., Yun, M.H.: Mining affective experience for a kansei design study on a recliner. Appl. Ergon. 74, 145–153 (2019)
Chen, S., Epps, J.: Automatic classification of eye activity for cognitive load measurement with emotion interference. Comput. Methods Programs Biomed. 110(2), 111–124 (2013)
Baumgartner, J., Sonderegger, A., Sauer, J.: No need to read: developing a pictorial single-item scale for measuring perceived usability. Int. J. Hum. Comput. Stud. 122, 78–89 (2019)
Hart, J., Sutcliffe, A.: Is it all about the apps or the device? User experience and technology acceptance among iPad users. Int. J. Hum. Comput. Stud. 130, 93–112 (2019)
Mao, Y., Fan, Z., Zhao, J., Zhang, Q., He, W.: An emotional contagion based simulation for emergency evacuation peer behavior decision. Simul. Model. Pract. Theory 96, 101936 (2019)
MacWhinney, B., St. James, J., Schunn, C., Li, P., Schneider, W.: STEP—a system for teaching experimental psychology using E-Prime. Behav. Res. Methods Instrum. Comput. 33, 287 (2001)
Kang, J.-W., Namkung, Y.: The information quality and source credibility matter in customers’ evaluation toward food O2O commerce. Int. J. Hospit. Manage. 78, 189–198 (2019)
Nayak, S., et al.: Integrating user behavior with engineering design of point-of-care diagnostic devices: theoretical framework and empirical findings. Lab Chip 19(13), 2241–2255 (2019)
Shin, D.-H., Lee, S., Hwang, Y.: How do credibility and utility play in the user experience of health informatics services? Comput. Hum. Behav. 67, 292–302 (2017)
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhang, Y., Zhou, R., Sun, Y., Zou, L., Wang, H., Zhao, M. (2020). Whether Information Source Should Be Provided in the Response of Voice Interaction System? In: Harris, D., Li, WC. (eds) Engineering Psychology and Cognitive Ergonomics. Mental Workload, Human Physiology, and Human Energy. HCII 2020. Lecture Notes in Computer Science, vol 12186. Springer, Cham. https://doi.org/10.1007/978-3-030-49044-7_21
DOI: https://doi.org/10.1007/978-3-030-49044-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49043-0
Online ISBN: 978-3-030-49044-7
eBook Packages: Computer Science, Computer Science (R0)