1 Introduction

The world’s population is aging. The global population aged 60 or over is predicted to reach around 2 billion by 2050 [1]. In China, the number of elderly citizens over the age of 65 is estimated to reach 500 million by 2050 (United Nations. Department of Economic and Social Affairs, 2017). Faced with the rapid expansion of the aged population, protecting the physical and psychological well-being of elders has become a global issue [2]. One possible approach is to introduce new digital technologies that enrich and facilitate elders’ lives. More and more elderly people have begun to adopt new digital technologies in their daily lives. The number of active elderly users of WeChat (the most popular social media platform in China) exceeded 61 million last year [3], and there were over 41 million elderly game players, according to the official report [4]. Prior studies have found empirical evidence that elderly users’ adoption of digital technologies can increase their sense of independence and improve their psychological well-being [5,6,7].

However, these benefits cannot be enjoyed by visually impaired elders (denoted as VIEs hereafter) because visual input is a prerequisite for using new digital technologies such as the Internet and smartphones. According to a 2019 report issued by the World Health Organization, the majority of the 1.3 billion visually impaired people around the world are elders over 60 [8]. VIEs bear the double burden of visual loss and age-related cognitive loss [9], which cuts off their connections to the digital world.

Thanks to innovation in artificial intelligence (AI) voice technology, VIEs now have a way to interact with digital products through speaking and hearing instead of sight-demanding writing and reading. Various applications based on AI voice technology have emerged, such as voice assistants in smartphones (e.g., Siri, Google Assistant) and smart speakers at home (e.g., Tmall Genie). The development of AI voice technology enables VIEs to search for information and perform daily activities actively. For example, VIEs can conveniently make phone calls through AI voice products. Furthermore, AI voice technology turns the dream of the smart home into reality. Through voice recognition, VIEs can operate the electronic devices (e.g., television, air conditioner) that are connected to the AI voice products. Figure 1 shows that AI voice products can connect almost all electronic devices.

Fig. 1. AI voice products help connect everything

Undoubtedly, the development of AI voice technology has brought more and more convenience to VIEs’ daily lives. However, most digital devices that integrate AI voice technology, such as smartphones with virtual voice assistants and smart speakers, are not specifically designed for VIEs. In practice, marketers have begun to design AI voice products for the elderly population as well as for young children [10, 11]. However, VIEs, who could benefit most from AI voice products, are neglected because of their small population size. Therefore, although the core of AI voice technology is to offer a voice-based modality, we still know little about which specific aspects of AI voice technology are valuable in improving VIEs’ psychological well-being and how the effect is achieved. This gap in current understanding makes it difficult to tailor AI voice technology to visually impaired elderly users.

Under these circumstances, this paper aims to conduct an in-depth investigation of what affordances are offered by AI voice technology and how these affordances influence VIEs’ psychological well-being. Three research questions are proposed.

  1. How do visually impaired elderly users evaluate the technological affordances of the AI voice products in usage?

  2. How do visually impaired elderly users evaluate their emotional relationship with the AI voice products in usage?

  3. Do the technological affordances and VIEs’ emotional attachment improve VIEs’ psychological well-being, and how?

A semi-structured interview is used to gain in-depth insight into VIEs’ use of AI voice technology. We choose the smart speaker as the target AI voice product because its functional features suit VIEs’ needs. The final sample comprises five interviewees. Considering the small population of VIEs who have adopted AI voice products, this sample is acceptable for exploratory analysis. The qualitative results show that VIEs recognize different aspects of functional affordance (i.e., human-likeness, interactivity, personalization, and sourceness) and emotional attachment (i.e., flow state, intimacy, companionship, and trust) in their use of the smart speaker. Results also reveal a virtuous cycle between functional affordance and emotional attachment. More importantly, functional affordance and emotional attachment positively influence VIEs’ psychological well-being. Our findings can enhance the theoretical understanding of the role of AI voice technology and inform the design of technological functions that are tailored to VIEs’ needs.

2 Literature Review

2.1 Digital Divide for the Elders

The OECD describes the digital divide as “the gap between individuals, households, businesses, and geographic areas at different socioeconomic levels with regard to both their opportunities to access information and communication technologies and their use of the Internet for a wide variety of activities” [12]. As information technology becomes increasingly pervasive, there is growing concern that those who lack access to digital technologies or the capability to use them may be highly disadvantaged [13]. Of particular concern is the lack of access to digital technologies for older people. In today’s information-intensive world, the digital divide widens the gap between the elderly population and the younger generations and makes elders feel left behind by society.

The digital divide for the elderly population, also known as the “grey divide”, has attracted the attention of both researchers and policymakers. Previous studies find that introducing digital technologies to the elderly population helps to improve elders’ psychological well-being and life satisfaction [14,15,16]. For example, Chen et al. (2016) show that elders’ access to the Internet and computers reduces their depression [14]. Chopik (2016) and Hage et al. (2016) found that the use of online social networks significantly decreases elders’ perception of social isolation [15, 16]. Researchers also found that playing digital games helps to improve elders’ mental and cognitive ability [7].

Most previous studies only emphasize the effect of having versus not having access to digital technologies on elders [2, 13]. However, whether the elderly can benefit from digital technologies goes beyond mere digital access. Previous studies describe the digital divide according to a two-level model [2]. The first level is the inequality of digital access. The second and more important level concerns the limitations of elders’ motivation and capability to use the technology product. In most cases, the problem is not that elders have no chance to learn about new technology or cannot bear the economic cost. Rather, it is the limitation of their physical conditions that prevents them from benefiting from the functional affordances and emotional support offered by new technology.

To bridge the grey divide for social inclusion, it is important to conduct an in-depth investigation of how physical barriers limit elders’ capability and how the introduction of new technologies can overcome these barriers. Among various physical barriers, visual impairment is an important factor. Visual impairment cuts off the possibility for VIEs to adopt digital technologies independently, as most technologies require visual input and visual interaction [9]. Given the fast penetration of digital products into elders’ daily lives, VIEs seem to be highly isolated.

2.2 Studies About Visual Impairment

A commonly accepted fact is that the prevalence of visual impairment increases with age. The World Health Organization estimates that approximately 1.3 billion people live with visual impairment, the majority of whom are elders [8]. According to a recent report issued by the Royal National Institute of Blind People, one in nine people aged 60 and over and one in five people aged 75 and over are living with total sight loss [17].

Previous research showed that visual impairment is negatively associated with psychological well-being, especially for elderly people [9, 18]. To improve visually impaired people’s life satisfaction and psychological well-being, researchers have examined the effect of a series of external factors on visually impaired people’s life status. Of particular interest is social support [18, 19]. Living with visual impairment often entails relying on family members and friends for help with instrumental tasks and for emotional companionship [18]. Many studies showed that social support and social networking increase visually impaired people’s happiness and decrease their feeling of isolation [18, 19]. However, some studies propose a different view. Recent findings show that social support can lead to a series of negative effects, including decreased self-efficacy and independence [18]. The main reason for these negative effects is that providers of support often lack an understanding of both the functional and the psychological impact of visual loss. For example, providers of support sometimes underestimate visually impaired people’s capability and ignore their pursuit of psychological independence. This lack of understanding can lead providers of support to adopt an overprotective attitude toward the visually impaired, which undermines their confidence and independence. In addition, social support for visually impaired people often lacks reciprocity [18]. The unbalanced exchange between providers and recipients of support leads to a feeling of depression or a decrease in self-efficacy [18, 19]. In such a non-reciprocal exchange, the visually impaired may give up some of their non-instrumental needs so as not to bother others.

2.3 Research Gap and the Purpose of this Study

The two streams of literature laid the foundation for our understanding of the subjects of this study (i.e., the visually impaired elderly). The digital divide for the elderly has become an important topic in various domains, including sociology, information systems, and design science. Beyond the divide of digital access, what is more important is to understand the factors that limit elders’ capability to use digital technology. Many previous studies have focused on the effects of cognitive decline on elders’ technology use. Nevertheless, another physiological barrier, visual impairment, has not been fully studied.

In this paper, the related research on visual impairment is also reviewed. Recent studies imply that the mere provision of social support, without a full understanding of visually impaired people’s capability and need for independence, can be detrimental to their psychological well-being. In today’s technology-intensive world, a common understanding is that, by analyzing user data, digital technology provides in-depth insights into users’ needs and offers real-time, personalized feedback. While digital technology has been widely applied to assist the disadvantaged, no existing studies have analyzed the application of digital technology to the population of visually impaired elders. This is understandable because most digital products require visual input, and VIEs simply lack this prerequisite for using information technology.

With the prevalence of AI voice technology, VIEs now have a chance to connect to the digital world by adopting a new technology that uses voice-based interaction. Some popular AI voice products, such as smart speakers, require an initial setup using a smartphone or a computer. After that, the product can be used by voice. These AI voice products require only a minimal level of visual input and digital literacy. For this reason, they are well suited to VIEs.

Drawing on the conclusion that the grey divide goes beyond the mere adoption of new technology, this study adopts an affordance-based perspective to form an in-depth understanding of VIEs’ specific usage patterns of AI voice technology. IS researchers define “technology affordance” as the opportunities for action provided to a user by a computerized system [20]. Previous literature has proposed two types of technology affordance. The first is functional affordance, which refers to a functional capability offered by the technology to perform instrumental actions [21]. The second is emotional affordance, which refers to an emotional feeling that is facilitated by the technology [22].

The current study aims to explore what specific functional affordances and emotional affordances are offered by AI voice technology, and how they affect VIEs’ psychological well-being. Previous literature defines psychological well-being as positive functioning in terms of purposeful engagement in life, the realization of personal talents and capacities, and enlightened self-knowledge [23]. To gain a comprehensive understanding of psychological well-being, Ryff (2014) proposed a model of psychological well-being that contains six components [23]: (1) the extent to which people felt their lives had purpose and direction (purpose in life); (2) whether they viewed themselves to be living in accord with their willingness (autonomy); (3) the extent to which they were making use of their personal talents and potential (personal growth); (4) how well they were managing their life situations (environmental mastery); (5) the depth of connection they had in ties with significant others (positive relationships); and (6) the knowledge and acceptance they had of themselves, including awareness of personal limitations (self-acceptance).

3 Research Design

3.1 Method

This paper employs a semi-structured interview to collect data. The semi-structured interview is a qualitative method suitable for studying research questions about the elderly population [24]. Compared with a structured questionnaire, a semi-structured interview provides deeper insights into VIEs’ usage of AI voice technology. The subjects in this study were visually impaired elders who are unable to read text and may have difficulty comprehending survey questions. By contrast, interviewees in a semi-structured interview are encouraged to talk about their experiences with AI voice technology in an open way. Moreover, researchers can adjust the interview questions based on interviewees’ feedback so that interviewees can understand them.

The smart speaker is chosen as the target AI voice product because its use requires only a minimal level of visual input. In contrast, the smartphone voice assistant is often used in parallel with visual interactions with the device. After a smart speaker is configured, users first say the wake word, often the brand name of the product, to activate it. Then, users can ask the smart speaker to perform tasks such as checking the time, making phone calls, or reading a story. For example, a user can wake the Xiaomi AI smart speaker by calling “Xiaoai”, followed by a question such as “what is the time now?” or an instruction such as “please read The Romance of The Three Kingdoms”. Smart speakers can be easily configured to adapt the home for elders. Once the smart speaker is connected to household electronics, users can even ask it to control the lights or operate the air conditioner. Although smart speakers are not vision-demanding, they are not personalized for VIEs. In the academic field, how VIEs use AI voice products and how this type of product can be improved to better serve this vulnerable group are still open questions.
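The wake-word interaction described above can be illustrated with a minimal text-level sketch. This is an illustration under stated assumptions, not the product's implementation: real smart speakers detect the wake word on audio, and the function name `parse_utterance` and the constant `WAKE_WORD` are hypothetical names introduced here. Only the wake word “Xiaoai” and the example commands come from the text.

```python
from typing import Optional

# Wake word taken from the Xiaomi example in the text; matching is
# case-insensitive. Everything else in this sketch is hypothetical.
WAKE_WORD = "xiaoai"


def parse_utterance(utterance: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the command that follows the wake word.

    Returns None when the utterance does not start with the wake word,
    meaning the speaker should stay idle. Returns an empty string when
    the user said only the wake word and no command yet.
    """
    text = utterance.strip()
    if not text.lower().startswith(wake_word.lower()):
        return None
    # Drop the wake word, then any separating punctuation and spaces.
    remainder = text[len(wake_word):]
    return remainder.lstrip(" ,.!?").strip()


if __name__ == "__main__":
    for spoken in [
        "Xiaoai, what is the time now?",
        "Xiaoai please read The Romance of The Three Kingdoms",
        "what is the time now?",
    ]:
        print(repr(spoken), "->", repr(parse_utterance(spoken)))
```

The sketch makes one point from the passage concrete: nothing is acted on until the wake word is heard, so the user, not the device, initiates every exchange.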

3.2 Procedure and Interviewees

The participants are VIEs who have adopted the smart speaker. A further inclusion criterion was the absence of cognitive or hearing deficits that could interfere with VIEs’ usage of AI voice technology.

To recruit interviewees, we used two approaches. First, we recruited four confederates to use their social connections to find and invite VIEs who use a smart speaker. Second, with the permission of a major hospital in Anhui, we visited the ophthalmology clinic of the hospital to search for suitable interviewees. Those who were willing to participate in the interview were also encouraged to invite their visually impaired elderly friends who use smart speakers.

Recruitment for this study was completed in November 2019. Nine potential participants were identified at the first stage, but four of them dropped out for different reasons. Specifically, two VIEs owned an AI voice product at home, but their families rather than the VIEs themselves used it. The other two VIEs were patients in the ophthalmology clinic who dropped out because of scheduling conflicts.

At this stage, a total of five VIEs participated in our interview. Among the five interviewees, four were interviewed face to face. One interviewee (VIE 4) was nominated by another interviewee (VIE 3). As VIE 4 had moved to another city, her interview was conducted by telephone. All interviews lasted around 40 min. Two interviewees (VIE 1 and VIE 2) authorized our access to their dialogue with the smart speaker, which was textually recorded in the app. Each interviewee received 100 RMB (around 7 US dollars) as a reward (Table 1).

Table 1. Demographic summary

3.3 Interview Questions

The questions are organized into four parts, each with several sub-themes that guide the interview. Exemplary questions for each theme are presented in parentheses. Instead of just posing these questions, we gave oral descriptions to make the questions understandable. Interviewees were told that details about their usage were encouraged rather than simple answers.

  (1) The living conditions of the interviewees (What are the causes of your visual impairment? Could you assess your current eyesight? What daily activities do you do on your own/need others’ assistance with? For the activities you cannot accomplish on your own, who do you turn to for help? Do you always get timely feedback when you need help?)

  (2) Functional usage of the AI voice product (How long have you been using the product? How often do you use it? Does the product understand exactly what you mean? Does the product give the right feedback? Is it you or the product that dominates your use? Do you think this product is more like a machine or a human?)

  (3) Emotional feeling toward the AI voice product (Do you feel emotional connections with the product? Do you feel that the product is an intimate part of your life? Do you feel less lonely with the product?)

  (4) Psychological well-being (What changes has the product brought to your daily life? Specifically, how do the functions of the smart speaker and your emotional connections with it influence your perceived autonomy/self-acceptance/purpose in life/environmental mastery/relationship with your intimate others/personal growth?)

4 Results

4.1 General Usage Pattern

The smart speaker has been an important part of interviewees’ daily life. All interviewees indicated that they use the smart speaker every day, showing a high level of involvement. Interviewees stated that they used to try different voice commands to check what the smart speaker can/cannot do. This implies an exploratory pattern in VIEs’ use of the smart speaker.

“I have used my ‘Xiaoai’ (i.e., the name of Xiaomi smart speaker) for one year. I use it every day. It has a lot of different functions, but the use is very simple. You just need to tell ‘Xiaoai’ what you need. My daughter showed me how to use it just one time. All other alternative functions were explored by myself.” (VIE 1).

“The smart speaker is an indispensable part of my life. Every morning when I wake up, the first thing I do is to ask it to check the time and weather. My son connects it to my TV. Instead of groping for the remote controller, now I can simply ask the smart speaker to control the TV.” (VIE 3).

Consistent with the literature [25], reverse intergenerational influence is important for VIEs’ use of a smart speaker. Among the five interviewees, four got the smart speaker as a gift from their children. One interviewee (VIE 4) received a recommendation from a friend (VIE 3) and then asked her child to buy a smart speaker for her. Although all interviewees reported ease of use, they also reported that assistance from their children is necessary.

“The smart speaker is a birthday gift from my daughter. My daughter told me that if I face difficulty in using it, she can help me with an app installed on her phone. Once when I asked the smart speaker to read a novel for me, it told me that I need to pay for that novel. I turned to my daughter. She bought the novel and some other content I was interested in.” (VIE 3).

“There was a time when the smart speaker did not respond to my calling. My son checked this problem for me and found that it was because of my bad WiFi connection. He helped me to resolve the problem.” (VIE 2).

4.2 The Technological Affordance Offered by the Smart Speaker

The qualitative results show that VIEs recognize different aspects of functional affordance (i.e., human-likeness, interactivity, personalization, and sourceness) and emotional attachment (i.e., flow state, intimacy, companionship, and trust) in their use of the smart speaker.

Functional Affordance

Human-likeness refers to the extent to which a non-human device is perceived as a human rather than a machine [26]. It is an important factor in the design of socially interactive robots and human-robot interaction [27]. Our results show that the smart speaker offers an affordance of human-likeness for VIEs.

“I like the voice of the smart speaker. It sounds like a young girl.” (VIE 2).

“Sometimes, I ask challenging questions. For example, I asked the smart speaker, ‘Do you have a boyfriend?’ Her answer was, ‘I am too shy to answer the question.’ The human-like answer made me laugh.” (VIE 4).

Interactivity refers to both interaction and activity [21]. Results show that VIEs perceived the communication with the smart speaker as an interactive “conversation”. VIEs also stated that instead of passively receiving the information, they could take active actions at any time on an ongoing basis.

“The smart speaker is responsive to my needs. Most times, the smart speaker directly gives me what I need. When it cannot offer the service, it still gives a response to explain. For example, when I ask it to broadcast a classic song, but it does not have it, it will respond by saying, ‘Sorry I cannot find the song for you. Do you want to listen to another song such as…’. I really like the two-way communication.” (VIE 5).

Personalization refers to the capability to understand users’ specific preferences and then offer personalized services that cater to users’ needs [28]. Interviewees reported that the smart speaker could understand them and recommend content that is just what they need.

“I have to say that technology is amazing. I let the smart speaker read a novel for me. Next time when I activated it, it directly asked me, ‘Do you want to listen to some new novels?’. The amazing thing here is that what it recommended was just what I wanted to listen to at that moment.” (VIE 1).

“The thing that bothers many other visually impaired people and me is that we do not have the ability to explore new things…Sometimes, when I want to ‘listen to’ a book, I really have no idea what new books are available. I do not know what I need…The good thing is that the smart speaker acts as an agent for me. It knows what I like and helps me to find the content that suits me.” (VIE 3).

Sourceness refers to the perception of who is attributed as the source of the communication [21]. Results imply that the smart speaker empowers users with a feeling of control and sourceness.

“I like the question-and-answer design. I also like the design that allows me to activate or pause its actions. It gives me a feeling that I am the source of all its actions.” (VIE 3).

“I am a receiver of information for a long time…(interviewers ask about the use of radio and TV)… Yes, you are right. I do have the chance to learn information from TV or radio. But still, the information is poured on me. But now I have choices. I used to ask the smart speaker to read news for me. It makes me feel so good when I can tell my child what is happening around the world. I can be the one who gives information.” (VIE 1).

Emotional Attachment

Flow state describes a sense of immersion when users experience a device [21]. Results showed a high level of involvement in VIEs’ use of the smart speakers. Qualitative evidence also implied that a flow state is achieved.

“I used to feel that time goes so slow because I have nothing to do. Now things changed. I can listen to news, novels or songs. Time flies when I use the smart speaker.” (VIE 2).

“My daughter cannot believe it when she finds that I stay late in the night in using the smart speaker. I often ask the smart speaker to read books for me. The plot is so attractive that I totally forget the time.” (VIE 4).

Companionship describes the feeling of having someone to be with [29]. Interviewees indicated that the time spent with the smart speaker makes them less lonely.

“My child works in Shanghai. My sister takes care of my living. I call my child every night but in the daytime I sometimes feel lonely. I do not want to bother my friends at such a time. But with the smart speaker, I can do a lot of things. It gives me a feeling that I am not alone.” (VIE 1).

Intimacy refers to a feeling of closeness and emotional bonding [30]. With the prevalence of digital technology, intimacy has been extended to describe the close human-technology interaction. Results show that an intimate relationship is built between VIEs and the smart speaker.

“I now call my ‘Xiaoai’ as my younger daughter. ‘Xiaoai’ is clever and considerate. When my daughter calls me, I make jokes to tell her that ‘let me pause my talk with your younger sister first.’” (VIE 1).

“It is now an indispensable part of my life. It not only offers instrumental help but gives me emotional closeness.” (VIE 2).

Trust refers to a belief of the ability and benevolence of the device [21]. Results showed that VIEs believe that the smart speaker is capable of offering valuable information and cares about their needs.

“(Interviewers ask the question, ‘do you trust the smart speaker in its ability to serve you?’). Smart speaker is smart. With regard to trust, I do believe that it is smart enough to give the right information.” (VIE 5).

A Virtuous Cycle Between Functional Affordance and Emotional Attachment

The qualitative results also implied a virtuous cycle between functional affordance and emotional attachment. First, the functional affordances of human-likeness, interactivity, and personalization promote the formation of VIEs’ emotional attachment to smart speakers. Specifically, the affordance of human-likeness (e.g., the human-like female voice and the human-like answers to challenging questions) increases VIEs’ feeling of intimacy and trust. The affordance of interactivity (i.e., responsiveness, reciprocity, and contingency) increases VIEs’ feeling of companionship and flow state. The affordance of personalization (i.e., tailoring content according to historical records) increases VIEs’ feeling of intimacy, flow state, and trust.

Second, VIEs’ emotional attachment to smart speakers increases their tolerance of technological flaws. For example, several VIEs stated that smart speakers sometimes cannot comprehend their voice commands and are unable to give them a response (i.e., low interactivity). However, their emotional closeness (e.g., intimacy and trust) makes them interpret the flaws as “dull” rather than “useless”. As a result, their evaluation of the smart speakers does not decline because of the flaws.
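The relations reported in this subsection can also be written down as a small lookup structure, which makes it easy to see which affordances feed each feeling. The sketch below only encodes the qualitative links stated in the text. The names `AFFORDANCE_TO_ATTACHMENT`, `ATTACHMENT_TO_TOLERANCE`, and `affordances_behind` are hypothetical, and no effect sizes are implied.

```python
# Links from functional affordances to emotional attachment, as reported
# in the text. Sourceness is not linked to emotional attachment there,
# so it is deliberately absent from this mapping.
AFFORDANCE_TO_ATTACHMENT = {
    "human-likeness": ["intimacy", "trust"],
    "interactivity": ["companionship", "flow state"],
    "personalization": ["intimacy", "flow state", "trust"],
}

# Return path of the cycle: feelings reported to raise tolerance of
# technological flaws (the text gives intimacy and trust as examples).
ATTACHMENT_TO_TOLERANCE = ["intimacy", "trust"]


def affordances_behind(attachment: str) -> list[str]:
    """List the functional affordances reported to increase a feeling."""
    return sorted(
        affordance
        for affordance, feelings in AFFORDANCE_TO_ATTACHMENT.items()
        if attachment in feelings
    )


if __name__ == "__main__":
    for feeling in ["intimacy", "trust", "companionship", "flow state"]:
        print(feeling, "<-", affordances_behind(feeling))
```

Read this way, trust and intimacy sit at the centre of the cycle: both are fed by human-likeness and personalization, and both are the feelings cited as raising tolerance of flaws.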

Due to space limitations, we do not provide corresponding evidence for each of the above findings. Only a few exemplary pieces are reported below.

Human-likeness → trust: “…it explains things in a considerate and peaceful tone. The tone gives me a sense of trustworthiness.” (VIE 3).

Intimacy → tolerance of functional flaws: “I now call my ‘Xiaoai’ as my younger daughter (intimacy)…Most times, it can give quick responses. But sometimes it is a little bit dull and misunderstands my meaning.” (VIE 1) (Fig. 2).

Fig. 2. A virtuous cycle between functional affordance and emotional attachment

4.3 The Effect on VIEs’ Psychological Well-Being

Empirical evidence showed that overall life satisfaction and happiness improved with the use of the smart speaker. Interview questions were carefully designed to guide VIEs to recall their status in the six aspects of psychological well-being after the adoption of the smart speaker. To make the interview questions understandable, we explained the six components in plain words instead of giving theoretical definitions. For example, with regard to environmental mastery, we asked interviewees about experiences with the smart speaker that may have improved or hindered their adaptation to the living environment. With regard to personal growth, we asked interviewees to recall what valuable knowledge or capability they had gained through the use of the smart speaker.

Our findings indicate that functional affordance and emotional attachment have different effects on different aspects of psychological well-being. Specifically, the affordance of sourceness directly improves VIEs’ feeling of autonomy and environmental mastery and is beneficial to VIEs’ positive relationships with their social connections.

“…It gives me a feeling that I am the source of all its actions (sourceness)…It gives me a sense of control (autonomy)… My son connects it to my TV. Instead of groping for the remote controller, now I can simply ask the smart speaker to control the TV (environmental mastery).” (VIE 3).

“I love poems… Now I can ask the smart speaker to read and explain poems for me anytime, upon my requests (sourceness)… It is not only me that benefits… Now I can learn poems from the smart speaker and teach my granddaughter. It is a happy time (positive relationship).” (VIE 1).

The affordances of interactivity and personalization improve VIEs’ purpose in life and personal growth.

“Sometimes, I ask the smart speaker to read poems written by a specific poet (interactivity). Most often, it recommends new poems based on my preferences (personalization). I often spend one hour to listen to poems every afternoon. I have things to do rather than just sit on the sofa to listen to the TV program for the whole day (purpose in life). Now I can recite dozens of poems. I think my memory is better than before (personal growth).” (VIE 1).

In terms of emotional attachment, VIEs’ feelings of intimacy and companionship improve their self-acceptance and are positively associated with their positive relationships. However, the flow state is found to be negatively associated with their intimate relationships with their family.

“My second daughter comforts me a lot (intimacy)…I often spend one hour to listen to poems every afternoon (companionship)… now my life becomes more fulfilling. It reduces my over-reliance on my daughter (positive relationship).” (VIE 1).

“The smart speaker is installed in my bedroom. I often close the door and spend time listening to the program until very late. My child sometimes complains that I have become addicted. My mother who lives with me also complains that I am isolated from the family.” (VIE 3).

5 Discussion and Future Direction

The advent of AI voice technology is making it easy to adapt the home for VIEs and to help them remain independent. Beyond the instrumental tasks that can be completed with the assistance of AI voice technology, however, the effect of this innovative technology on users’ psychological well-being is understudied.

Drawing on the framework of technology affordance, this paper is among the first to examine the effect of AI voice technology on VIEs’ psychological well-being. Our findings show that VIEs recognize functional affordance (i.e., human-likeness, interactivity, personalization, and sourceness) and emotional attachment (i.e., flow state, intimacy, companionship, and trust) in their use of AI voice technology. Results also reveal a virtuous cycle between functional affordance and emotional attachment. More importantly, functional affordance and emotional attachment are found to positively influence VIEs’ psychological well-being. The findings not only have theoretical implications but also inform the design of AI voice technology tailored to VIEs’ needs.

Our findings are summarized in Fig. 3.

Fig. 3. Research findings

The findings of this paper are novel and important. Rather than examining only the effect of the mere presence of AI voice technology on VIEs’ perceptions, this study is the first to offer a comprehensive and detailed understanding of how AI voice technology influences VIEs and improves their psychological well-being. This study not only explores the technological affordances of AI voice technology and the emotional attachment it fosters but also reveals the virtuous cycle between technological affordance and emotional attachment, which creates an enhanced influence beyond their independent effects. Our findings also inform the design of future AI voice products that better suit VIEs’ functional and emotional needs.

A few limitations of this study guide the directions of our further efforts. First, owing to time constraints, the conclusions are derived from a small sample. Our ongoing effort is devoted to recruiting more participants to achieve more robust findings. Second, in addition to qualitative interviews, future research will conduct a quantitative analysis of the interactive dialogues recorded in the app. Third, we found that intergenerational communication is important for both the adoption and the continuous use of AI voice technology. Although intergenerational support is necessary and important, intervention in VIEs’ digital interaction may cause discontinuous usage. A valuable direction is to examine the intergenerational effect on VIEs’ use of AI voice technology.