1 Introduction

In-vehicle information systems (IVIS) are available in virtually every car segment, from luxury to low-price vehicles. They are used for a wide variety of in-vehicle tasks such as adjusting vehicle settings, using navigation systems, or in-vehicle entertainment. Providing customers with a positive and pleasurable experience while interacting with IVIS should thus be a primary concern of HCI research. However, the increased complexity of IVIS might also cause negative affect such as frustration or anger, especially when drivers are not familiar with the respective IVIS. Recognizing these negative emotions could be used both to guide the design process (e.g., by re-designing interaction sequences that commonly cause negative affect) and to provide on-line tutoring systems that offer assistance in difficult situations.

2 Theoretical Background

In the driving context, drivers’ emotions are often studied as reactions to certain environmental conditions, such as frustrating situations like traffic jams, or as reactions to the behavior of other road users [1, 2]. Within this context, the intention behind investigating drivers’ negative emotional states lies in reducing aggressive driving and thereby crash rates [3].

However, only very few studies to date have tried to induce and measure negative affect through interaction with IVIS. In most cases, a faulty automatic speech recognition (ASR) system was investigated. For example, Bachfischer et al. [4] found that users become annoyed and stressed when interacting with a faulty ASR. Malta et al. [5] also tried to measure the frustration level of drivers during interaction with a faulty ASR. The drivers rated their frustration level while watching a video of their face that had been recorded during the interaction with the system, using a slider that could be set from neutral to extreme. Bayesian networks were used to predict the frustration level measured by the participants’ responses from their physiological state, overall facial expression, and pedal actuation. Kuhn et al. [6] assessed the effect of frustration induced by different recognition rates of an ASR (44% vs. 89% recognition rate) on driving performance. They found a higher steering wheel angle variance with the low recognition rate, which indicates that negative affect can indeed influence driving behavior.

According to [7], the best way to assess drivers’ emotions in the driving context is to analyze facial expressions. Other measures (e.g., questionnaires or EEG) would disturb the driver and have only a low specificity for differentiating between different emotions.

Eyben et al. [8] summarize the literature on emotions in driving. Among other sources of emotions, they also discuss the effects of IVIS interactions. As a meaningful strategy to reduce these negative emotions, they propose a “socially competent human-like car”, including IVIS interaction that behaves like a real human co-driver, giving personalized help and quickly offering alternative solutions. However, realizing such an approach would first require measuring drivers’ negative affect, which is exactly what we did in this study.

3 Method

N = 29 participants completed IVIS tasks of varying difficulty in a high-fidelity driving simulator. A real-world IVIS was transferred to the simulation environment. The aim of the study was to investigate whether drivers would report negative affect when solving difficult IVIS tasks and whether this would be reflected in the drivers’ facial expressions. Therefore, self-report scales as well as a standardized observation protocol were developed. The observation protocol was used to rate the facial expressions during interactions with the IVIS and was adapted from the Facial Action Coding System (FACS; Ekman et al. [9]). To account for individual differences in the expression of negative affect, the participants completed the State-Trait Anger Expression Inventory (STAXI; Schwenkmezger et al. [10]). During the study, drivers had to complete the IVIS tasks while following a simulated car. Feedback on whether the task was solved correctly was given after each IVIS task.

3.1 Setting

The study was conducted at the Wuerzburg Institute for Traffic Sciences (WIVW). A static driving simulator with an Opel Insignia mockup was used (Fig. 1 left). The front of the car was surrounded by a screen illuminated by three projectors providing a 270° field of view. In addition, displays were placed in the back of the car and on the side mirrors to allow the monitoring of surrounding traffic. The driving simulation software SILAB was used.

During the test, the experimenter sat in an adjacent room separated from the simulator by a window. From that room he was able to monitor the test and observe the participant via several cameras placed in the car. He was also able to communicate with the participant via an intercom system.

A Basler camera (50 Hz, 2.3 MP resolution) placed in front of the driver recorded the driver’s face and his/her interaction with the IVIS in a multi-screen video.

Fig. 1. Static driving simulator at the Wuerzburg Institute for Traffic Sciences (left) and the car-follow task (right; described in Sect. 3.2)

3.2 Driving Task

The driving task was a variation of the car-follow task proposed in the NHTSA distraction guidelines [11]. The participant drove on a mostly straight road with two lanes. The task was to follow a car ahead and keep a defined distance as constant as possible. The lead car changed its speed in a sinusoidal profile with slightly varying amplitude, so that the driver had to continuously adjust the speed (Fig. 1 right).
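As an illustration, such a lead-car speed profile can be sketched in a few lines. The base speed, oscillation period, and amplitude values below are assumptions for illustration only, not the parameters of the actual SILAB scenario:

```python
import math

def lead_car_speed(t, base_speed=80/3.6, amp_base=10/3.6,
                   period=30.0, amp_jitter=0.2):
    """Speed (m/s) of the lead car at time t (s): a sinusoidal
    profile whose amplitude varies slightly over time, so the
    follower must continuously adjust speed.
    All parameter values are illustrative assumptions."""
    # a slow secondary oscillation modulates the amplitude
    amp = amp_base * (1.0 + amp_jitter * math.sin(0.1 * t))
    return base_speed + amp * math.sin(2 * math.pi * t / period)
```

Because the amplitude itself drifts, the follower cannot settle into a fixed rhythm and must keep regulating the headway throughout the drive.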

To put the driver in the situation of having to maintain a constant headway, a colored enhanced reality strip (ERS) was displayed on the road. The color of the ERS changed depending on the distance: it appeared yellow when the distance was too large and blue when the distance was too small. Within the target range (a time headway between 1.0 and 1.8 s), the strip was shown in gray. Apart from the changing color of the strip, driving at an incorrect distance had no further consequences.
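The headway-to-color mapping described above can be summarized in a minimal sketch; the thresholds are those reported in the study, while the handling of the exact boundary values is an assumption:

```python
def ers_color(time_headway_s):
    """Map the current time headway (s) to the color of the
    enhanced reality strip: gray within the 1.0-1.8 s target
    band, yellow when too far, blue when too close."""
    if time_headway_s > 1.8:
        return "yellow"   # distance too large
    if time_headway_s < 1.0:
        return "blue"     # distance too small
    return "gray"         # within the target range
```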

3.3 Interface

For the interaction with the IVIS, a simplified version of the current AUDI MMI GUI was simulated. It consisted of a hierarchical menu with different lists showing the sub-menus available in the system. The driver had to navigate to different end points of the menu. The final selection of an end point did not trigger any actual system function but, depending on the task, resulted in feedback on whether the selection was correct.

An AUDI MMI interaction unit was placed on top of the mockup’s original console. This unit consisted of a rotary interaction knob and shortcuts for different menu levels (Fig. 2).

Fig. 2. AUDI MMI GUI (left) and interaction unit (right)

3.4 IVIS Tasks

During the 30-min drive, the participant had to perform a total of 24 IVIS tasks. 20 of them were visual-manual interactions using the rotary interaction knob to navigate to certain entries in the menu system; the remaining 4 required interactions with a speech-based navigation system. When completing the menu tasks, drivers had to search for certain menu entries and confirm their selection. The tasks were instructed verbally by a computer-generated voice at defined positions along the route, e.g., “please adjust intensity of the heater”. The tasks varied in difficulty: half of them were very easy to solve, while the other tasks were manipulated so that it was very difficult to reach the correct menu entry. This was achieved either by a very ambiguous instruction, so that it was not clear in which category the menu entry could be found, and/or by a very short time buffer for reaching the menu entry before the task was aborted. Some of the tasks were even unsolvable, as the instructed menu items were not part of the menu list. The task difficulty was determined in a previous pilot study. During the drive, the frequency of tasks with higher complexity increased gradually in order to induce negative emotions.

The drivers were instructed that they could gain bonus points by solving the tasks correctly. In case of a correct answer, drivers received five bonus points and the feedback “answer was correct”, displayed at the lower part of the visual interface. An extra monetary reward was promised to drivers who reached a target score of 50 points. In case of an incorrect answer, the participant received the feedback “task incorrect” in combination with a loud and unpleasant sound. If the driver was not able to reach the entry within a certain time buffer, the task was aborted and the same sound was presented.

In addition, four tasks requiring verbal interactions with the speech system were arranged in between the menu tasks. They consisted of computer-generated speech outputs of a fictive “navigation system”, which informed the driver about an upcoming traffic jam on the road. The driver was asked either to change the original route or to maintain the current route. The driver was instructed to avoid any traffic jam and should therefore answer the various questions with “yes” or “no” in order to maintain the original route. Two of these tasks were manipulated so that the speech recognition system pretended to misinterpret the driver’s answer, resulting in incorrect feedback although the answer was correct. In one of the speech recognition tasks, the answer had to be repeated several times before the system understood it correctly.

In case of wrong answers to the navigation system, 20 points were deducted from the driver’s credits. Shortly before the target score of 50 was reached, the manipulated navigation task was triggered, resulting in incorrect feedback and a deduction of the score, which was intended to further increase the driver’s negative emotions.
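The bonus-point scheme described in the two preceding paragraphs can be sketched as follows. The assumption that an incorrectly solved menu task simply earns no points (rather than a deduction) follows from the text, which mentions deductions only for navigation answers:

```python
def update_score(score, task_type, correct):
    """Bonus-point scheme as described in the study: +5 for a
    correctly solved menu task, -20 for a (supposedly) wrong
    answer to the navigation system; the target score was 50."""
    if task_type == "menu":
        return score + 5 if correct else score
    if task_type == "navigation":
        return score if correct else score - 20
    raise ValueError(f"unknown task type: {task_type}")
```

A driver just short of the target could thus be pushed well below it by a single manipulated navigation task.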

Table 1 summarizes the task sequence and the online ratings during the drive.

Table 1. Tasks used and time points for online self-reports of emotional state. Menu tasks are numbered with digits, speech interactions with the navigation system with letters.

29 participants took part in the study. To ensure optimal availability of camera data, only persons with a minimum height of 1.64 m were invited. 19 men and 10 women participated, aged 24 to 65 years (M = 42, SD = 12.8 years).

3.5 Measures

IVIS Task Performance

Driver performance in the IVIS task was coded as a dichotomous variable for each of the 24 tasks: a task was either performed successfully or not. The tasks were grouped into several blocks, with each block comprising the tasks between two time points at which drivers rated their current emotional state online. The relative frequency of successfully solved tasks per block was used as the parameter for task performance. Due to the design of the tasks and the study, this frequency necessarily decreased during the course of the drive (as the number of unsolvable tasks increased).
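Computing this performance parameter is straightforward; a minimal sketch (function and data names are illustrative, not from the study materials):

```python
def block_success_rates(outcomes, block_bounds):
    """outcomes: list of 0/1 task results in drive order.
    block_bounds: (start, end) index pairs, one per block between
    two online rating points (end exclusive).
    Returns the relative frequency of solved tasks per block."""
    return [sum(outcomes[a:b]) / (b - a) for a, b in block_bounds]
```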

Self-reported Affect

During the experiment, drivers were repeatedly asked to rate their current emotions online on different dimensions (how joyful, relaxed, angry, frustrated, helpless, or irritated are you at the moment?). A 16-point rating scale with verbal categories (according to Heller [12]) was used. The ratings had to be given after certain blocks of tasks (comprising an unequal number of tasks, but grouped according to their intention to induce negative affect).

A pre-rating before the drive served as a baseline for the later ratings. Altogether, the online ratings were collected at six time points.

Observation of Driver’s Expression

Drivers’ facial expressions were coded by a rater who was blind to the content and aim of the study. The basis for the coding was the video footage of the participants, which showed the driver’s face, the road, and the interaction with the menu system (see Fig. 3). The sound was muted so that verbal expressions could not be heard. From the video footage, the sections containing IVIS tasks were selected, resulting in 24 sequences per driver that had to be rated. The sequences started with the verbal instruction of the task and ended 10 s after the final feedback of the IVIS system had been given.

Fig. 3. Video footage used for observer rating

The raters were first trained with a specifically developed training procedure which condensed the principles and instructions of the widely used Facial Action Coding System (FACS; Ekman et al. [9]), a tool for measuring facial expressions. It is an anatomical system for describing all observable facial movements. Each of these movements is labeled as an Action Unit (AU), and combinations of AUs then define certain emotions (typically the six basic emotions). For the purposes of this study, using the original material and training would have been too time-intensive. Therefore, only the AU combinations relevant to the emotion anger were extracted, extended by other indicators of frustration or irritation (not included in FACS), and explained to the raters in a training session. Afterwards, the raters scored each single facial expression within each video sequence on a specially developed observation protocol. The rating was given on a scale from 0 (no emotion present) to 5 (maximum degree of any kind of negative emotion; positive emotions such as amusement were not coded). The expression with the highest value per video sequence was used as the measure.

In order to achieve high reliability and validity of the measure, an interrater reliability score was calculated in advance from the ratings of three independent persons who assessed the video sequences of twelve drivers. It turned out that it was very difficult to differentiate between specific negative emotions (e.g., anger and irritation), leading to the decision to collapse these emotions into a more global rating in the analysis. The calculation of weighted kappa [13] for each pair of the three raters revealed a medium to high interrater reliability (.673/.487/.556). The final rating was given by one rater, who rated all the remaining video sequences of the 29 drivers.
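For readers unfamiliar with the statistic, Cohen's weighted kappa for two raters' ordinal scores (such as the 0-5 expression scale used here) can be sketched as follows. This is a generic textbook implementation with linear disagreement weights, not the exact computation from [13]:

```python
from collections import Counter

def weighted_kappa(r1, r2, categories):
    """Cohen's weighted kappa with linear disagreement weights
    for two raters' ordinal scores over the given categories."""
    n = len(r1)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # observed joint frequencies of (rater1, rater2) scores
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    # expected frequencies under chance agreement (marginal products)
    p1, p2 = Counter(r1), Counter(r2)
    exp = [[p1[categories[i]] / n * p2[categories[j]] / n
            for j in range(k)] for i in range(k)]
    # linear weights: disagreement grows with distance on the scale
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w[i][j] * exp[i][j] for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp
```

Perfect agreement yields 1, chance-level agreement yields 0, and systematic disagreement yields negative values.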

State-Trait-Anger-Inventory

In order to evaluate whether the degree of self-reported negative emotions and the facial expressions correlate with an anger-prone personality, the State-Trait Anger Expression Inventory was used (STAXI; Schwenkmezger et al. [10]). It measures state and trait anger together with three different forms of anger expression: people differ in the extent to which they express anger overtly and directly (anger-out), how often anger feelings are held in or suppressed (anger-in), and the extent to which they are able to control their angry feelings or their overtly expressed anger (anger-control).

3.6 Study Procedure

First, the participants had to sign a confidentiality agreement and were then given general information about the study. All participants were familiar with and trained in using a driving simulator. The participants were told that the goal of the study was to evaluate the structure and content of the menu system and how difficult the search for certain menu entries would be; the tasks would therefore be somewhat difficult but always possible to solve. They were promised a monetary reward if a target score was reached. They were kept blind to the real intent of the study.

To become familiar with the system, the participants were given six tasks. Each task led to a different section of the menu, so that every section had been seen before the actual test started.

Before the start of the drive, the participants rated their current emotional state on the six items using the 16-point scale. During the 30-min test drive, each participant had to perform the 24 IVIS tasks at defined positions on the route. After each of the five defined task blocks, the participants again rated their emotional state online. After the drive, the participants filled in several questionnaires, including the STAXI. At the end, the real intent of the study was revealed. Before participants received the monetary reward (including the promised bonus in case of reaching the target score), a “cool-down phase” ensured that they could recover from a potentially high arousal level induced by the study. The whole procedure took about 60 min.

4 Results

4.1 Task Performance

As expected, task performance decreased during the test drive (see Table 2). The comparison between the five measuring points revealed a highly significant effect, F(4, 24) = 34.30, p < .001.

Table 2. Means and standard deviations of the dependent variables.

4.2 Self-reported Emotional State

The self-reported emotional state changed gradually in a more negative direction (see Fig. 4). The emotions anger and frustration increased sharply beginning with the second task block (which included several complex tasks) and then remained at a constant, moderate level (not higher than 7 on a scale of up to 15). However, there was also high variation between subjects. Self-reported helplessness and irritation also increased beginning with the second block of tasks but decreased again towards the end of the drive, indicating a kind of resignation once it became apparent that the system’s reactions were somewhat independent of the drivers’ own task performance.

Fig. 4. Self-reported emotional state on six dimensions measured online at six time points (means and standard deviations are displayed).

A closer look at the anger ratings reveals that about one third of the sample (n = 10 drivers) subjectively perceived a strong or very strong feeling of anger (see Fig. 5). The peak occurred at the very end of the drive, after block 5. However, there was also a remarkably large group of participants who reported experiencing very little or no anger at all. According to drivers’ comments, some of them realized the real intention of the study and therefore did not seem to be influenced by the tasks’ manipulations. This result illustrates the difficulty of inducing a certain emotional state in such an artificial experimental setup.

Fig. 5. Distribution of self-reported anger level on the 16-point rating scale.

4.3 Correlations Between the Different Measures

To describe the relationships between the dependent measures, Pearson correlation coefficients were calculated (see Table 3), revealing the following relationships:

Table 3. Correlations between the different measures.
  • Self-reported anger correlated significantly negatively with task performance: the lower the task performance between two measuring points, the higher the subjectively perceived anger.

  • Facial expression correlated significantly positively with self-reported anger: participants who reported high anger values during the drive also showed more intense facial expressions.

  • In addition, participants showed stronger emotions in their faces when they could not perform a task successfully.
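The coefficients in Table 3 follow the standard product-moment formula, which can be sketched as (illustrative implementation; the study's actual analysis software is not specified):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two measures,
    e.g. per-driver task performance and mean self-reported anger."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```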

The four subscales of the STAXI were correlated with each other and with the mean values of self-reported anger as well as the mean values of negative facial expressions. As expected, participants with higher scores on the Trait-Anger scale also had higher scores on the Anger-Out and Anger-In subscales but lower scores on the Anger-Control subscale. In addition, there was a significant positive correlation with self-reported anger but no significant correlation with facial expression. Trait anger therefore reflects whether and to what extent a person gets angry but gives no information on the appearance or extent of facial expressions.

Furthermore, participants with higher Anger-In scores reported higher subjective anger but did not necessarily show it in their faces. Contrary to expectations, Anger-Out correlated neither with self-reported anger nor with facial expression, although this subscale is intended to measure the degree to which a person expresses his/her feelings.

5 Discussion and Implications

The study showed that experiencing difficulties in solving IVIS tasks goes along with self-reported negative affect as well as corresponding facial expressions. The reliability and validity of the newly developed rating tool could also be demonstrated. The study has several implications that are important both for the design of IVIS and for the advancement of theories of user experience.

Firstly, designers should be aware that cumbersome interaction design will cause negative emotions, which are likely to result in non-acceptance of new products. It is well known that negative affect such as anger and frustration can also cause a riskier and more ruthless driving style, so that such negative impacts on the driver’s emotional state could ultimately become a safety risk. Secondly, the study is a first step towards measuring and eventually preventing such disadvantageous developments in the design process. The study also shows the potential of facial recognition technology to provide assistance or tutoring functions that could relieve the driver of emotional discomfort.

From a theoretical point of view, it is somewhat surprising that anger expressions were not correlated with the STAXI subscales, as the STAXI is the most widely used questionnaire measuring personality traits related to anger expression. This finding suggests that considerably more research is needed to fully understand the development and expression of negative affect (such as anger) related to technology use.