
1 Introduction

Advances in automation have shifted many tasks previously controlled by unmanned aerial vehicle (UAV) operators to automated systems. These technological advances, coupled with a proliferation of unmanned systems, have pushed the military to explore a supervisory control paradigm for future UAV operations. However, one of the problems experienced by current UAV operators, dramatic fluctuations in workload based on mission context, is also expected to impact future UAV supervisory control operators. For example, the waypoint-based navigation of current UAVs only requires the operator to set the route; once the platform is en route, it controls the flight systems and the operator's role is reduced to monitoring the automated system. Assessing human performance in highly automated systems is challenging because the operator's direct interaction with the system can be limited. How can one assess whether an operator is engaged and monitoring a system when the operator does not need to interact with it for extended periods of time?

Eye tracking may be an effective means of continuously assessing both operator attention and engagement during interaction with complex automated systems. Eye tracking data are typically broken down into periods when the eye is relatively still (fixating) on a given region and periods when it is moving [1]. Individuals can only process new visual information while the eye is relatively still, which makes gaze position, particularly fixations, an important measure of an individual's overt attention [2]. However, analysis of fixations usually emphasizes areas of interest, comparing where and how long fixations occur in different regions. The results of such analyses are highly task and even situation specific, since the visual field must be divided into task-dependent regions.

One newer approach to analyzing gaze data, which does not require the visual field to be divided into areas of interest, is the Nearest Neighbor Index (NNI) [3, 4]. The NNI is the ratio of the average distance from each object to its nearest neighbor, compared to the distance one would expect if the distribution were random. This technique was originally developed as a means of characterizing spatially distributed populations [5] and has only recently been applied to eye tracking data [3]. When applied to eye tracking data, the NNI has been shown to be sensitive to changes in task load; specifically, NNI increases as workload increases. These changes were characterized as the distribution of gazes becoming more random. The effect has been found both in a video game task [3] and during a simulated flight [4]. The researchers suggested that under periods of high workload an individual might adopt a more dispersed pattern of gazes so as to be more ready to process incoming information. NNI has been applied to only a limited number of tasks and has not been used in environments where participants' primary role is to monitor automated systems.
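Concretely, the NNI compares the observed mean nearest-neighbor distance to the mean distance expected under complete spatial randomness over the same area. The notation below follows the standard spatial-statistics formulation rather than any notation from the cited studies:

\[
\mathrm{NNI} = \frac{\bar{d}_{\mathrm{obs}}}{\bar{d}_{\mathrm{exp}}}, \qquad
\bar{d}_{\mathrm{obs}} = \frac{1}{N}\sum_{i=1}^{N} d_i, \qquad
\bar{d}_{\mathrm{exp}} = \frac{1}{2}\sqrt{\frac{A}{N}},
\]

where \(d_i\) is the distance from fixation \(i\) to its nearest neighboring fixation, \(N\) is the number of fixations, and \(A\) is the area containing them. Values near 1 indicate a random distribution, values below 1 a clustered distribution, and values above 1 a dispersed distribution.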

Eye trackers not only measure where an individual directs their visual attention but also measure the size of the individual's pupils. The link between increases in pupil diameter and increases in working memory demand has been well documented since the 1960s [6, 7]. Although much of the early research linking pupil size to mental workload focused on basic tasks such as the digit span task [8], more recent research has shown pupil diameter increases within more complex visual environments such as driving [9] and unmanned vehicle control [10]. Although low-cost eye trackers have been shown to be capable of measuring pupil diameter changes in response to workload in basic tasks with consistent luminance levels [11], they have not been investigated within more complex visual tasks.

The present research seeks to determine whether a low-cost system can assess changes in workload within a supervisory control task. The goal of this study is to demonstrate that low-cost eye tracking can be used to measure changes in workload via increased pupil diameter as well as a more random gaze pattern as assessed by the NNI.

2 Method

2.1 Participants

Nineteen (18 men and 1 woman) Navy and Marine Corps student pilots and flight officers aged 22–29 were recruited from Naval Air Station Pensacola, Florida. Participants were run in two groups. Eye tracking data for one of the participants was not recorded.

2.2 Equipment and Setup

Each of the 10 workstations in the lab was equipped with a Gazepoint GP3 eye tracker, which is capable of collecting left and right gaze position as well as left and right pupil diameter at 60 Hz. The task was displayed on a 25 in. Acer monitor at 2560 × 1440 resolution. Participants were all seated approximately 65 cm from the display.

2.3 Supervisory Control Task

Participants in the study used a single-screen version of the Supervisory Control Operations User Testbed (SCOUT) [12]. SCOUT (see Fig. 1) was developed as a game-like environment in which participants are awarded points for simultaneously completing a number of tasks associated with supervising three highly autonomous UAVs. SCOUT comprises three primary tasks: route management, sensor monitoring, and responding to communications. Within the route management task, the participant develops a plan by assigning vehicles to pursue objectives with varying priority levels, search area sizes, and deadlines. The initial plan may be modified as new objectives become available or as parameters of an existing objective change during the mission.

Fig. 1. Screen shot from the Supervisory Control Operations User Testbed (SCOUT). SCOUT consists of six main screen areas: (1) moving map, (2) target information table, (3) sensor task, (4) route builder, (5) vehicle status, and (6) communications.

SCOUT’s sensor monitoring task begins when a vehicle arrives within an objective’s specified search area. The participant has to monitor the vehicle’s sensor feed for the target shape specified for the objective and then click on it when it appears in the feed. The final task is responding to communication queries and commands. For a communication query, the participant is asked to provide information on a specific vehicle or objective. For a communication command, the participant is asked to update a parameter of a vehicle (e.g., altitude) or objective (e.g., longitude). The version of SCOUT used in this study was modified so that all of the information could be displayed on a single display.

2.4 Procedure

Participants first completed an informed consent form and then performed Gazepoint’s calibration process (looking at a circle as it moved to 9 different positions on the display). Participants next completed a self-paced SCOUT training session that lasted approximately 35 min, followed by a 10-min practice mission. Once familiarized with the SCOUT environment, they completed two 30-min experimental mission scenarios: Prototype and Legacy. In the Prototype scenario, participants could choose to use automation during the payload task. They were told the automation would always find the target but was subject to false alarms; false alarms resulted in point loss. In the Legacy scenario, no sensor automation was available.

Both missions were structured so that, after the initial plan was selected, each vehicle would proceed to its initial destination for approximately 15 min, during which participants experienced low workload. During this downtime, participants only had to monitor progress and answer one incoming chat message. In the Prototype mission, participants had the option of earning more points by requesting additional chat messages. The second half of each mission was characterized by a heavy task load, as participants had to monitor the payload task, respond to a stream of frequent information requests, and update target information and vehicle commands. Average and maximum subjective workload were assessed using the crew status survey [13] at the beginning of each scenario, after planning, at the end of the low task load period, and at the end of the high task load period.

3 Results

3.1 Eye Tracking Analysis

Fixations.

Fixations were computed using a dispersion-based algorithm in which each gaze sample (packet) was either considered part of a fixation or not. The algorithm identified a fixation when a series of consecutive packets that met the minimum time duration (100 ms) were all within 50 pixels of their computed centroid. The 50 pixels equates to approximately 1° of visual angle. These fixation criteria were selected because they are comparable with those used for complex visual tasks [1]. To extend a fixation beyond the minimum duration, a new centroid was computed and compared against each packet to ensure all packets remained within the maximum distance from the centroid. This definition of fixation was used to compare average fixation duration as well as to assess NNI. Fixations were compared during the planning phase and during six five-minute blocks of time (three low task load and three high task load) throughout the mission (excluding times when the scenario was paused for a workload probe or SA probe). For a time segment to be included in the analysis, at least 50 fixations, the suggested minimum for NNI, were needed. Sixteen percent of the total blocks did not meet this minimum.
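A minimal sketch of this dispersion-based algorithm is shown below, assuming gaze samples are (x, y) pixel coordinates arriving at 60 Hz (so the 100 ms minimum corresponds to 6 consecutive samples). The function and variable names are illustrative, not taken from the study's analysis code:

```python
import math

def _centroid(points):
    """Mean (x, y) of a list of gaze points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def _within_radius(points, radius):
    """True if every point lies within `radius` of the points' centroid."""
    cx, cy = _centroid(points)
    return all(math.hypot(x - cx, y - cy) <= radius for x, y in points)

def detect_fixations(samples, min_samples=6, max_radius=50.0):
    """Return a list of (centroid_x, centroid_y, n_samples) fixations.

    min_samples=6 corresponds to 100 ms at 60 Hz; max_radius=50 px is
    roughly 1 degree of visual angle at the study's viewing distance.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        j = i + min_samples
        if j > n:
            break
        if _within_radius(samples[i:j], max_radius):
            # Extend the fixation while every sample stays within the
            # radius of the recomputed centroid.
            while j < n and _within_radius(samples[i:j + 1], max_radius):
                j += 1
            cx, cy = _centroid(samples[i:j])
            fixations.append((cx, cy, j - i))
            i = j
        else:
            i += 1
    return fixations
```

For example, a steady gaze at one location followed by a saccade to a distant location yields a single fixation covering only the steady samples.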

Fixation Duration.

The fixation duration analysis looked at the mean during the planning phase (variable length) and the six five-minute time segments. A two-way repeated-measures ANOVA (Automation scenario × Time segment) was performed on average fixation duration. There was a significant main effect of time segment, F(6, 98) = 3.312, p < .005, on mean fixation duration. Fixation duration tended to decrease over each 5-min block; however, post hoc analysis revealed that only the difference between the first five minutes after planning and the last five minutes of the experiment was significant. Figure 2 shows the average fixation duration for each of the time segments. There was no main effect of Automation scenario and no interaction of automation and time.

Fig. 2. Average fixation duration for both the no automation and adaptable automation scenarios for the planning phase and six 5-min time blocks. Error bars represent standard error of the mean.

Nearest Neighbor Index. The NNI was computed from the fixations in the planning period and each of the six time blocks of each scenario, with the area estimated using the convex hull method. A two-way repeated-measures ANOVA (Automation scenario × Time segment) found no significant main effects or interactions for NNI. The NNIs for each time segment are shown in Table 1.

Table 1. Nearest neighbor index for each time segment
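The computation above can be sketched as follows, using the convex hull of the fixation centroids as the area estimate. The helper names are ours, and a production analysis would typically use a library hull routine; this is a self-contained illustration of the technique, not the study's code:

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(hull):
    """Shoelace formula for the area of a simple polygon."""
    a = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        a += x1 * y2 - x2 * y1
    return abs(a) / 2.0

def nni(fixations):
    """Nearest Neighbor Index: observed mean nearest-neighbor distance
    divided by the distance expected for a random distribution over the
    convex hull area (0.5 * sqrt(A / N))."""
    n = len(fixations)
    mean_nn = sum(
        min(math.dist(p, q) for q in fixations if q is not p)
        for p in fixations
    ) / n
    expected = 0.5 * math.sqrt(polygon_area(convex_hull(fixations)) / n)
    return mean_nn / expected
```

A regular grid of fixations (maximally dispersed) yields an NNI well above 1, while tight clusters yield values below 1, matching the interpretation that higher NNI reflects a more dispersed, random-looking gaze pattern.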

Pupil Diameter.

Gazepoint provides the left and right pupil diameter, measured as the number of pixels each pupil occupies on the camera image, as well as a quality flag that indicates whether each particular sample is good or bad. When the quality for both eyes is indicated as valid, the left and right pupil sizes are highly correlated. The analysis looked at right pupil diameter during the planning phase (variable length) and during six five-minute blocks of time (three low task load and three high task load). Data for each five-minute block were included for a participant only if there was at least 30 s of good data for that block. Twenty percent of the total blocks did not meet the minimum 30 s threshold and were excluded from the analysis.
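This screening step can be sketched as below, assuming each sample is a (diameter, valid) pair at 60 Hz; the field layout and names are assumptions about the Gazepoint export, not confirmed values:

```python
SAMPLE_RATE_HZ = 60      # GP3 sampling rate
MIN_GOOD_SECONDS = 30    # minimum good data required per 5-min block

def mean_valid_pupil(block_samples):
    """Return the mean pupil diameter for one 5-min block, or None if
    the block has under 30 s of good-quality samples and must be
    excluded from the analysis.

    block_samples: list of (diameter, valid) pairs, where `valid` is the
    per-sample quality flag.
    """
    good = [d for d, valid in block_samples if valid]
    if len(good) < MIN_GOOD_SECONDS * SAMPLE_RATE_HZ:
        return None  # block excluded (as 20% of blocks were here)
    return sum(good) / len(good)
```

Averaging only quality-flagged samples keeps blinks and tracking dropouts from biasing the block means entered into the ANOVA.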

A two-way repeated-measures ANOVA (Automation scenario × Time segment) was performed on the right pupil diameter data. There was a significant main effect of time segment, F(6, 96) = 11.119, p < .01, on right pupil diameter. Post hoc analysis revealed a significant increase in pupil diameter for the last three 5-min blocks compared to the planning phase and the first three 5-min blocks. The pupil diameter data are presented in Fig. 3. There was no main effect of automation scenario and no interaction of automation and time.

Fig. 3. Right pupil diameter over time for each scenario. Error bars represent standard error of the mean.

3.2 Subjective Workload and Fatigue

The current research utilized a computerized version of the crew status survey, a psychometrically validated unidimensional workload and fatigue scale developed by the Air Force, which measures both the average and maximum workload experienced on a 7-point anchored scale [13]. A two-way repeated-measures ANOVA (Automation scenario × Task load) performed on both average and maximum workload yielded the same pattern of results. There was a significant main effect of task load, F(2, 38) = 14.714, p < .01 (average workload), and F(2, 38) = 13.563, p < .01 (maximum workload). Post hoc analyses for both revealed significantly higher reported workload during the high task load period than during the low task load period and planning. Results for the maximum reported workload are shown in Fig. 4. There was no main effect of automation scenario and no interaction of task load and automation scenario for the subjective workload probe.

Fig. 4. Maximum reported workload for each scenario. Error bars represent standard error of the mean.

There were no main effects or interactions for the fatigue portion of the crew status survey.

4 Discussion

The study found no differences between the two automation conditions in either reported workload and fatigue or any of the eye tracking measures. However, the eye tracking measures, particularly pupil diameter, and the subjective workload scale were able to detect changes in workload across the periods of high and low task load within each scenario.

The pupil diameter results are consistent with those of a number of other studies showing that pupil diameter increases as mental effort increases [6, 8, 9, 11]. The results are meaningful because they were obtained in a task with varying levels of luminance across the different regions of the screen. Despite the lack of control for screen luminance, the low-cost system was still able to detect a pupillary response to increased task load. This suggests that pupillary response may be a robust measure of mental workload even in visually complex environments.

The authors expected to see significant increases in the Nearest Neighbor Index as workload increased; however, this was not the case. The NNI showed no significant differences, nor even non-significant trends, in the data. It is not clear whether this is due to problems with data quality, the accuracy of the eye trackers, or a lack of sensitivity of the NNI. Although not reported in the paper, the authors adjusted the fixation criteria to allow both shorter fixation durations and larger dispersion; using different fixation criteria did not meaningfully alter the outcome of the NNI analysis. To date, NNI has been applied to only a limited number of task domains, and additional research is needed to determine whether it is a robust measure of workload.

One of the main limitations of the present study was that a large amount of data was marked as poor quality by the eye trackers. The authors adopted a liberal criterion in accepting eye tracking data. The high data loss is most likely due to participants moving outside the Gazepoint GP3's limited head box, a problem exacerbated by the fact that participants were seated in rolling chairs that could recline.

Overall, despite problems with data quality, the low-cost eye trackers, and specifically the measure of pupil diameter, demonstrated that they could differentiate between high and low task load in a complex visual task.