1 Introduction

Advanced interactive systems entail complex interaction scenarios as well as research challenges that demand the consideration of new factors to design and guide the interaction. Kaipainen et al. [12, 13] have outlined a multidisciplinary research agenda focusing on dynamically coupled human and technological processes. They defined the concept of an enactive system, based on a ubiquitous computing approach [18, 19] to Bruner’s idea of enactment [1].

This approach is recursive by nature, involving the impact of the technology on the human agent as well as the effect of the human experience on the technology. Our investigation expands this concept to a socioenactive vision, which goes further by identifying and supporting how a group of people can dynamically and seamlessly interact with the technology. The conception and experimentation of such a system raise several open research questions, for instance, how to adapt the design of socioenactive systems to specific domain scenarios.

Our research scenario is an environment of complementary education for children around the age of 5, enrolled in the Division of Early Childhood and Complementary Education of the University of Campinas [3]. In this context, we worked with educational robots, in particular the mBot [4], a robot kit that enables programming via Scratch.

In this paper, we propose a set of socioenactive design guidelines and apply them to a system in the educational environment. We assume that the system’s behavior must be driven and shaped according to users’ input and sense making. For this purpose, we adapted the set of objectives for designing enactive systems proposed by Kaipainen et al. [12], establishing a series of guidelines for the design of socioenactive systems. Following the guidelines, we mapped 6 human expressions (happiness, sadness, disgust, surprise, anger and despise) to emojis and their respective technological representation in the educational robot.

On this basis, we designed and evaluated a first version of a socioenactive system, in which a series of iterative sessions were performed, consisting of the following steps: a child secretly performs one of the mapped expressions to a camera; the expression is identified and input into the system; the system identifies, for that moment, which action must be executed by the robot; other children hypothesize which expression led the robot to take that action; responses are inserted into the system, influencing the next cycle of interaction.

We analyzed the study’s data relying on both the system’s behavior and the participants’ responses. Our study explains how children, working collaboratively as a group, assigned meaning to the performed actions. We found patterns related to the diversity (lack of unanimity) of the robot’s expressions identified by the children; a clear preference for one expression (happiness); and that children perform better when identifying the robot’s expression than the expression performed by the child in the cardboard box.

Although recent literature has presented alternatives to tailor a system’s results according to students’ performance [9] and to improve Educational Robotics (ER) [16], we advance the state-of-the-art in the design and evaluation of systems with dynamic interactive coupling between people’s interaction and the system’s behavior.

The remainder of this article is organized as follows: Sect. 2 presents the foundations and related work. Section 3 describes the methodology by presenting the experimental design, the participants and the studied application scenario. Section 4 presents the results and Sect. 5 discusses the obtained findings. Finally, Sect. 6 presents the concluding remarks and our envisioned future work.

2 Background

Kaipainen et al. [12] define as enactive a system that “is recursive by nature, involving the impact of the technology on the human agent as well as the effect of the human experience on the technology”. The following research questions were considered to lead the design of enactive systems [12]:

  • What if the interaction experience would modify the content, thus constituting a self-controlling system?

  • What would be the proper metadata ontologies to account not only for pre-existent content categories, but for those that can emerge in such a recursive system dynamics?

Kaipainen et al. [12] further defined several objectives that could be used to lead the development of such systems. In this investigation, we organized the objectives into guidelines (named here from G1 to G4) as follows:

  • G1: definition of a database or rule set to support the generation of behavior in real time;

  • G2: definition of technologies supported by sensors to detect and track participants’ behavior;

  • G3: mapping between psycho-physiological dimensions of content;

  • G4: an algorithm to manage the narrative montage in real time.

We propose to include social aspects in the model to expand the original enactive concept defined by Kaipainen et al. [12]. In this sense, our proposal considers not only the individual (the traditional key element in the interaction process with technology) but also the impact of the social interactions performed by a group of individuals in such an environment. We defined this concept as “socioenactive”.

A key research challenge in achieving socioenactive systems is the difficulty of capturing, modeling, and interpreting human aspects such as emotion and social environment. The use of ontologies is one alternative to achieve this goal, since they represent semantics in computational systems by describing concepts and the interrelationships among them.

To represent semantics in computational systems, Web Ontologies have been designed to provide rich machine-decidable semantic representation [6, 7, 8, 10]. They refer to a formal specification of a domain, formalizing a conceptualization of a domain in terms of classes, properties and relationships between classes.

Soft Ontology is another conceptual approach which, in contrast to Web Ontologies with fixed hierarchies described in the Web Ontology Language (OWL) [14], refers to a flexible set of metadata [12, 13]. This is useful for representing dynamically evolving information domains, as well as for representing and interpreting psycho-physiological states by including, for instance, the emotions [15] of the participants involved in the interaction. These ontologies present individual elements associated with values, without a hierarchy structured a priori. They should evolve according to the recursive cycle, thus impacting the human agent and being affected by the human experiences.

To implement these flexible solutions, ontology-based enactive systems are frequently based on fuzzy models [17]. Other studies propose the use of ontology networks, which reconcile models of several types of ontological representations, including soft and hard ontologies [5].

In this work, we rely on the concept of Soft Ontology to develop a behavior matrix representing the meaning of the robot’s actions. We assume that the robot’s behavior is based on the children’s collective assignments, made from round to round during the interaction in the environment.

3 Methodology

This investigation aims to answer the following research questions:

  • RQ1: how to adapt the enactive guidelines proposed by Kaipainen et al. to the design of socioenactive systems?

  • RQ2: what would be the first impressions of the execution of a socioenactive system in an educational environment?

In the following, we present the experimental design of a socioenactive system (Subsect. 3.1), followed by the educational application where the study was conducted and the description of the participants in our study (Subsect. 3.2). Subsection 3.3 presents the workshops environment and dynamics. Subsection 3.4 presents how data collected from the study was analyzed.

3.1 Experimental Design

The experimental research design was organized to adapt the enactive systems’ goals proposed by Kaipainen et al. [12] as guidelines to support the design of socioenactive systems. Table 1 presents the 4 proposed key guidelines (G1, G2, G3 and G4) underlying our experimental design.

Table 1. Guidelines for the socioenactive design. Adapted from the guidelines proposed by Kaipainen et al. [12] to support the design of enactive systems.

Figure 1 shows an adaptation of an enactive system’s scheme [12, 17] towards the socioenactive system organization explored in this study. It shows the socioenactive feedback cycle, starting with G1, used to support the implementation of the socioenactive system instance (cf. Subsect. 3.2). The guidelines were mapped to a behavior matrix (G1), the mBot [4] (G2 and G4) and the mapping of human facial expressions (G3). The social component is represented by the children themselves.

Fig. 1. Socioenactive feedback cycle mapped to the proposed guidelines. Adapted from the enactive system proposed by Tikka et al. [17] and organized by Kaipainen et al. [12].

3.2 Study Scenario and Participants

In total, 25 children, aged 4 to 5 years old, participated in this study. All children were enrolled in the Division of Early Childhood and Complementary Education of the University of Campinas [3]. The children came from two separate classes, morning and afternoon (referred to from now on as Group 1 and Group 2), with 13 and 12 students, respectively. Each group had a different teacher.

In this study, all parents signed a Term of Consent, allowing the participation of the children and data collection through video and images. All children assented to participate and signed the Term of Agreement with the help of teachers.

Initially, in a brainstorming session with teachers, we defined 6 expressions that would make sense in the children’s context: happiness, sadness, disgust, surprise, anger and despise. Each expression was associated with an emoji expression (cf. Fig. 2).

Fig. 2. Representation of the emoji expressions used in this study, portraying happiness, sadness, disgust, surprise, anger and despise. The emojis were retrieved from the public list of Unicode 11.0 emoji characters [11]. Although very similar, the original images used in this study were not reproduced here as they were retrieved from several internet websites.

In order to contextualize each emoji expression for the children, the teachers mapped parts of an adapted version of the “Little Red Riding Hood” story to each of the emojis. The teachers then organized storytelling sessions with the children, showing them the respective emoji plaque at the associated parts of the story. For example, a plaque with the “surprise” emoji was shown to the children in the scene in which the wolf revealed his disguise to Little Red Riding Hood. Table 2 illustrates how some parts of the story were mapped to the emojis:

Table 2. Example of how some parts of the adapted version of the “Little Red Riding Hood” were mapped to the emoji expressions.
Fig. 3. Expressions mapping: emoji; mBot software (how the emoji was programmed in the mBot) and how the equivalent emoji was shown in the mBot’s display.

Related to the guideline G3 (cf. Table 1), Fig. 3 shows the expressions mapping, in which each emoji (first row) was mapped as a segment display image, programmed in the mBot software (second row). The third row shows the mBot display mapped to each one of the emojis.

Related to guideline G1 (cf. Table 1), our behavior matrix represents domain concepts and ontology dimensions. This matrix is based on the idea of ontological dimensions (ontodimensions) and ontospaces as proposed by Kaipainen et al. [12]. Their proposal focused on the representation of collaborative tagging practices, with the tags representing ontodimensions and the tag space representing ontospaces.

The behavior matrix is our ontological solution to represent knowledge about the emotional expressions and a set of behaviors that can be performed by the robot. Table 3 shows our solution, which uses a probability matrix to represent the association between the robot’s actions (rows) and emotional expressions (columns). Each cell represents a weighted probability value for the robot to execute an action for a given emotional expression. The matrix gives our robot solution flexible and fuzzy behavior. New actions can be dynamically included by inserting new rows, and new emotional expressions can be included by inserting new columns at runtime.

Table 3. Behavior matrix (initial state). Cells represent weighted probability values of the robot to execute an action (rows) for a given expression performed by a child.

Here, the concepts refer to the defined emotional expressions and the ontology dimensions to the robot’s actions. Each matrix element stands for the probability of relating a concept to an ontology dimension. All values of the matrix were initialized with a default probability. As the workshop dynamics are executed (cf. Subsect. 3.3) and the children’s answers are input, the probabilities are adjusted to represent the children’s understanding of the correlation between concepts and ontology dimensions. This dynamic behavior of the matrix based on participants’ input can be seen as socioenactive because the association of the robot’s actions with the emotional expressions is modeled according to the social interaction context and people’s contributions.
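The structure described above can be illustrated with a minimal Python sketch. It assumes six actions labeled "action 1" to "action 6", a default weight of 1.0 and a dictionary-of-columns layout; these names, the helper functions and the extra "action 7" and "fear" entries are hypothetical and only illustrate how rows and columns could be added at runtime. The actual initial values are those of Table 3.

```python
EXPRESSIONS = ["happiness", "sadness", "disgust", "surprise", "anger", "despise"]
ACTIONS = [f"action {i}" for i in range(1, 7)]  # hypothetical labels


def make_matrix(actions, expressions, default=1.0):
    """Return {expression: {action: weight}} with every cell set to `default`."""
    return {e: {a: default for a in actions} for e in expressions}


def add_action(matrix, action, default=1.0):
    """Insert a new row: the action gets a default weight in every column."""
    for column in matrix.values():
        column[action] = default


def add_expression(matrix, expression, default=1.0):
    """Insert a new column covering every action already in the matrix."""
    actions = list(next(iter(matrix.values())))
    matrix[expression] = {a: default for a in actions}


def probabilities(matrix, expression):
    """Normalize one column so that its weights sum to 1."""
    column = matrix[expression]
    total = sum(column.values())
    return {a: w / total for a, w in column.items()}


matrix = make_matrix(ACTIONS, EXPRESSIONS)
add_action(matrix, "action 7")   # hypothetical new action (row)
add_expression(matrix, "fear")   # hypothetical new expression (column)
print(probabilities(matrix, "disgust"))
```

Storing weights rather than normalized probabilities keeps insertions cheap in this sketch: a new row or column does not require touching the existing cells, and probabilities are derived per column only when needed.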

Related to guideline G4 (cf. Table 1), the mBot was programmed to perform a set of actions for each one of the expressions. The set of actions for each expression was designed with the help of teachers, aiming to give a realistic emotional aspect to the mBot. Table 4 shows the algorithm related to each action programmed in the mBot. Guideline G2 was not adopted in this study (i.e., the robot’s sensors were not employed).

Table 4. Algorithm (actions to be executed) programmed in the mBot related to each emoji expression.

3.3 Workshops Environment and Dynamics

We conducted workshops to evaluate our socioenactive system. The study environment was composed of a cardboard box, presented to the children as a “telepathic box”, equipped with a camera and isolated from the other parts; a stage for the robot to perform its actions; an audience area for the children; and a table for the children to choose (vote) which expression they thought their friend made inside the “telepathic box”. Figure 4 shows the workshop environment organization.

Fig. 4. Workshop environment. Part (a) relates to the “telepathic box”, a cardboard box where in each iteration a child drew and performed an expression; part (b) relates to the children in the audience; part (c) relates to the table at which the children, each one in his/her turn, chose which expression (identified through RFID emojis) they thought their friend made inside the “telepathic box”; part (d) relates to the stage on which the robot performed its actions; part (e) relates to the researchers who watched the study; part (f) refers to the camera that recorded the entire study; and part (g) relates to the notebook to which the robot was connected.

The workshop dynamics was organized in the following steps:

  • Step 1: Each child is randomly selected to mimic an emotional expression in the “telepathic box”. Overall, each child is selected once.

  • Step 2: The selected child chooses a plaque and mimics an emotional expression in front of a camera.

  • Step 3: The system triggers an action in the robot based on the recognition of the expression performed by the child. Considering the difficulties related to real-time image processing and facial recognition, we adopted a Wizard of Oz approach [2]: after the child in the “telepathic box” mimics the expression, a researcher signals to the other researchers which expression was performed.

  • Step 4: Children in the audience area watch the robot performing the action.

  • Step 5: Teachers ask the children in the audience area: “What expression do you think your friend made in the telepathic box that triggered this action on the robot?”. Each child selects, privately and in his/her own turn, an emoji expression from a pool with the 6 available expressions. Each emoji was internally identified with an RFID.

  • Step 6: The data collected in Step 5 is used to update the system’s behavior matrix.

Figure 5 shows the “telepathic box” (left) and a child selecting the emoji expression (equipped with an RFID tag) that he/she thought his/her friend performed inside the “telepathic box”.

Fig. 5. On the left, the “telepathic box”; on the right, a child selecting an emoji expression.

The behavior matrix was initially configured to select random actions (Step 3) for a given emotional expression (all ontological dimensions have an equal probability). That is, it is initialized with default values (cf. Table 3), which are adjusted according to the children’s behavior. The fuzzy behavior is provided by the weighted random selection of the robot’s actions. For this purpose, we used weighted values associating actions to emotion expressions (i.e., the values of a column). The weighted value is adjusted according to the children’s feedback during Step 6 of the workshop. This completes a cycle in which the robot impacts the children and is affected by their experiences.
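A weighted random selection of this kind can be sketched in Python with the standard `random.choices` function. The column layout, the action labels and the helper name `choose_action` are assumptions made for illustration, not the implementation used in the study.

```python
import random


def choose_action(column, rng=random):
    """Pick one action from a column, with probability proportional to its weight."""
    actions = list(column)
    weights = [column[action] for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Initial state: every action is equally likely in the "disgust" column.
disgust_column = {f"action {i}": 1.0 for i in range(1, 7)}  # hypothetical labels

rng = random.Random(0)  # seeded only to make the sketch repeatable
picks = [choose_action(disgust_column, rng) for _ in range(6000)]
frequency = {a: picks.count(a) / len(picks) for a in disgust_column}
print(frequency)  # each value should be close to 1/6 in the initial state
```

As the weights in a column drift apart over the rounds, the same call favors the actions with higher weights while still leaving room for the others to be selected.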

As the rounds proceed, our defined algorithm balances the probabilities among the robot’s actions according to audience feedback (Steps 5 and 6). For instance, given an “action x” performed by the robot, if most of the children selected a “happy emotion” for that action, the system would increase the probability of association between “happy emotion” and “action x”. In this sense, the robot’s behavior is based on the children’s collective feedback, as it relies on the assignments received in each round.
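The exact update rule is not detailed here, so the following Python sketch is only one possible reading of it. It assumes that the audience's answer is summarized by the most voted expression, that the column of the expression performed in the box is the one updated, and that weights change by a fixed `step` with a lower bound `floor`; all of these, as well as the function name, are hypothetical choices.

```python
from collections import Counter


def update_column(column, executed_action, performed_expression, votes,
                  step=0.1, floor=0.01):
    """Adjust the weights of one expression column after a round.

    `column` maps actions to weights for the expression the child performed.
    `step` and `floor` are hypothetical tuning parameters.
    """
    most_voted, _ = Counter(votes).most_common(1)[0]
    if most_voted == performed_expression:
        # The audience recognized the expression: reinforce the executed
        # action and weaken the alternatives in the same column.
        others = [a for a in column if a != executed_action]
        column[executed_action] += step
        for action in others:
            column[action] = max(floor, column[action] - step / len(others))
    else:
        # The audience read something else: make this action less likely.
        column[executed_action] = max(floor, column[executed_action] - step)
    return column


# Iteration 1 as reported for Group 2: the child performed disgust, the robot
# executed action 1, and the votes were 5 happiness, 3 disgust, 3 surprise.
disgust_column = {f"action {i}": 1.0 for i in range(1, 7)}
votes = ["happiness"] * 5 + ["disgust"] * 3 + ["surprise"] * 3
update_column(disgust_column, "action 1", "disgust", votes)
print(disgust_column)
```

In this reading, a mismatch lowers only the weight of the executed action, while a match raises it and lowers the alternatives, which mirrors the qualitative behavior reported for the first iterations in Subsect. 4.1.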

As mentioned earlier, we carried out studies with two groups, Group 1 and Group 2, corresponding to the morning and afternoon classes, respectively. The study with Group 1 was used to calibrate the behavior matrix, test it, and correct errors. The matrix was put into practice only during the study with Group 2.

3.4 Employed Analyses

All workshop sessions were filmed, and the data produced were recorded for analysis. We emphasize two distinct analyses. The first concerns the behavior matrix data. For this purpose, we stored the matrix status for each iteration related to the children’s answers to understand the consistency and convergence of concepts related to the ontology dimensions (Subsect. 4.1 presents the obtained results). The second analysis concerns the children’s behavior data. In this analysis, we aimed to understand the different expressions assigned by the children over the different iterations in the workshops. For this purpose, we counted each participant’s answer in each iteration of the workshop (Subsect. 4.2 presents the obtained results).

4 Evaluation Results

Section 4.1 presents an analysis of the behavior matrix data. Section 4.2 focuses on the children’s behavior during the Group 2 study, since this group used a stable version of the behavior matrix.

4.1 Behavior Matrix Analysis

Table 5 presents the status of the matrix after the first iteration (with Group 2). For example, in Table 5 the robot has a lower probability of executing action 1 in response to a child who performed the disgust expression in the “telepathic box”.

Table 5. Behavior matrix after first iteration. Cells represent weighted probability values of the robot to execute an action (rows) for a given expression performed by a child in the “telepathic box” (columns). \(^{*i}\) cell’s value was changed by iteration i

Table 6 presents the matrix values after four iteration rounds. In the first iteration, a child expressed disgust in the “telepathic box”, leading the robot to execute action 1, related to happiness, as presented in Table 4. Then, on average, the other children chose, through the RFID emojis, a different expression from the one performed by the child in the “telepathic box”. Thus, the weight for action 1 was decreased, i.e., it was not a good action to be associated with the disgust expression for this group of children.

Table 6. Behavior matrix after the fourth iteration. Cells represent weighted probability values of the robot to execute an action (rows) for a given expression performed by the child in the “telepathic box” (columns). \(^{*i}\) cell’s value was changed by iteration i

In the second iteration, a child expressed happiness in the “telepathic box”, leading the robot to execute action 3, related to disgust. As in the first iteration, on average the other children chose, through the RFID emojis, a different expression from the one performed by the child in the “telepathic box”. Thus, the weight for action 3 was decreased in the happiness column of the matrix.

In the third iteration, a child expressed anger in the “telepathic box”, leading the robot to execute action 5, related to anger. Then, on average, the other children chose, through the RFID emojis, the same expression as the one performed by the child in the “telepathic box”, i.e., the anger expression. Therefore, the value related to action 5 in the anger column of the behavior matrix was increased, and the values of the other actions in the same column were decreased, since there was a correspondence between the expression performed by the child in the “telepathic box” and the expression chosen by the other children.

In the fourth iteration, a child expressed happiness in the “telepathic box”, leading the robot to execute action 4, related to anger. Then, on average, the other children chose, through the RFID emojis, a different expression from the one performed by the child in the “telepathic box”. Thus, the weight for action 4 was decreased in the happiness column.

After 12 iterations, the behavior matrix presented a slow but consistent convergence, attributing higher weights to the actions that represent the emotional expressions, as planned by the researchers. It is important to mention that 12 is still a low number of iterations for convergence purposes. Scenarios with a larger number of actions, including various action alternatives for a given emotion expression, are necessary for a more precise evaluation of the behavior matrix.

4.2 Children’s Behavior Analysis

In the Group 2 study, 12 children were present, thus leading to 12 iteration rounds. In each iteration, the other 11 children were asked to choose which facial expression they thought the child in the “telepathic box” had performed. They used RFID emojis to indicate their choices (cf. Sect. 3.3).

Figure 6 shows the number of different emoji expressions chosen by the children in Group 2 in each of the 12 iterations. None of the iterations reached unanimity in the choice of expression. Iterations 1 and 8 presented the lowest diversity, with 3 different expressions chosen.

Fig. 6. Number of different expressions chosen by the children in Group 2 in each iteration. For example, in iteration 8 a total of 3 different expressions were chosen by at least one child through the RFID emoji. In turn, in iteration 6 all possible expressions were chosen.
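The per-iteration count of different expressions can be reproduced with a short Python sketch. The example uses the votes reported for iteration 1 of Group 2 (5 for happiness, 3 for disgust and 3 for surprise); the function name and the returned fields are hypothetical.

```python
from collections import Counter


def summarize_round(votes):
    """Tally one round of audience votes."""
    counts = Counter(votes)
    top_expression, top_votes = counts.most_common(1)[0]
    return {
        "counts": dict(counts),
        "distinct": len(counts),        # diversity of the choices
        "top": top_expression,          # most voted expression
        "top_votes": top_votes,
        "unanimous": len(counts) == 1,  # True only if everyone agreed
    }


# Votes reported for iteration 1 of Group 2 (11 voters).
iteration_1 = ["happiness"] * 5 + ["disgust"] * 3 + ["surprise"] * 3
summary = summarize_round(iteration_1)
print(summary["distinct"], summary["top"], summary["unanimous"])
# prints: 3 happiness False
```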

Figure 7 presents (in blue) the total number of choices related to each expression, considering all iterations. The maximum number (132 choices) was calculated by multiplying the number of iterations (12) by the number of choices in each iteration (11). It also presents (in orange) the total number of iterations in which each expression was chosen by at least one child.

Fig. 7. In blue, the total number of times, considering all iterations, that each expression was chosen by the children in Group 2 (max = 132). In orange, the number of iterations in which that expression was chosen by at least one child (max = 12). (Color figure online)

In turn, Fig. 8 shows the frequency of the children’s choices related to each expression in each iteration. For each iteration, the expression drawn and performed by the child in the “telepathic box” is indicated in a black label, whereas the action performed by the robot is indicated in a blue label. For example, the disgust expression was performed by the child in the “telepathic box” in iterations 1 and 9, and the robot did not execute this action in any iteration. Also, in iterations 1 and 9, respectively 3 and 2 children chose RFID emojis related to the disgust expression, meaning they believed the child in the “telepathic box” had performed the disgust expression.

Fig. 8. For each iteration (1 to 12), the horizontal bars represent the number of participants who chose each expression. The left labels indicate the expression drawn for each iteration (in black) and the action executed by the robot (in red). For example, in iteration 1: the expression performed by the child in the “telepathic box” was disgust; the action executed by the robot was happiness; 5 children chose happiness; 3 children chose disgust; and another 3 chose surprise. (Color figure online)

Additionally, from Fig. 8 it is possible to infer that the expression with the highest number of votes matched the one performed by the child in the “telepathic box” in 25% (3 of 12) of the cases (specifically iterations 3, 7 and 10). On the other hand, in 75% (9 of 12) of the cases (specifically iterations 1, 2, 3, 4, 7, 8, 9, 10 and 11) the expression with the highest number of votes matched the one performed by the robot.
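The two percentages follow directly from the iterations listed above, as the following Python sketch shows. The variable and function names are hypothetical; the iteration numbers are those given in the text.

```python
TOTAL_ITERATIONS = 12

# Iterations in which the most voted expression matched...
matches_child = {3, 7, 10}                     # ...the child's expression
matches_robot = {1, 2, 3, 4, 7, 8, 9, 10, 11}  # ...the robot's action


def share(iterations, total=TOTAL_ITERATIONS):
    """Percentage of iterations in the given set."""
    return 100 * len(iterations) / total


print(share(matches_child))  # 25.0
print(share(matches_robot))  # 75.0

# Iterations in which the most voted expression matched both, and neither.
both = sorted(matches_child & matches_robot)
neither = sorted(set(range(1, TOTAL_ITERATIONS + 1)) - matches_child - matches_robot)
print(both, neither)  # [3, 7, 10] [5, 6, 12]
```

The intersection shows that every iteration in which the audience matched the child's expression is also one in which it matched the robot's action.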

Finally, we observe that in three iterations (4, 8 and 11) a consensus was reached, i.e., the majority of children chose the same RFID emoji. In all of these iterations the emoji chosen corresponds to the action performed by the robot.

5 Discussion

Kaipainen et al. [12] propose, in contrast to the standard conceptualization of human-computer interaction, an enactive relationship between the individual and technology. In that approach, an enactive system would consider the impact of the technology on the human agent as well as the effect of the human experience on the technology [12]. This relates to the ubiquitous computing approach proposed by Weiser [18, 19], which envisions a seamless interaction with technology that adapts itself according to the characteristics of the environment.

Pushing forward the state-of-the-art, this investigation aimed at understanding how a group of people can dynamically and seamlessly interact with technology underlying socioenactive systems enriched by ontology aspects regarding emotional expressions (research question RQ1). For this purpose, our research scenario involved an educational environment with two different groups (Group 1 and Group 2) of 4-to-5-year-old children (N = 25 in total) participating in proposed activities supported by an educational robot. Our robot was programmed to perform a set of actions mimicking some human emotional expressions: happiness, sadness, disgust, surprise, anger and despise.

In the workshops carried out, a series of iterative sessions were conducted, consisting of the following steps: a child secretly performs one of the mapped emotional expressions to a camera; the expression is identified and input into the system; the system identifies, for that moment, which action must be executed by the robot; other children hypothesize which expression led the robot to take that action; responses are inserted into the system, starting another iterative cycle.

In summary, answering the research question RQ2, we observed in Group 2 some patterns in the actions and behavior performed by the children as our key findings:

  • Diversity of choices for the emotion expression made by the robot: We observed a lack of unanimity in the interpretation of the expression made by the robot. For instance, the group chose 4 or 5 different emotion expressions in 8 iterations (out of 12). This result could be attributed in part to the difficulty of the task, since interpreting emotion and its expression from the behavior of a robot is not trivial, especially for children of that age. Moreover, there was the complexity of the ontology algorithm in getting input from the audience to rebuild the robot’s behavior in the iterative feedback cycle, which could have made the emotional situation even harder to grasp.

  • Happiness is the king/queen! The choice of happiness as a response for interpreting the robot’s action (or the expression supposedly made by the child in the telepathic box to be reproduced by the robot) was present in 11 of the 12 iterations. This could be interpreted in part as a reflection of the pleasure and excitement the children were experiencing in the activity, with the robot’s actions in the narrative scenario presented to them. It seems that children tended to see happiness more frequently than the other emotions.

  • Capturing the robot’s emotion expression: A relevant result concerning the interpretation of the robot’s emotion was observed. Rather than guessing the emotion performed by the child in the telepathic box, the children were very good at guessing the expression performed by the robot. In 9 out of 12 iterations, the expression most chosen by the children was the one actually performed by the robot. This suggests that the robot’s action, as a system output, characterized the intended emotion well.

Overall, several lessons could be learned from this study that should be addressed in further investigations. Some approaches should be useful in dealing with the many complexities present in our research scenario. The scenario was created for children’s interaction with a robot that learns from the children’s interpretation of the emotion expressed in the robot’s behavior. This is clearly a socioenactive system scenario, as both the enactive and the social aspects of the children’s enaction are present.

Although the system enactive loop was very consistently performed as there were actions coming from the audience that fed the system, shaping the next system’s actions, some fine tuning in the algorithm is still needed. The social aspect of interaction in the enactive loop was consistently considered by the ontology algorithm to deal with what that specific group understands as a particular emotion expression. Nevertheless, some adjustments in parameters are still required to cope with the learning aspect of the algorithm.

Besides the ontology system algorithm, other aspects deserve our attention. Working with emotions and their expression, especially in the context of children, might need more granular treatment. For example, reducing the set of emotions could help make the children’s responses more visible. The joint behavior of children in expressing or interpreting emotions is another aspect to explore, going further in understanding their ‘choices’ of the emojis that best represent the robot’s action.

This work contributed to understanding some socioenactive aspects of interaction in technology-enhanced scenarios. The lessons learned in this investigation are helpful for informing new scenarios and for further advancing the state-of-the-art.

6 Conclusion

Designing coupled interactive systems that integrate technology and humans in a less deterministic fashion deserves substantial research effort. This paper expanded the enactive concept to a socioenactive vision. Based on the features of enactive systems, we defined a set of guidelines to design socioenactive systems. An instance of such a system was implemented in an educational scenario, in which several iterations generated data to shape the behavior of the system according to the meaning given by the children in the conducted workshop dynamics, leading to a non-deterministic behavior of the system. We found several patterns in the children’s actions and behaviors. We consider this study the first of several efforts in investigating socioenactive systems in practice, shedding light on and supporting further development related to this topic. Future work involves further analyses of the user experience in the context of our study. The analysis of such data might provide additional support and evidence to conduct additional studies in the design and development of socioenactive systems.