
1 Introduction

The use of humor in human communication is perhaps one of the simplest ways to demonstrate a mastery of a given language and a deep understanding of the social and cultural norms within which the humorous communication is taking place. We value displays of humor highly and often place those who can consistently display wit and conversational humor amongst the most deserving of our social praise and accolades. These factors alone should make humor, and the laughter that often accompanies it, a prime focus for scientific study; unfortunately, they are all too often peripheral within social, cognitive and communication science. In the late 20th century and early 21st century great strides were made in the advancement of natural language processing as a computational endeavour and in the understanding of human language from a computational perspective. However, certain aspects of the intricate use of human language have proven particularly obstinate and resistant to computational modelling. Particularly difficult aspects to model include the understanding of sarcasm, irony, metaphor, and humor. Deft use of these aspects of human language shows that a communicator understands the nature of the environment in which they find themselves at a deep level. Importantly, this often involves the incorporation of current contextual information, an awareness of currently salient topics and an awareness of what is likely to be of interest in the minds of the audience to which the humor is oriented. Knowing what is currently relevant and interesting is one of the key challenges in both computational natural language processing and computational modelling of humor; typical current solutions use abstract or out-of-date toy problem data sets that are unlikely to be current and relevant [1].

McKeown [1, 2] has argued that the use of humor and the laughter that relates to humor use are best thought of as mutual displays of the fact that one knows what is happening in the mind of interlocutors. Humor displays mind-reading ability to conversational partners, which creates a desire to bond socially with the humorous person; laughter is, at its core, a social bonding signal.

2 The Shannon and Weaver Model and the Conduit Metaphor

A crucial factor in the computational modelling of humor is the set of assumptions made concerning the nature of human communication. Usually, a fundamental assumption is to base a conceptualisation of human communication on the commonly used Shannon and Weaver model [3] and some version of what has been termed “the conduit metaphor” [4], or the code model as it is sometimes known [5]. These approaches tend to view communication in general, with human communication as a special case, as a system designed to pass information from one entity or person to another, the main goal of the communication process being the efficient transmission or exchange of information. A schematic diagram of the classic Shannon and Weaver model is displayed in Fig. 1.

Fig. 1. Shannon and Weaver model of communication [3].

In this account, which stems from the mathematical and electronics disciplines from which Shannon and Weaver came, an information source creates a message which is then encoded in some way to create a signal. That signal is sent through some communication channel with the possibility of becoming contaminated by noise. Once in the channel it can be picked up by receivers who may be equipped with the knowledge, motivation and wherewithal to decode the signal and therefore receive the message and the information it contains. This model works very well in the world of electronic signalling and telecommunications; it has therefore had much appeal in the related disciplines of information technology and computer science. However, it has also been widely adopted in other areas of science where its basic assumptions may not be quite so tenable; it is often taken as a basis for reasoning and thought concerning animal communication and human language and communication. Figure 2 displays a version of the Shannon and Weaver model as it is often adapted and applied to human communication circumstances.

Fig. 2. The Shannon and Weaver model applied to human communication (adapted from Sperber and Wilson, 1995 [5]).
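
The encode, transmit, decode pipeline of the Shannon and Weaver model described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than anything taken from [3]: the function names `encode`, `channel` and `decode`, the choice of UTF-8 bits as the signal, and the independent bit-flip noise are all hypothetical choices made here for clarity.

```python
import random


def encode(message: str) -> list[int]:
    """Transmitter: turn a message into a signal (here, a list of bits).

    Assumption: UTF-8 bytes, most significant bit first.
    """
    bits = []
    for byte in message.encode("utf-8"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def channel(signal: list[int], flip_probability: float, rng: random.Random) -> list[int]:
    """Channel: noise flips each bit independently with the given probability."""
    return [(bit ^ 1) if rng.random() < flip_probability else bit for bit in signal]


def decode(signal: list[int]) -> str:
    """Receiver: reassemble bytes from bits and recover the message."""
    data = bytearray()
    for start in range(0, len(signal), 8):
        byte = 0
        for bit in signal[start:start + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    # Corrupted bytes are replaced rather than raising an error.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    rng = random.Random(0)
    sent = "John is a soldier"
    print(decode(channel(encode(sent), 0.0, rng)))   # noiseless channel
    print(decode(channel(encode(sent), 0.05, rng)))  # noisy channel
```

Nothing in this sketch represents the receiver's context or background knowledge; the message is treated as fully contained in the signal, which is precisely the assumption questioned in the remainder of this section.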

Reddy [4] noted and challenged a similar aspect of the way we think about human communication that is related to the Shannon and Weaver [3] understanding of communication. Reddy suggested that the way that we conceptualise and think about human communication is built upon a strongly held and deeply pervasive metaphor that he termed the conduit metaphor. The central idea is that we talk about and use a narrative about human communication based on a metaphor that suggests we pass information to one another packaged in containers of some form through some general conduit. When we think about how we communicate we use some intuitive form of the Shannon-Weaver formulation, moving information from one mind to another; when a container filled with information reaches a receiver, they take it out of the container using some sort of decoding mechanism or linguistic code, whereupon they become the possessors of the transmitted information. The English language is replete with uses of the metaphor; some of the examples used by Reddy are: “try to get your thoughts across better,” “None of Mary’s feelings came through to me with any clarity,” “you still haven’t given me any idea of what you mean,” “you’ve put each concept into words very carefully,” and “trying to pack more thoughts into fewer words”. The pervasive nature of this metaphor means it is difficult to escape and retain clarity in communication, and no attempt will be made to do so within this paper. However, it forces us to think about communication in a certain way and constrains us to a certain frame that may not be helpful when we think about the nature of both animal and human communication. It is therefore worth highlighting and making explicit these commonly held underlying assumptions before exploring the nature of human communication, especially in situations in which we seek to create computational models of the necessary processes.
As modellers we are too often guilty of accepting assumptions without question when creating models, yet our computational models are completely unaware of these assumptions. Indeed, it is often one of the most useful parts of the computational modelling endeavour that it exposes and makes us aware of assumptions that were difficult to see.

There are many reasons to be doubtful that the Shannon-Weaver formulation or the conduit metaphor that we intuitively use are useful ways to conceptualise both animal and human communication. One of the principal reasons is that many linguistic utterances are highly underdetermined and rely heavily on shared contextual knowledge to ensure correct interpretation. There is much mind-reading that goes on around the process of human communication, and one of the most influential theories that explains and clarifies these processes is Relevance theory [5].

3 Relevance Theory and Underdetermined Language

Relevance Theory [5] provides an account of how underdetermined signals function in human communication. The core realisation is one common to much of linguistic pragmatics, although it is not often stated in these terms: there is a lot of mind-reading involved, in which a communicator makes an assessment of the knowledge that is already available to the receiver through context and other streams of evidence. This mind-reading is, of course, the scientific view of mind-reading [6], related to perspective-taking, mentalizing, and theory of mind, and not the theatrical or telepathic version. Informally, an underdetermined amount of encoded linguistic information is provided in an utterance, after a mind-reading based assessment of the knowledge available to a receiver in a given context. The pre-assessment of the knowledge that is available from the current context in which the communication is taking place, which occurs almost entirely at an unconscious level, allows the communicator to craft an utterance that contains only a minimal amount of information: as much as is required to infer the communicator’s intended meaning. In other words, the utterance contains only enough information that, when it is combined with the contextual knowledge available to the receiver and with any extra non-verbal communication, all the pieces of evidence taken together produce an inferential interpretation of the meaning the communicator is seeking to impart. Rather than it being a matter of encoding the entirety of a message and placing it in a container to be decoded, a careful mind-reading assessment allows the minimal amount of information (the underdetermined linguistic piece of evidence) to be combined with contextual knowledge and non-verbal communication and then assembled within the mind of the receiver to infer intended meaning.
As an example, if I were to provide the utterance “John is a soldier,” a receiver would take a very different understanding of the utterance if it was provided in the context of a military base or within a school. A further contextual qualification might occur in the school situation: in the school playground it may mean that the child is playing a war-like game, whereas in the presence of the school nurse it may mean that John was a child who was sick or had hurt himself and was displaying some stoical qualities while receiving treatment. The linguistic utterance does not change but is interpreted in very different ways dependent on the context in which it is provided. This has important implications for human communication and the cultural knowledge and context that is available and utilised in human communication. This tacit use of background knowledge is not accounted for in the Shannon-Weaver models, or at best it is only implied in the encoding and decoding aspects of the model in ways that hide its importance.
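
The “John is a soldier” example can be made concrete with a small Python sketch in which the utterance stays fixed and only the receiver's context changes. This is a toy illustration, not a model proposed in Relevance theory [5]: the table `CANDIDATE_READINGS`, the function `interpret`, the context labels, and the scoring rule (counting shared context cues) are all hypothetical simplifications introduced here.

```python
# Hypothetical table: each candidate reading of an utterance is paired with
# the context cues that would support it.
CANDIDATE_READINGS = {
    "John is a soldier": [
        ("John serves in the military", {"military base"}),
        ("John is playing a war-like game", {"school", "playground"}),
        ("John is being stoical while receiving treatment", {"school", "nurse"}),
    ],
}


def interpret(utterance: str, context: set[str]) -> str:
    """Pick the reading whose supporting cues overlap most with the context.

    Assumption: overlap counting stands in for the far richer inferential
    process a human receiver would use. Ties go to the first listed reading.
    """
    readings = CANDIDATE_READINGS[utterance]
    best_reading, _cues = max(readings, key=lambda reading: len(reading[1] & context))
    return best_reading


if __name__ == "__main__":
    for context in ({"military base"}, {"school", "playground"}, {"school", "nurse"}):
        print(sorted(context), "->", interpret("John is a soldier", context))
```

The same string yields three different readings, which is the sense in which the utterance underdetermines its meaning: the interpretation depends on a `context` argument that has no counterpart in a pure encode and decode pipeline.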

4 Cognitive Effect and the Ostensive-Inferential Model

A basic principle of Relevance theory is that an input (an utterance or some information from some external source) is relevant if it has a positive cognitive effect. Wilson and Sperber [7] give the example of a “true conclusion” having a positive cognitive effect as it improves the user’s representation of the world. However, according to the Analogical Peacock Hypothesis [2], the truth, or verifiable mind-world mappings, are not a core or necessary part of human communication. These mappings between representations and an objective reality have value in problem solving situations and certainly improve representations of the world, yet much of human communication can function perfectly well without any need for verifiability against evidence from a sensed reality; this becomes obvious when we consider story-telling, myths, legends and religion. What is really important in human communication is information that has some social currency: information that is interesting to others within a chosen social group, whether it is true or not. Fantastical or sensational information may have value or positive cognitive effects for an individual irrespective of its veracity. Within Relevance theory, general positive cognitive effects are more likely with relevant material, and, importantly for the current argument, relevance is also thought to be inversely related to the processing effort required by an input. That is, if an utterance requires greater use of perception, memory or inference it is less likely to be relevant. The core argument of this paper is that in situations of humor this may not be the case. The Relevance theory ideas are based on an assumption that evolutionary selection pressures have led to an inexorable drive towards efficiency, creating the “Cognitive Principle of Relevance” that “human cognition tends to be geared to the maximisation of relevance”.
However, according to the Analogical Peacock Hypothesis much of human communication has developed through sexual selection and entails costly signalling where the need for straightforward efficiency is less obvious and indeed sometimes it could be counterproductive [8]. This means that in situations of creative display as in humor production the goal of being efficient in terms of communication is not necessarily desirable and the cognitive effect of a communication is oriented towards displaying mind-reading abilities rather than the efficient transfer or exchange of information.

Another important aspect of human communication comes from the communicator’s intentions. A communicator has two kinds of intention when they seek to tell someone something. First, they must grab someone’s attention and make them aware that they are about to communicate something; second, they must communicate it. This is part of what is known as the ostensive-inferential model in Relevance theory. There is both an intention to inform (the informative intention) and an intention to inform that they wish to communicate something (the communicative intention). The communicative intention is signalled to a receiver by grabbing attention through some sort of ostensive signal [5]; this is a signalling of signalhood [9]. It is achieved in many ways, perhaps with a wave or a movement that is slightly incongruous for a situation. Once a receiver becomes aware of an ostensive stimulus or signal, they have knowledge of a communicator’s informative intention, and this in turn raises expectations of relevance in the utterance that is likely to follow from the communicator.

5 Cognitive Effort and Optimal Relevance

The ostensive-inferential model in combination with the Cognitive Principle of Relevance leads to another general principle, the Communicative Principle of Relevance. This principle suggests that when a message sender makes an ostensive signal indicating that they have an intention to communicate something, they also make the assumption that the receiver will wish to be communicated to. Consequently, they are presuming that the receiver will be interested in the information that they have to say or convey. Therefore, the sender has an expectation that the receiver will engage in the cognitive effort required to infer the meaning the sender wishes to convey; as a result, the receiver will have an expectation that the sender will be maximising the relevance of the utterance—with all the concomitant mind-reading this entails. The Communicative Principle of Relevance states that “every ostensive stimulus conveys a presumption of its own optimal relevance”. The idea of optimal relevance suggests that an intention to communicate provides strong evidence to a receiver that a sender thinks that the information they wish to convey is worth a receiver’s effort in processing it. Similarly, given this expectation is being placed in the mind of the receiver, a sender should ensure that they provide a relevant communication if they wish the receiver to understand their communication and if they wish to ensure that they have a reputation as someone worth communicating with in the future.

6 Display and Alignment in the Analogical Peacock Hypothesis

The Analogical Peacock Hypothesis [2] suggests that there are two fundamental kinds of human communication, display and alignment. Display communications are the most fundamental—they concern the display of the mind-reading abilities of the sender. The motivation to display one’s mind-reading abilities stems from the evolutionary advantages that are gained through rising up through the ranks of a social hierarchy [10]. The Analogical Peacock Hypothesis combines two evolutionary schools of thought, the social brain hypothesis [11,12,13,14], and the use of mental fitness indicators [15, 16], to explain this connection between social status and mind-reading display. As our social groups became larger and more sophisticated, those who were better at climbing their social hierarchy would have access to better resources and mating possibilities due to their elevated status; amongst the most important skills required for climbing social hierarchies are the socio-political skills of perspective-taking, mind-reading and theory of mind. The Analogical Peacock Hypothesis argues that at some stage in human evolution these skills became sexually selected, starting an arms race that required ever more intricate ways of displaying these abilities. A first stage involved non-verbal and emotional display, showing knowledge of what was in another individual’s mind and their desires through processes of empathy, cooperation, kindness, and gift giving. An evolutionary arms race—typical of sexual selection—amongst those competing to display these skills would require ever more intricate ways to display and ever more discerning abilities to differentiate between those who are skilled displayers. Given enough time the limits of any given signal as a means to differentiate mind-reading and attractiveness in general are likely to be reached. 
At this point, specific signals are likely to become thresholds that must be reached to retain attractiveness as a potential partner or ally; beyond these thresholds further more intricate means of display would be required in order to discern between high level mind-reading abilities. This would lead to a multi-modal signalling system in which various streams of non-verbal information contributed to the overall signal.

With the arrival of symbolic communication, verbal and analogical styles of display became an option for displaying mind-reading abilities. The original Analogical Peacock Hypothesis paper [2] highlighted the importance of verbal proficiency, intelligence, creativity and humor in this respect; Miller [16] gave a long list of potential mental fitness indicators including culture, music, art and creativity, language in conversation and storytelling, humor (both verbal and nonverbal), and morality such as kindness, honesty, humility, and gift-giving. Many of these indicators are amongst the most sought after qualities in potential mates, commonly found in cross-cultural mate preference studies [17, 18]. From the Analogical Peacock Hypothesis point of view one of the most useful tools in signalling mind-reading ability is the creative combination of concepts. Being able to combine concepts in a way that others have not yet thought about displays a strong knowledge of the contents of their mind and shows that one has very strong mind-reading abilities. This creative combination of concepts is a highly typical component of linguistic humor [19, 20]. A potential origin for the creative combination of concepts is in human linguistic gossip—a key element of Dunbar’s take on the Social Brain Hypothesis [12] and important in the evolution of language [21]. The presentation of information of interest about two members of a social hierarchy involved in some romantic but compromising tryst, for example, provides a salacious or perhaps taboo piece of knowledge that is likely to be of interest, a novel creative combination of sorts. Such news is likely to have been met with surprise, and concomitant jaw-dropping style facial responses, as they are today. More abstract combinations of information and juxtaposition may have arisen through storytelling processes based on fictional gossip style tales leading to humorous combinations. 
The use of novel salacious gossip is likely to be a form of display of mind-reading. It shows that a sender is aware of what is likely to be interesting to another individual within their shared social hierarchy, and that they have a strong knowledge of what is taboo and permissible behaviour within their culture. This kind of disclosure of information and sharing of social currency is likely to result in a desire for social bonding, although it would not be without risk of social sanction—making it a useful discernment tool in mind-reading ability. Gossip is also a crucial factor in the second main kind of human communication—alignment communications.

The second kind of communication comes about as a result of the first. To be able to display mind-reading abilities one must know what is in the mind of other people. In the modern world this means a lifetime dedicating oneself to learning one’s culture and aligning one’s mind with those that you are likely to want to impress. The majority of human communication is probably oriented towards alignment. Most of the receiver aspects of human communication are alignment oriented—with a smaller amount based around judging display communications. The distinctions between these types of communications are not strong or clear and people may revert into one or another quickly as opportunities arise and change throughout the dynamics of conversation and social interaction.

The picture presented so far in Relevance theory, with the intertwined nature of cognitive effects and effort and the presumption of optimal relevance, fits an alignment view of communication more closely than a display view of communication. However, it clearly has very strong aspects in which mind-reading is highly important.

7 Humor as an Ostensive Challenge to Increase Cognitive Effort

The core argument being made in this paper is that in the situation in which a humorous exchange is being made a sender is seeking to display their mind-reading ability, and this leads to a special circumstance that is not accounted for in standard Relevance theory. The need to display mind-reading ability means that a sender is doing something more than making a standard underdetermined utterance, in Relevance theoretic terms, or an alignment communication in Analogical Peacock Hypothesis terms. The move to display changes the communicative dynamic. This paper argues that a special kind of ostensive stimulus is required, an ostensive challenge, in which the receiver is invited to expend a greater level of cognitive effort than would normally be the case in order to find a not so obviously relevant connection between two or more concepts. In these situations, in which a display communication is being flagged, the presumption of optimal relevance may give way to a principle of obscure and non-salient relevance—an indication that the sender is aware of a relevant connection between these two or more concepts but that it is not easy or simple to make the connection. It then becomes a challenge to the receiver to try and find the connection without being given a clue or punchline by the sender. Upon failure to rise to the challenge the sender may then choose to provide the answer.

There are many kinds of joke or humorous situation that can create an ostensive challenge; however, there are some obvious candidates. Some of the most apparent are the earliest joke set-ups that we learn; for instance, the two words “Knock knock...” are famously the start of a joke-telling formula. Other highly formulaic examples would be the “I say, I say, I say...” of music hall or vaudeville comedians or a more modern “A [insert joke element] walks into a bar...”. There are many smaller and more subtle cues that may be ostensive challenges too; non-verbal facial expressions or changes in tone of voice may serve to highlight the change from an alignment communication to a display communication with an ostensive challenge. These signals of an ostensive challenge highlight the existence of a joke-telling or humorous frame, but the function remains the same: letting the listener know that they are being challenged to look for an answer that requires an increase in cognitive effort. The challenge being set is that for the concepts offered as a potential combination there exists a relevant conceptual combination; the combination can be found, but finding it will require a greater than typical amount of cognitive effort. If found by the receiver or revealed by the sender, there will be a payoff that is worth the extra cognitive effort to find it: a humorous payload.

The ostensive challenge implies that even though the effort is greater than the normal level of effort that would be expected in an alignment communication, it will still be worth the cognitive effort to find the connection. If the sender has sufficient mind-reading skills to make a solid inference about the receiver’s taste in humor in relation to the quality of joke, and its suitability given the context and mood, the joke or attempt at humor will be funny and make the receiver laugh.

From an Analogical Peacock Hypothesis point of view, the ostensive challenge informs the receiver that a display communication is taking place; the sender is indicating that two or more concepts exist that can be connected in the mind of the receiver, these concepts are typically presented in the joke setup. It also implies that the sender or joke-teller is aware of the connection but has judged that the receiver is not aware of it. This creates a clear situation in which the sender is letting the receiver know that they are aware of the contents and relational connections between conceptual representations that exist in the receiver’s mind—and in an intricate way. The knowledge of the conceptual connection remains within the control of the sender or joke-teller right up until the point of a reveal in the form of a punchline, unless of course the receiver can rise to the challenge and find the connections for themselves.

There is also an additional layer of mind-reading ability on display, one concerning the sender’s knowledge of the receiver’s humorous taste. The joke-teller is making an assumption that they can tell or are aware of the humorous taste of the receiver. They are also making assumptions about the interaction of humorous taste with current contextual factors such as the social environment and the receiver’s mood when the joke is told. That is, an assessment is being made as to whether the receiver will be suitably receptive to the attempt at humor that the joke-telling utterance entails. This involves a number of social risks. Jokes are prepackaged and therefore not original displays of wit; as a result, there is a greater onus in any judgement on the assessment of a receiver’s receptivity, as the teller usually cannot or does not claim to be a joke’s author. Many jokes are met with groans of disapproval rather than laughter if any of the various factors in the contextual assessment are poorly judged.

8 Positive Socio-cognitive Effects

Encounters that include an ostensive challenge must still have strong positive cognitive effects; perhaps they may be better termed positive socio-cognitive effects. These interactions improve the user’s representation of the social world. They do not necessarily require mind-world mappings. Nothing in the interaction need refer to an external empirical test of an objective reality, although that may of course be included and may improve the humor by making it a harder-to-fake signal: the inclusion of context and concurrent facts makes quick-witted mind-reading ability more obvious [1, 2]. However, mind-world mappings are not necessary. The positive socio-cognitive effect comes from the social evidence that the attempt at humor provides. It allows information concerning the pair of interlocutors’ social standing in their present social hierarchy to be adjusted and updated. The sender has taken an opportunity to display, that is, they have seen an opportunity to elevate their social standing with someone with whom they seek to socially bond, and have told them a joke or created a humorous event as part of an affiliative process. These displays are inherently risky, for if they are not performed well and the humor falls flat, the sender will have exposed their desire to socially bond and concurrently displayed an ineffectual ability to adequately mind-read and assess the social mood of the receiver. For the receiver there is a positive socio-cognitive effect in the knowledge that they are viewed as worth the risk of an attempt at an affiliation oriented display. If a receiver provides an honest signal in response, a hard-to-fake genuine laugh, it is a signal that the humor was successful and requires an evaluation or re-evaluation of the relationship; typically this means an elevation of a sender’s social standing with respect to the receiver.
A display that is not received as humorous but falls flat may be met with a polite low intensity laugh requiring no re-evaluation of social standing, at least not in an upwards direction. A groan or negative comment may indicate that there is value in the social relationship, but that it comes from other dimensions that are not the humorous one, or that sufficient social capital already exists that a current poor performance will be tolerated. An absence of any response at all is likely to mean that there is very little future in the social relationship. Therefore, although the current argument suggests an addition to Relevance theory based thinking concerning humor—one that accommodates the two kinds of communication suggested by the Analogical Peacock Hypothesis—there still remains the overall requirement for positive cognitive effects that make the communicative interchange and social interaction beneficial for both parties.

9 Implications for Computational Models of Humor

The importance of incorporating humor into computational dialogue models of human-computer interaction has been argued before [22,23,24]. However, due to the intricate nature of human conversational interaction, and given the many prevalent and commonly made erroneous assumptions about the nature and function of human communication (some of which are outlined at the start of this paper), there has been little incorporation of humor into dialogues that occur in real-time and that can flow freely. Most dialogue models are constrained to fixed scenarios or involve tasks such as information provision that are highly functional in nature. Tactics used to incorporate humor in these situations involve the use of canned jokes and self-deprecating humor. This is largely safe ground as it can be pre-prepared and involves little in the way of a need to be aware of contextual knowledge. This approach also requires minimal assessment and knowledge of the minds of an audience or a receiver that any given crafted utterance will be directed towards.

Digital assistants such as Amazon’s Alexa, Apple’s Siri, Google’s “Google Now” feature and Microsoft’s Cortana are becoming more prevalent in our daily lives through their presence in our mobile devices and increasingly in our households. To make these interactions less monotonous they will require the use of humor and laughter in a more free-flowing way [1]. To achieve this in a convincing manner requires a much stronger assessment and incorporation of the conversational context and receiver knowledge on the part of the sender than currently occurs. This kind of information is currently available in ways that were not previously accessible, and incorporation of mind-reading and contextual knowledge should make interactions much more personable and pleasant. More complete and nuanced models of human communicative interaction are required than simply thinking it is sufficient to provide information. Computational models will require knowledge of social and cultural norms, incorporation of context and assessment of the receiver’s goals and desires. They will also need to socially engage at the appropriate level of social distance. This means being very nuanced in the degree to which they target interactions. The use of ostensive challenges will provide a useful tool as these models increase in the level of pervasiveness and intrusion they create in our lives. Providing too much knowledge or mind-reading without the appropriate judgement and etiquette can be perceived as creepy or awkward. Something like an “uncanny valley” of conversational interaction probably exists where too much information can be known about an individual for them to be fully comfortable: an overly informed human gives a “stalker” kind of feeling, and a machine will probably feel even worse, as it will carry implications of corporate intrusion of privacy.
However, the appropriate use of ostensive challenges may permit humor to be tested in ways that will let models learn an individual’s sense of humor: what works for certain people and in what circumstances. It may even become a sought-after feature in circumstances when a user is bored or requires some elevation in mood. If we are to make genuinely funny and humorous digital companions that are creative in their humor generation rather than simply regurgitating jokes or humorous memes from social media, then delivering humor through the use of ostensive challenges is likely to minimise the degree to which failure to be funny will be detrimental to an overall relationship with a digital assistant.
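
One way to picture a model that learns “what works for certain people and in what circumstances” is a running tally of how often each kind of humor attempt was met with a laugh in each context. The Python sketch below is offered only as an illustration under stated assumptions: the class `HumorPreferenceModel`, its method names, the style and context labels, and the smoothed success estimate are hypothetical choices made here and are not a method described in this paper or in the systems cited.

```python
from collections import defaultdict


class HumorPreferenceModel:
    """Hypothetical per-user record of which humor styles earn a laugh.

    Assumption: each humor attempt is flagged by an ostensive challenge and
    the receiver's response can be reduced to laughed / did not laugh.
    """

    def __init__(self) -> None:
        self._attempts = defaultdict(int)
        self._laughs = defaultdict(int)

    def record(self, style: str, context: str, laughed: bool) -> None:
        """Store the outcome of one humor attempt."""
        key = (style, context)
        self._attempts[key] += 1
        if laughed:
            self._laughs[key] += 1

    def estimated_success(self, style: str, context: str) -> float:
        """Smoothed laugh rate; untried combinations start at 0.5."""
        key = (style, context)
        laughs = self._laughs.get(key, 0)
        attempts = self._attempts.get(key, 0)
        return (laughs + 1) / (attempts + 2)

    def best_style(self, styles: list[str], context: str) -> str:
        """Return the style with the highest estimated success in this context."""
        return max(styles, key=lambda style: self.estimated_success(style, context))


if __name__ == "__main__":
    model = HumorPreferenceModel()
    model.record("pun", "morning", laughed=True)
    model.record("pun", "morning", laughed=True)
    model.record("sarcasm", "morning", laughed=False)
    print(model.best_style(["pun", "sarcasm"], "morning"))
```

Starting untried combinations at 0.5 is a deliberate design choice in this sketch: a single failed attempt lowers the estimate without ruling a style out, which loosely mirrors the point above that flagged attempts should limit the cost of occasionally failing to be funny.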

There are many ethical issues that must be considered concerning the degree to which we wish to have our digital assistants become charming social interactants and companions. Artificial intelligence is not after all artificial sentience; creating the illusion of artificial sentience is an aspect of artificial intelligence that needs to be considered carefully. Adding humor and increasingly appropriate conversational abilities is likely to increase the levels of anthropomorphism we indulge in these algorithms. These systems should require opt-in choices, and much care needs to be taken in not overstepping the mark. These machines are in many ways the equivalent of ventriloquist’s dummies and while suspending disbelief for a period of time may be entertaining it is probable that some people will assume that there are actual sentient capabilities within these algorithmically driven entities. Therefore, there is an ethical onus on the producers of such machines to ensure that they are perceived in that way and enjoyed as such, rather than becoming so believable that the attribution of sentience is given where no sentience exists. Creating machines with stronger humor abilities makes that burden of responsibility a stronger obligation.