INTRODUCTION
Metonymy is pervasive in communication among both native and non-native speakers when using a lingua franca. Consider the following example: during an online student meeting in an international course on 'Intercultural Communication,' a Russian student explains to his German and Brazilian peers that, in Russia, parental education often emphasizes strict rules and pragmatic values, particularly with regard to financial matters:
Sequence 1: 2024VMIIC07 ((05:11-05:28))2
| 01 | R | ah: in RUssia? |
| 02 | (-) ah: (.) from THE:==↓i don't KNOW. | |
| 03 | from your:- | |
| 04 | (.) when you're a CHILD, | |
| 05 | ah: your parents tell YOU; | |
| 06 | DON'T receive the credit mOney. | |
| 07 | DON'T pay in crEdit. | |
| 08 | Pay in CASH (.) Every. | |
| 09 | If you dOn't have CASH, | |
| 10 | (.) DON'T pay. |
To explain the complex topic under discussion, namely a macro-level approach to cultural dimensions as proposed in cross-cultural psychology, the Russian student provides an example to illustrate his understanding that Russians hold a distinctive attitude toward expenses, one that may differ from that of other cultures. However, this interpretation must be inferred from the context, as it is the metonymy that grants access to an idealized cognitive model (ICM). In this case, the model pertains to a particular cultural framework of 'managing money/expenses' (in a narrower sense) or 'navigating everyday life' (in a broader sense). The credit card serves as a quintessential illustration. As a basic-level category, the credit card represents shared common ground among all participants, enabling the Russian student to assume that the other two co-participants recognize its culturally independent connotations. Specifically, the credit card is seen as an unreliable tool for everyday shopping, as it can lead people to lose track of their expenses and encourage spending on credit. Roughly speaking, through the metonymic chain (Barcelona, 2005) [DON'T USE CREDIT CARDS → DON'T TRUST MONEY THAT IS NOT TANGIBLE → DON'T TRUST THINGS THAT ARE NOT REAL → DON'T BE RECKLESS], the student makes a culture-specific frame available for his co-participants.
István Kecskes (2014, 2015) proposed a socio-cognitive approach to studying intercultural pragmatics that focuses not on contrasts but on real encounters between nonnative speakers interacting in a lingua franca. His research highlights that lingua franca communication must often navigate the absence of shared common ground among participants, since nonnative speakers cannot fully rely on established norms, standards, conventions, or the schemas and imagery underlying utterances and formulas. As a result, they must engage to a higher degree in the co-construction of common ground within their temporary community of practice. This represents an important step toward the in situ construction of intercultural meaning. However, to date neither cognitive linguistics nor intercultural pragmatics has undertaken a fine-grained empirical analysis of intercultural interaction that would reveal the specific linguistic and embodied resources participants draw on as they co-construct meaning turn by turn.
This article aims to study the role of verbo-gestural metonymy in intercultural interactions taken from a corpus of videotaped and transcribed conversations. In the first section, the literature on metonymy from a cognitive perspective will be reviewed, focusing particularly on its applications in gesture studies. Based on the current state of research, we will identify the key questions arising from these insights before introducing the methodology of the ICMI Research Center, its data collection procedure, transcription methods, and the selection criteria for the sequences analyzed. Following this, we will present three examples drawn from the selected data, together with their analysis. Finally, the article will discuss the results and outline potential questions for further research.
A REVIEW OF METONYMY FROM A COGNITIVE PERSPECTIVE AND GESTURE STUDIES
Cognitive linguistics has expanded and partially redefined the concepts of metonymy and metaphor by addressing them as cognitive processes and tools of conceptualization. From this perspective, metonymy functions as a reference point that provides mental access to an entire domain, with the metonymy highlighting a specific aspect of that domain (Radden; Kövecses, 1999; Croft, 1993, p. 348), as opposed to metaphor, which involves cross-domain mapping. The process of providing mental access to another entity within the same idealized cognitive model (ICM) reflects not only people's encyclopedic knowledge of a specific domain but also the cultural models embedded within that knowledge. In the statement "The buses are on strike," there is no direct semantic connection between "buses" and "bus drivers." The metonymic shift arises from the pragmatics of the situation and our understanding that both buses and bus drivers belong to the same ICM, connected through a control relationship: CONTROLLED FOR CONTROLLER. To comprehend a metonymy, a co-participant must share sufficient common ground to access the scenario or frame established by the metonymy. This requires understanding "the whole structure in which it fits" (Fillmore, 2006, p. 373).
Researchers in cognitive linguistics have identified the most common metonymies (for an overview, see, e.g., Soares Da Silva, 2003; Littlemore, 2015) and have highlighted regular patterns, such as the physical basis of emotions undergoing a highly productive and regular metonymic process (Silva, 2003, p. 44). Littlemore (2015) has examined WHOLE AND PART as well as PART AND PART metonymies as the most prominent categories in corpora, finding that PHYSICAL PART FOR WHOLE metonymies are particularly prevalent, especially those involving body parts, as in the example "The hired hands are here," which frequently carry a strong depersonalizing effect.
Although metonymy has been clearly distinguished from metaphor in cognitive linguistics from the outset, researchers have pointed out that both are intertwined in multifaceted ways. One of the most influential studies on this topic was conducted by Goossens (1990), who identified four types of 'metaphtonymies' based on figurative expressions he analyzed in detail: (a) 'metaphor from metonymy', (b) 'metonymy within metaphor', (c) 'metaphor within metonymy', and (d) 'demetonymization in a metaphorical context', with the first type, 'metaphor from metonymy', being the most prominent, such as "close-lipped," meaning 'secretive,' and "tongue-in-cheek," meaning 'not in earnest.' In both cases, the depicted physical behavior is prototypically associated with the social behavior of a person. Deignan (2005) confirmed the prominence of this category among the various relationships between metonymy and metaphor in real data of large corpora and revealed that the interaction between metonymy and metaphor is a scalar and contextual phenomenon. The number of metaphtonymies in real language use is likely greater than either pure metaphor or pure metonymy, and the boundaries between the two are often fuzzy rather than clearly delineated (for an overview see Schröder, 2021). Barcelona (2011, p. 7) ultimately challenges the standard cognitive-linguistic notion of conceptual metonymy as a one-domain "mapping," "highlighting," or "activation." He aims to move beyond the clear-cut distinction between metonymy and metaphor—traditionally defined by the separation of source and target domains—by distinguishing purely schematic metonymies from more typical cases, such as WHOLE FOR PART metonymies and prototypical referential metonymies, in which the target represents a specific individual entity. According to Barcelona, cognitive domains often have fuzzy boundaries, making it difficult to determine whether source and target domains belong to the same superordinate domain. 
As such, the interpretation of a given expression must rely on context and background knowledge.
This background knowledge is not solely based on universal cognitive grounding; metonymies also have historical and culture-specific dimensions. These aspects can render them difficult to understand for individuals outside a given linguistic or cultural group. Soares da Silva (2006) provides an insightful overview of the semantic history of the Portuguese verb deixar ('to leave'), a highly frequent and productive verb in everyday usage. Its polysemy encompasses psychological, social, and moral meanings such as 'abandon,' 'permit,' and 'not intervene,' reflecting both metonymic and metaphorical developments. These meanings trace back to embodied and interactional experiences—such as unlocking or releasing—rooted in the Latin etymon laxare, meaning 'to loosen,' 'to relax,' 'to release,' or 'to let go' (Silva, 2006).
[N+N] compounds frequently activate conceptual metonymies or metaphtonymies, which carry culturally entrenched meanings. Onysko (2017), for example, shows that compounds used by speakers of New Zealand English often encapsulate cultural meanings. The term "bucket philosopher," referring to a person who openly talks about death, involves a PART FOR WHOLE metonymy, relying on the idiom 'to kick the bucket.' Similarly, "snail villa," meaning 'a trail of something,' evokes a metonymic chain in which "snail" is interpreted as ANIMAL → MOTION → PATH.
In a similar vein, proverbs may exhibit a rich interplay between metonymy and metaphor, as demonstrated by Siqueira et al. (2017) in their analysis of examples such as Quem vê cara não vê coração ("Who sees the face does not see the heart"). This proverb involves the conceptual metonymy PART FOR WHOLE, the CONTAINER image schema, and the primary metaphor ESSENTIAL IS INTERIOR. Even more complex are multimodal memes and political cartoons (charges), which frequently incorporate metonymic and metaphoric elements in both their source and target domains. As Sperandio (2016) shows in his analysis of Maurício Ricardo's charges, these forms often constitute intricate conceptual blends. Understanding multimodal cognitive discourse resources—across domains such as art, advertising, and politics—has led to a substantial body of semiotically informed cognitive research (e.g., Forceville; Urios-Aparisi, 2009).
Nevertheless, despite this expanding interest, the role of metonymy in gesture remained largely neglected for a significant period. An exception to this trend is the work of Irene Mittelberg, which is explicitly dedicated to the role of gestural metonymy in interaction. For Mittelberg, metonymy plays a fundamental role in gestural thinking and signification, as gestures are "by nature abstract(ed) and partial representations [...] and thus inherently metonymic" (Mittelberg; Hinnell, 2024, p. 133). Gestures metonymically evoke frames by highlighting aspects of basic scenes of experience and are deeply interrelated with processes of pragmatic inferencing, which involve shared knowledge. In the creation of an iconic or metaphoric gesture, prototypical or locally relevant aspects of a particular object or action are emphasized, while other implied aspects must be completed, imagined, or inferred by the interlocutor(s). Gestures, therefore, serve as tools, and it is up to the interlocutor(s) to abstract meaning from them. For example, one might abstract meaning from the motion event executed by the moving hand (Mittelberg; Waugh, 2009, p. 120). Re-enactments, substitutions, adjacent representations, and tracing all represent only parts of some action or entity. The gestural re-enactment of writing by a hand in the air does not involve a writing instrument or paper. However, through metonymy, the gesture evokes a part of the entire writing scene (Cienki, 2013). In this way, metonymies in co-speech gestures are discourse-embedded, as they are linked to discourse-specific frames through the principle of partial semiotic portrayal. Gestures, therefore, "abstract salient characteristics from, briefly allude to, or otherwise evoke entire persons, three-dimensional objects, holistic motion events, and rich contexts" (Mittelberg, 2019, p. 1). This is how interactants highlight those scenarios that become relevant to their communicative goals. 
These scenarios are metonymically linked by a pragmatically and experientially grounded frame in which a gesture can trigger ensuing associative chains within larger semantic networks.
Over time, gestural metonymies may undergo habitualization, with hand shapes and movement routines evolving through grammaticalization processes in language. Mittelberg and Hinnell (2024) illustrate this process with the example of the 'palm-up open hand' (PUOH) gesture, which is commonly observed in the German intransitive existential construction es gibt ("there is/are"). When this multimodally instantiated construction is used, the PUOH gesture metonymically enacts a reduced and schematic variant of the full action of giving.
Frequently, metonymy marks the first step toward metaphor in verbo-gestural communication. Mittelberg and Waugh (2009) provide an example of a hand gesture that draws a frame in the air, though the frame is never fully drawn—typically, only the edges are indicated. Thus, the traces in the air must be interpreted as representing some form of frame, which corresponds to the initial metonymic step in meaning construction. In the second, metaphoric step, this frame could be interpreted as the frame of a story the speaker is telling, such as the frame of a movie rather than the frame of a picture. In this way, metonymy can pave the way for metaphor in interaction (Mittelberg; Hinnell, 2024).
As we have seen in this subsection, metonymies are frequently universal; however, the more complex they become, the more culturally bound they may be. It has also been briefly mentioned that Kecskes (2015, p. 180-187) elaborates on the crucial yet merely scalar difference between interactions among native speakers and those among nonnative speakers. In the latter case, intersubjectivity relies much more on language created ad hoc and in situ than on prefabricated language, formulas, idioms, and pre-existing frames. The interactants must build their frames bottom-up while negotiating meaning with their interlocutors. This raises the question of how the dynamics of metonymic meaning construction evolve toward a culturally 'discharged' use of metonymies in language and gesture, in line with a common ground co-constructed in situ.
METHODOLOGICAL PROCEDURE
Combining lingua franca research from an interactional perspective with a focus on cognitive phenomena is still in its infancy (Zima; Brône, 2015; Hall, 2020; Schröder, 2025). However, fine-grained multimodal analysis offers an ideal starting point for accessing both cognitive online processes and entrenched levels of distributed and shared cognition that are (re)activated and (re)negotiated in situ.
The data presented in the Analysis stem from the ICMI corpus. Established in 2010 at the Federal University of Minas Gerais, the Research Center for Intercultural Communication in Multimodal Interaction (ICMI)3 has grown into an inter-institutional and international network of researchers focused on analyzing intercultural communication in interaction from micro-analytical and multimodal perspectives. The center records interactions between participants with different linguistic and cultural backgrounds, as well as those between participants from the same background, for comparative purposes.4 Since its inception, the center's methodology has been multimodal, based on videotaping and transcribing intercultural interactions occurring in natural, elicited, and institutional settings. After recording, the interactions in the ICMI corpus are transcribed using the EXMARaLDA software program (Schmidt; Wörner, 2009) in accordance with the CA transcription conventions GAT 2 (Selting et al., 2011; see Appendix). Currently, the corpus consists of approximately 4,775 minutes of videotaped interactions.
The three sequences analyzed below were selected from a dataset of twenty sequences drawn from two projects: "(Inter)cultural Key Concepts at the Interface of Interaction, Cognition, and Variation" (2017–2022),5 and "The Multimodal Coordination of Intercultural Video-Mediated Interaction" (2023–2026).6 Both projects form part of the ICMI corpus, specifically the subcorpora ICMI-InCogVa and ICMI-VMI. The twenty sequences were chosen because they exhibit the sequential and multimodal elaboration and co-construction of complex metonymies that have been identified as cognitively and (inter)culturally salient, thereby warranting further fine-grained transcription, gesture annotation, and analysis as a first step.
Qualitative research on intercultural talk-in-interaction from a multimodal perspective relies on a rigorously empirical procedure. Collecting data in this way is essential for providing a complementary perspective on "cognition in interaction," which remains largely neglected (Zima; Brône, 2015). This is why each new project begins with specific research questions but adopts a bottom-up approach. Although neither videotaping nor transcription can ever be free of bias, this empirical procedure remains, to date, the most rigorous means of initiating an investigation of talk-in-interaction. Transcription itself constitutes a process of analysis: salient co-occurrences—such as the recurrent simultaneous use of a lexical, pragmatic, or syntactic marker in combination with high pitch, marked accentuation, and a particular gesture—may give rise to further analyses of recurrent patterns (e.g., multimodal list constructions). In the present case, such observations spark scientific curiosity about whether metonymy may serve as a resource for establishing common ground and, if so, how it is collaboratively elaborated, maintained, and/or modified in situ, that is to say, turn by turn.
Building on this first phase, the subsequent selection of sequences was guided by the aim of identifying a substantial set of instances in which complex metonymic elaborations contribute to common ground work. The goal was to analyze how interlocutors embed these elaborations in their narrative contexts and to determine their specific functions within the ongoing interaction, thereby shedding light on their role as a resource for common ground building in talk-in-interaction. Finally, from this set, three sequences were chosen for detailed presentation here, as they exemplify different ways in which metonymy is interwoven with or further developed alongside metaphor.
For the purpose of the following analysis, we will first provide a link to the video together with the GAT 2 transcription of the sequence under discussion (for the conventions, see Appendix). The second step of annotation, the 'zoom-in' procedure, then follows. This procedure is based on the annotation system proposed by Bressem (2013). The adopted categorization relies on the four parameters of sign language: (a) hand shape, (b) orientation, (c) movement, and (d) position in gesture space. Table 1 provides an overview of these gesture form aspects, as well as the acronyms used for the most common hand shapes and orientations:
| 1. HAND SHAPE | 2. HAND ORIENTATION |
|---|---|
| a. OH: open/flat hand | a. PU: palm up |
| b. R/LH: right/left hand | b. PD: palm down |
| c. 2H: two hands | c. PL: palm lateral |
| d. F: fist | d. PV: palm vertical |
| e. Single fingers: | e. TC/AC: towards/away center |
| i. Combination of fingers: 1+5 or 1-5 (1=thumb, 5=little finger) | f. TB/AB: towards/away body |
| ii. Shapes: stretched, bent, crooked, flapped down, connected, touching | |
| 3. MOVEMENT SHAPE | 4. DIRECTION OF MOVEMENT |
|---|---|
| a. Straight, arced, circular, zigzag, s-line, spiral | a. Horizontal: right and left |
| | b. Vertical: up and down |
| | c. Sagittal: away from and towards body |
| | d. Diagonal: all of the previous directions |
| 5. QUALITY OF MOVEMENT | 6. POSITION |
|---|---|
| a. Size: reduced or enlarged | a. Four sectors: center center, center, periphery, extreme periphery |
| b. Speed: decelerated, accelerated | b. Position: upper, lower, right and left |
| c. Flow of movement: accentuated or not | |
ANALYSIS
4.1 I lost the point
Sequence 2 is derived from a video recording conducted at a Brazilian university. The participants include a Brazilian student acting as a trainee teacher of English, a German exchange student, and a US-American exchange student.7 The US-American exchange student concludes his turn by emphasizing the importance of consistent practice in attaining fluency in a foreign language, and the other participants express their agreement. Subsequently, the Brazilian participant initiates the next turn and offers his assessment:
Sequence 2: Lost the point 2019BHAmBrGe01 ((07:13-07:36))8
| 01 | Br | but ↑that's imPORtant (-) like (.) |
| prAc[ticing it's gOod; ] | ||
| 02 | Ge | [it's just PRACtice;=right?] |
| 03 | Br | <<tapping with stretched R2 on his forehead> yeah (.) but |
| the: the the other thIng is like (.) AH:M (-) eh:;> | ||
| 04 | <<holding his LH with stretched R2 in front of him, | |
| laughing, p> i LOST the ˊpOint?> | ||
| 05 | <<2H OP TB stretched R2, laughing> i hAd it right ˊHERE-> | |
| 06 | [(.) and i <<snapping fingers, laughing> LOST ˋit.>] | |
| 07 | Am | [((laughs)) ] |
| 08 | Ge | [((laughs)) ] |
| 09 | <<pointing with R2 to the left> yeah but the THING is like-> |
In this sequence, the Brazilian participant attempts to articulate a point about practicing a foreign language but experiences a brief lapse in memory, a phenomenon not uncommon even among native speakers. Notably, he engages intensively with this moment by inserting an explicit metacomment, elaborating a metonymic chain, reactivating an entrenched idiom, and underscoring this process through his dynamic bodily involvement, as shown in Table 2:
| TRANSCRIPT (GAT 2) | GESTURE FORM | METONYMY/METAPHTONYMY | IMAGE | |
|---|---|---|---|---|
| 03 | H-SHA: | RH crooked, index finger stretched | Metonymy: | 2a |
| yeah (.) but the: the the other thIng is like (.) AH:M (-) eh:; | H-ORI: | PL | PLACE FOR OBJECT | |
| S-MOV: | straight | Metaphtonymy: | ||
| D-MOV: | TB | SPACE FOR MEMORY | ||
| G-POS: | upper center | |||
| 04 | H-SHA: | R OH flat | 2b | |
| i LOST the ˊpOint? | H-ORI: | TB | ||
| S-MOV: | straight | |||
| G-POS: | upper center | |||
| 05 | H-SHA: | 2H crooked, 1+2 stretched | Metonymy: | 2c |
| i hAd it right ˊHERE- | H-ORI: | arced | PLACE FOR OBJECT | |
| S-MOV: | towards body | Metaphtonymy: | ||
| Q-MOV: | enlarged | SPACE FOR MEMORY | ||
| G-POS: | center | |||
| 06 | H-SHA: | 2H crooked, 1+2 crossed | Metonymy: | 2d |
| (.) and i LOST ˋit. | H-ORI: | PV, TB | CLICK FOR LOSING ACTION | |
| S-MOV: | arced | Metaphtonymy: | ||
| G-POS: | center | FORGETTING IS DISAPPEARING | ||
While searching for the momentarily forgotten idea, the participant points towards his forehead, seemingly indicating the locus where the discourse topic should be stored in memory. He verbalizes: yeah (.) but the: the the other thIng is like (.) AH:M (-) eh:; (L03, Fig. 2a), accompanied by micro-pauses, hesitation markers, and elongated speech sounds.
It appears that the participant is pointing towards the location where the topic of discussion is presumed to reside in his mind, with the act of tapping his head functioning metonymically as a representation of the memory's storage place. Simultaneously, this gesture metaphorically reflects the thoughts he is attempting to retrieve. In L05, he elaborates further by stating, i hAd it right ˊHERE- while pointing towards his face with both open hands and breaking into laughter. Through this comment, he extends the metonymic-metaphoric chain.
Subsequently, in L06 (Fig. 2d), he snaps his fingers and reiterates that he has lost his train of thought. This gesture enriches the metonymic framework, as the action of snapping fingers encapsulates an entire idealized cognitive model (ICM), in which the brief gesture and sound stand for the broader act of forgetting. Metaphorically, this gesture is mapped onto the target domain of the sudden disappearance of memory, as illustrated in Table 2. In this instance, the ontological metaphor underlying the figurative expression is reactivated, a phenomenon commonly observed in foreign language use and often rendered particularly visible through multimodal displays (Littlemore; Low, 2006; Pitzl, 2018; Schröder, 2025). A dynamic scenario thus unfolds in which metonymy provides access to the ICM, underscoring the encyclopedic, flexible, and idiosyncratic nature of the knowledge networks in human cognition.
According to Mittelberg's (2019) framework of metonymy, this scenario constitutes a relatively simple frame, involving a single idiom tied to a basic, universally comprehensible scene for speakers of Indo-European languages using ELF. The familiar and centuries-old MIND-AS-CONTAINER metaphor further reinforces this understanding. However, the interplay of two metonymies and one metaphor, coupled with their expansion, reactivation, and multimodal elaboration, transforms this brief sequence into a conceptually complex performance.
4.2 When, how long, with whom, when I come back
Sequence 3 captures an excerpt from an elicited conversation involving three exchange students at the beginning of their stay in the host country. The participants include two sixteen-year-old German male students and one seventeen-year-old Swedish female student, all of whom were part of the Youth for Understanding program. The students had arrived in Brazil one month prior to the first recording. The recordings were conducted in a Brazilian town with a population of approximately 300,000, located in the state of Minas Gerais. The second recording took place after the participants had spent eleven months in Brazil. During this period, the exchange students were fully integrated into the Brazilian school system and resided with Brazilian host families. In the following extract, the participants discuss their experiences with their host parents, when G2 begins to talk about the controlling attitudes of his host parents.
Sequence 3: ((25:25-25:41))10
| 01 | G2 | now (.)when i want to do SOMEthing <<brushing his RH in |
| parallel to his torso> with someOne, | ||
| 02 | (.) then i have Always to say <<ROH touches LOH> ↑WHEN;> | |
| 03 | <<holding 2OHPV in parallel away from body> how ↑LONG;> | |
| 04 | with <<pointing with the R2 on LOH PV> ↑WHOm; | |
| 05 | S1 | [yeah;] |
| 06 | G2 | [and ] WHEN i've when i want to (.) when (.) i:; |
| 07 | <<turning around his arched ROH PU quickly> Come ˋBACK;> | |
| 08 | S1 | ((nods)) |
| 09 | ALways; | |
| 10 | S1 | <<nodding> yeah;> |
Rather than summarizing the host parents' behavior regarding 'going out with friends' under a generic concept such as 'a more controlling attitude,' G2 breaks it down into specific components of this domain: ASKING WHEN FOR CONTROL, ASKING HOW LONG FOR CONTROL, ASKING WITH WHOM FOR CONTROL, and ASKING WHEN HE COMES BACK FOR CONTROL. These represent SUB-EVENT FOR WHOLE EVENT metonymies (Littlemore, 2015) and may point to stronger paternalistic tendencies in Brazilian culture as perceived by G2. This underlying concept is not explicitly mentioned but is implicitly established as ad hoc common ground among these three exchange students.
A closer examination of the gestural level, as shown in Table 3, reveals a gestural display that not only underscores these metonymies but also introduces metaphtonymies, since metaphors are reactivated on a deeper level.
| TRANSCRIPT (GAT 2) | GESTURE FORM | METONYMY/METAPHTONYMY | IMAGE | |
|---|---|---|---|---|
| 02 | H-SHA: | 2OH 1-4 closed, 1 angled RH PV, LH PU | Metonymy: TIME REQUEST FOR CONTROL; | 3a |
| then i have Always to say ↑WHEN; | S-MOV: | Straight | Metaphor: A MOMENT IN TIME IS A POINT IN SPACE | |
| D-MOV: | RH towards LH | |||
| G-POS: | lower | |||
| 03 | H-SHA: | 2OH 1-4 closed, 1 angled | Metonymy: TIME REQUEST FOR CONTROL; | 3b |
| how ↑LONG; | H-ORI: | PV | Metaphor: A TIMESPAN IS A PATH | |
| S-MOV: | In parallel | |||
| G-POS: | lower | |||
| 04 | H-SHA: | LOH 1 angled, RH fingers bent, 2 stretched; LH PV | Metonymy: PERSON REQUEST FOR CONTROL; | 3c |
| with ↑WHOm; | S-MOV: | Pointing | POINTING TO THE PERSON STANDS FOR THE PERSON | |
| D-MOV: | RH towards LH | |||
| Q-MOV: | reduced size | |||
| G-POS: | lower | |||
| 06 | H-SHA: | ROH | Metonymy: REQUEST FOR POINT IN TIME FOR CONTROL; | 3d |
| [and ] WHEN i've when i want to (.) when (.) i:; | H-ORI: | PU | TURNING MOVEMENT FOR THE WHOLE ACTION (OF COMING BACK) | |
| S-MOV: | arched | |||
| D-MOV: | horizontal from right to left | |||
| G-POS: | lower | |||
| 07 | S-MOV: | arched | ||
| cOme ˋBACK; | D-MOV: | horizontal from right to left | ||
| G-POS: | lower | |||
As shown in Table 3, in addition to the overarching CONTROL scenario, which is composed of metonymies providing access to the broader frame, two more specific metonymies are present: (a) in L04, POINTING TO THE PERSON STANDS FOR THE PERSON, and (b) in L06, TURNING MOVEMENT FOR THE WHOLE ACTION (OF COMING BACK). These two types of metonymies are precisely identified by Mittelberg and Waugh (2009) and Mittelberg (2019). Specifically, they describe HANDS AS METONYMIC SOURCES POINTING TO AN OBJECT OR SPACE AS METONYMIC TARGETS, standing for the object itself, as well as PART OF AN ACTION FOR THE WHOLE ACTION, both of which are prototypical examples of metonymy in gesture.
In L02 and L03, the classical conceptualization of time in spatial terms is invoked, specifically what Lakoff and Johnson (1999) refer to as the MOVING OBSERVER or TIME'S LANDSCAPE metaphor. Unlike the MOVING TIME metaphor, where time itself moves, here it is the observer who moves, with each location in the observer's path representing a point in time. This is reflected in two metaphors: (a) in L02, A MOMENT IN TIME IS A POINT IN SPACE, and (b) in L03, A TIMESPAN IS A PATH. Although conventionalized, these metaphors are reactivated gesturally here (Pitzl, 2018). Moreover, all key terms are highlighted by prosodic cues, including focus accent, three pitch jumps, and one falling pitch movement: ↑WHEN (L02), how ↑LONG (L03), with ↑WHOm (L04), cOme ˋBACK (L07). The co-occurrence of these lexical units with gestures and the prosodic repetition of pitch jumps can be interpreted as what Selting (2007) terms "list construction."
Gestures and prosody are embedded in the cultural frame of CONTROL, which contrasts the Brazilian host parents' behavior with that of the speaker's German parents. In this context, metaphtonymy operates on a multimodal level within talk-in-interaction. It constitutes a complex semantic network that enables interlocutors to pragmatically infer the (inter)cultural implications of this frame, allowing them to co-participate in accessing the cultural and abstract concepts implied. This is evident in the response tokens provided by the Swedish participant (L08, L10: nodding; L05, L10: yeah). Thus, emerging common ground is co-constructed from the shared experiences and the contrast between two diverging cultural landscapes, rather than solely from the participants' linguistic and national backgrounds.
4.3 The door is never closed
In Sequence 4, taken from the same setting, the three exchange students discuss their living situation with their host families, focusing particularly on the issue of private space. G2 raises the concern that members of his host family enter and exit his room at will, a behavior that is confirmed by S1.
Sequence 4: 2012UbGeSw01 ((32:50-33:08))12
| 01 | G2 | <<pointing with RH to his chest> in my apartment they're VEry <<opening 2H PD outwards> dIfferent;=> |
| 02 | | =<<throwing RH towards body> they jUst come in;> |
| 03 | | [and THEN;] |
| 04 | S1 | [yeah; ] |
| 05 | | [in MY room too;] |
| 06 | G2 | [<<grabbing movement with RH TB> the door is NEver closed, |
| 07 | | <<grabbing movement with RH TAB> when the door is CLOSED they just Open it,> <<arched movement with RH> and go In;> |
| 08 | | [((laughs))] |
| 09 | S1 | [((laughs))] |
| 10 | | well (.) i_i <<moving raised LH with crooked fingers TB> CLOSE the door and it is closed;> |
| 11 | | <<moving RH to right> but (.) people> just come ˆIN; |
| 12 | | <<LOH PU with crooked fingers> and they sit> on my bed for a whIle and TALK and then; |
| 13 | | <<ROH PU moving upward> they go aWAY, |
| 14 | | [they sit and like (-) <<slapping RH in the air> ↑Okay:;] |
| 15 | | [they come to you YES yes it is like this; ] |
| 16 | | <<waving RH> bye bYe> (-) <<laughing> ↑see you LAter;=heHE-> |
This sequence illustrates how metonymy develops or narrows (Denroche, 2015, pp. 57-58) within a cultural context co-constructed by the participants G2 and S1, with the conceptualization being co-expressively displayed across both the verbal and gestural planes. G2 initiates the sequence by introducing the underlying topic of the 'private sphere,' explaining that his host parents treat his private space very differently from the way his parents handle it in his home country: in my apartment they're VEry dIfferent (L01). As we can see in Table 4, gestures play a pivotal role in elaborating this metonymic chain:
| Line | Transcript | H-SHA | H-ORI | S-MOV | D-MOV | Q-MOV | G-POS | Metonymy/Metaphor | Image |
|---|---|---|---|---|---|---|---|---|---|
| 01 | in my apartment they're VEry dIfferent;= | ROH, 1-5 stretched | PV | AB | straight | accelerated | center, lower left | Metaphor: DIFFERENCE IS OPPOSITE DIRECTION | 4a |
| 02 | =they jUst come in;> | ROH, 1-5 crooked | PV | arched | from right to left | accelerated | lower right | Metonymy: DIRECTION FOR ACTION | 4b |
| 06 | the door is NEver closed todAy, | RH, 1-5 crooked | PL | straight | towards body | | from lower right to center | Metonymy: PULLING THE DOOR FOR COMING INTO THE ROOM, INVASION INTO PRIVATE SPACE | 4c |
| 07 | when the door is CLOSED they just Open it | RH, 1-5 bent | PL | grabbing | towards and away from body | | center center | Metonymy: PULLING THE DOOR FOR COMING INTO THE ROOM, INVASION INTO PRIVATE SPACE | 4d |
| 07 | and go In; | ROH | PL | circular; spiral | right | | | Metonymy: FINAL STATE FOR WHOLE ACTION | 4e |
| 10 | well (.) i_i CLOSE the door and it is closed; | LH, 1-5 bent | PV | straight | towards body | | center | Metonymy: FINAL STATE FOR WHOLE ACTION | 4f |
| 11 | but (.) people just come ˆIN; | RH, 1-5 bent | PL | straight | from left to right | | center | Metonymy: PULLING THE DOOR FOR COMING INTO THE ROOM, INVASION INTO PRIVATE SPACE | 4g |
| 12 | and they sit on my bed for a whIle and TALK and then; | LH, 1-5 crooked | PU | | | | center left | Metonymy: PART OF THE OBJECT (BED) FOR ACTION (SITTING ON THE BED), INVASION INTO PRIVATE SPACE | 4h |
| 16 | bye bYe (-) <<laughing> ↑see you LAter;=heHE-> | RH | OP | waving | horizontal | | center center | Metonymy: PART OF ACTION (WAVING HAND) FOR WHOLE ACTION (LEAVING) | 4i |
While explaining, G2 simultaneously moves both arms with flat hands in opposite directions (right and left). This gesture represents a conventionalized monomodal metaphor (Cienki; Müller, 2008), which can be described as DIFFERENCE IS OPPOSITE DIRECTION (Image 4a). He then describes how members of his host family enter his room without knocking, executing a quick arched movement with his right hand, fingers crooked, from right to left. This gesture depicts how they suddenly invade his private space (L02; Image 4b). The movement itself is reductive, primarily indicating direction and speed, and thus functions as a SUB-EVENT FOR WHOLE EVENT metonymy (Littlemore, 2015), in which the gesture stands for the entire action of intruding into his personal territory. In L06-07 (Images 4c-e), G2 continues by explaining that the door is NEver closed todAy, and when the door is CLOSED they just Open it and go In. Concurrently, G2 demonstrates gesturally how a member of his host family grabs the doorknob and opens the door. These movements, which indicate the OPENING OF THE DOOR, can be interpreted as metonymically representing the entire action of COMING INTO THE ROOM. This experience is shared by the Swedish co-participant, who not only confirms G2's narration through affiliation tokens (L04, 05, 09) but also takes the floor in L10 to offer her own version of this culturally divergent habit. She begins by re-enacting a similar scenario from her host parents' home, where she typically closes her door (L10; Image 4f), which is restaged metonymically as FINAL STATE FOR WHOLE ACTION (Littlemore, 2015). However, her host family members disregard this and enter her room as they please (L11-12): but (.) people just come ˆIN; and they sit on my bed for a whIle and TALK and then;. Their decision to leave her alone after a while is embodied through a waving hand gesture, which adds an ironic tone to the bewildering situation.
The gestural metonymies co-constituting this scenario include PULLING THE DOOR FOR COMING INTO THE ROOM (Image 4g), PART OF THE OBJECT (BED) FOR ACTION (SITTING ON THE BED) (Image 4h), and PART OF ACTION (WAVING HAND) FOR WHOLE ACTION (LEAVING) (Image 4i). Taken together, these metonymies point to the cultural concept of private space, which is portrayed as being invaded in this context.
The frame scenario of entering the exchange student's room at the host family's home is, on the one hand, deconstructed into metonymies on the verbo-gestural level that re-enact the prototypical experience of the sudden opening of the student's door. On the other hand, this re-enacted scene metaphorically reflects the cultural conceptualization of PRIVATE SPACE, highlighting the distinct boundaries between PRIVATE and PUBLIC SPHERES. These boundaries appear to be more fluid in Brazil than in the students' countries of origin, Germany and Sweden.
DISCUSSION
As we have seen, metonymies serve as valuable tools in ELF interactions, helping to create common ground ad hoc and in situ (Kecskes, 2014; Schröder, 2025). They enable co-participants to break down complex and abstract concepts into concrete, imagistic components of larger scenarios. In doing so, participants often employ co-speech gestures. The first sequence demonstrated that an idiom, such as "lose the point," can be reactivated in lingua franca interactions, as discussed by Pitzl (2018), who underscores that idioms may undergo a process of re-metaphorization. This is particularly evident when a gesture reactivates and foregrounds an entrenched idiom, and a lingua franca user becomes aware of its metaphorical anchoring and elaborates on it. The second sequence demonstrated how the TIME AS SPACE metaphor is metonymically constituted through specific gestures and how these sub-actions collectively form the larger scenario. This scenario is reactivated as conventionalized metaphtonymies through the high gestural engagement of the ELF speaker, who evokes common ground through the shared experience of being exchange students. The last sequence showed that even though the exchange students do not explicitly reference the PRIVACY schema, the imagistic frame metonymies and metaphors are conveyed inferentially and implicitly. These can be further accessed through contextualization cues such as laughter, ironic markers on the lexical level, and pitch movements that signal (inter)cultural bewilderment.
As noted in the Introduction, following the shift toward usage-based approaches in cognitive linguistics, interest in the relationship between language and cognition has thus far been pursued primarily through corpus-based analyses of written discourse. Within this body of research, metaphor has consistently received more attention than metonymy—a tendency that also characterizes studies of multimodal discourse, where media such as advertising have additionally been privileged over face-to-face interaction (Zima; Brône, 2015). Moreover, much of this work has adopted a top-down orientation.
By contrast, conversation analysis and interactional linguistics have generally excluded cognitive and cultural dimensions, a limitation that has frequently been criticized (see Loenhoff, 2022).
A similar dilemma is evident in research on English as a lingua franca (ELF) and World Englishes. While the latter field aligns with cognitive linguistics through its comparative orientation, studies of ELF have primarily focused on interaction and on features of ELF speech, often abstracted from the cultural and cognitive resources that participants bring to the interactional context. Researchers who have addressed the cognitive dimensions of ELF (for an overview, see Hall, 2017) discuss the assumption of a supra-cognitive category of language operating at the community level, elaborated through shared group knowledge that transcends national varieties—an assumption many view with skepticism. Another important aspect, illustrated in Sequence 1, is the role of creativity that emerges from uncertainty about shared norms. Corpus-based studies have shown that ELF interaction fosters heightened levels of creativity and innovation (Pitzl, 2012, 2018), a phenomenon to which Kecskes (2014, 2018) has drawn particular attention. However, this dimension has not yet been systematically investigated from a multimodal perspective.
CONCLUDING REMARKS
The aim of this article has been to contribute to research on situated cognition, understood here as the study of cognition at the interface of multimodal talk-in-interaction, by starting from a fine-grained sequential analysis. As demonstrated, the fields we bring into dialogue rarely incorporate each other's methodological perspectives, even though such integration could generate valuable insights and foster new debates—whether critical or productive. Despite the promise of these cross-fertilizations, a sustained dialogue between the perspectives remains largely absent.
ACKNOWLEDGMENTS
I would like to express my sincere gratitude for the support provided by the institutional partnership between UFMG and the University of Potsdam, facilitated by the Research Group Linkage Programme of the Alexander von Humboldt Foundation, Germany. I also wish to extend my heartfelt thanks to CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for their funding through the Capes-PrInt programme, which enabled my postdoctoral research year at the University of Texas at Austin, USA, and the University of Duisburg-Essen, Germany. Additionally, I am deeply grateful to FAPEMIG Universal (2024-2027), as well as to CNPq, for their Fellowship Program (2022-2025). Lastly, I would like to acknowledge CAPES and DAAD for their support through the Probral fellowship (2023-2026) for the bilateral project on 'The Multimodal Coordination of Intercultural Video-Mediated Interaction.'