Excerpted from Language in Hand: Why Sign Came Before Speech by William C. Stokoe.
Published by Gallaudet University Press. ©2001, by Gallaudet University. Reprinted with permission.
CHASING THE LANGUAGE BUTTERFLY
I am suggesting that we should look for the beginning of language in the actions of other living things, even the housefly evading a swatter. The connection is unmistakable. Not just flies but all creatures with functioning eyes interpret what they see to survive. Visible or otherwise perceptible actions are signs, and creatures interpret them as signifying something else. This is as true of the swing of a flyswatter or the presence of sugar molecules as it is of words and sentences. As species evolved, the natural and necessary biological process of interpreting meanings evolved with them. Defining meaning that way, we will try to trace the way form-meaning pairs, an aspect of all animal life, could have evolved into genuine language signs, the words and sentences in common use.
At the start it is important to realize that not all words and sentences have to be made of vocal sounds or their representation in writing, even among hearing people who only use a spoken language. When a shopper tells a clerk, “I’ll take a dozen of those,” the clerk has no idea what the shopper means without seeing a pointing finger. The visible, nonspoken sign imparts meaning to the shopper’s otherwise ambiguous utterance.
This example demonstrates a common intrusion of visible signs into sequences of normally spoken signs. But in deaf people’s languages all the words and sentences have to be seen. Deaf signers make words and sentences out of easily seen bodily movement, just as hearing speakers make words and sentences out of vocal sounds. Movements seen by a visual system or sounds heard by an auditory system can be the primary symbols of a language. Yet vision may have an advantage, for it is neurologically a richer and more complex physiological system than hearing. Sight makes use of much more of the brain’s capacity than does hearing.
What kinds of symbols might the very first languages have used? The easy response to such a question is that first languages originated too long ago for anybody to be able to know, but we can consider the issue with another set of questions: How could vocal sounds have been put together to make words and sentences? Who was there to tell the earliest speakers on earth what the sounds they were making meant? Who could tell them which words were verbs and which were nouns? And who explained to them how to combine words to make larger-than-word meanings? Expressed this way, it seems that speech alone could not have ushered in language. Speech is a way to deliver language. Language has to be in place, in cognitive systems, in brains, before speech can convey it.
One popular theory of why humans have language proposes that a mutation of brain cells gave them a unique “language organ,” but the work of some biologists and other brain scientists who study mutations casts doubt on this.1 An alternative explanation would need to examine what signs humans or hominids could have used at first to represent and transfer meaning; that is, other than sounds, what signs could have been interpreted as words and sentences without language already being in place, fixed in the brain by a fortuitous mutation?
Visible movements provide a ready answer. Movements are naturally interpreted by many different kinds of animals, although the richness of the interpretation varies according to the nature of the animal. The main argument that ties together the following chapters is that movements, which are natural signs naturally interpreted, evolved into genuine language signs. These movements became, with full metamorphosis, the actual words and sentences that we now, hundreds of thousands of years later, speak and write down. Movements continue to make meaningful signs, but for most hearing people they have become largely invisible, taking place inside the throat, producing sounds to be heard rather than actions to be seen.
The differences between the way we see and the way we hear readily demonstrate that visible signs carry more information than audible signs. At a single glance we see a great deal, usually more than we consciously focus on. We can hear language sounds of great variety and discriminate among them very rapidly, but they have to come in succession—we have to wait until a string of them has traveled through our inner ears to our brains before we can be sure what a particular sequence of language sounds, or a spoken sentence, means.
Although we cannot be sure what dinosaurs and other extinct animals could hear or smell, the available evidence assures us that they too could see and move and use their vision and movements to get food, even living prey that had to be pursued and captured. Millions of years later as the Tree of Life branched, visible movements made by animals were communicating not just the information they needed for getting food but also the kind of information that makes social existence possible for those we recognize as social animals.
Humans use visually perceived movements in the same way as other animals, but we do more: we make movements ourselves to resemble visible things and beings and changes and actions. Clever as we are, however, we cannot make sounds resemble the visible world. Sounds are, and always have been, highly useful in the animal kingdom for giving alarms, expressing emotional states, marking off territory, calling companions, and attracting mates; but only a visible sign can look like something else.
These observable characteristics of sensory perception, as well as the existence of signed languages, tell us that speech is sufficient but not necessary for language. I believe the case is even stronger, though. Visible human movements are not merely sufficient for language but were absolutely necessary for making that first solid connection between sign and meaning. To put it plainly, language could have begun when early humans interpreted more richly than other animals could the movements they saw, especially the movements they themselves made expressly to transfer information.
Gestures, movements of arms, hands, head, and other body parts, as well as visible changes of the appearance of the face, can look like, point to, or reproduce a whole world of other visible things, things that human and closely related species had to do and attend to. Upper body gestures transfer information; they carry meanings between intelligent creatures. Facial expressions and a person’s overall appearance, along with arm and hand movements, can reveal what a person feels about what is being represented, as Darwin made clear in The Expression of Emotion in Man and Animals.2 Vocal sounds can certainly reveal feelings as well, but if they are not part of an already formed and agreed-upon language they cannot do so with the complexity and effectiveness of visual signs.
Even more than the composition of language signs, the channel used for perceiving them determines the nature of language. Many animals have a very useful olfactory system for getting information, for example, but a language of odors will not work for humans. We can discriminate too few odors to serve as symbols conveying the complexity of human thought. Moreover, except in animals like skunks, odor production is not under voluntary control. We do not need to make a complete inventory of the human sensory systems to find those suitable for adaptation to language. Only two have the power to detect and process language symbols naturally. Only sight and hearing in higher primates have a large enough network of brain centers and neural connections for the enormous task of processing a language.
Seeing and hearing, however, are not equally powerful sensory systems. The nerves connecting eyes and brain outnumber by far all the brain connections to the other sensory organs, the ears included. Visual processing involves so much of the brain that a visual field may convey an enormous amount of information simultaneously, whereas language sounds have to reach the ear sequentially, one by one, until the whole message is received and can be interpreted. This difference is shown by everyday experiences.
Take, for example, the act of driving in a lane of rush-hour traffic. Suppose I notice the car in front of me has its brake lights on and is slowing. At the same time I see that the traffic light up ahead is not yellow but green and my speedometer reads 50 miles per hour. The rearview mirror shows that the car following is a safe distance behind. To put these details in speech may take less time than reading them; but even so, saying them takes far more time than I have in which to decide what to do; yet every bit of this visual information is instantaneous, and my reaction immediate. Hence, the old saying that “a picture is worth a thousand words.”...
My original idea that signing can be language has grown into a belief that language began when a human species interpreted gestural signs both semantically and syntactically. The latter is particularly important, for gestures have syntactic power; they are not just visible movements that represent something else. Early humans saw the hands standing for things or creatures and the movements representing actions or changes. And as hands are physically, visibly, and cognitively connected to their movement, so are the hands’ and movements’ meanings symbolically (if not quite literally) connected to their meanings; hence, syntax.
A species already in command of language in kinetic and visible form had hundreds of thousands of years to adapt guttural, oral, and nasal physiology for making sounds different enough to represent each visible sign and its meaning. Using this dual-channel system of signs and speech, they would have connected the vocal signs and the visible signs to the same meanings. The conventional association of audible sign to meaning then survived the gradual disuse of the visible sign. Hands were now freed for other tasks, and speech predominated.
The earliest language signs surely referred to their makers and signified objects in the world around them. What they were seen to be doing or had done or should do would make up the primary content of their discourse. To the first creatures with our kinds of bodies, they themselves, their world, and what happened in it would have been salient, highly visible, and increasingly intelligible. Less formidably equipped with canine teeth than earlier primates, our ancestors needed to see and understand the world as it related to them in order to survive. By helping each other consciously, they thrived and populated many regions of the world, and they changed the environment for their own benefit. Mike Beaken has conceptualized this pooling of resources and cooperation as a force moving toward the beginning of language.3 Jonathan Kingdon thinks that early humans may have hunted, trapped, and driven large mammals to extinction and, intentionally or not, burned forests, with the result that grassland replaced woodland and attracted easier-to-capture grazing animals, some of which were ultimately domesticated.4 It is hardly possible that early humans (before the era of articulate speech) could have accomplished all this without a visible language.
Vision may be the master human sensory system, but in order for seeing to make the difference it did, humans had to be equipped with the means to make use of the information acquired by way of their vision.5 Early human activity required constant use of uniquely structured human hands and arms. Vision guided this activity, and mental categorization, learning, and memory aided by perceptible representations made this guidance more effective. There is a subtle but important difference between banging noisily with a stick, a threat behavior that chimpanzees engage in, and using a stick to strike a tree branch to dislodge unreachable fruits or nuts or using it to point to various directions that members of a foraging party should take. Vision-guided hand and arm use with a clearly conceived purpose was certainly part of early human behavior....
From the same ancestor as chimpanzees, our genus evolved perhaps two million years ago, with larger brains, more vertical bodies, and hands better adapted for manipulation than chimpanzee hands. One species in this line, Homo erectus, had as much conceptual power and motor-visual skill as chimpanzees, but the erectus’ larger brain cases suggest that their cognitive and manual skills were more highly developed. Members of Homo erectus apparently spread from Africa to most of the Old World, and what they left behind shows that they performed more and more varied kinds of purposeful activity than did apes. The use of hands and eyes to represent concepts would explain this novel activity.
Manual actions could have become the first way to represent concepts formed within the enlarging brain, giving them visible form where the master sense, vision, could detect and interpret them. Simply by making a characteristic hand-arm action, while holding the hand as if it were holding the implement or object used in that activity, an individual would represent, and be seen to be representing or suggesting, exactly that activity. The action becomes a representation as soon as its maker and others see that the virtual action stands for the veritable action. Furthermore, attention focused on handshape and movement could have led to the development of sentence meanings and nounlike and verblike meanings all contained in the whole gesture.
Understanding a gesture and all that it represents as a sentence calls for a cognitive jump, but that jump is made from a solid platform of evolved visual, manual, and cognitive activity. The evolutionary solidity of the platform is dramatically attested by “mirror neurons” in primate brains: “These are neurons found in the monkey ventral premotor cortex that are action-oriented, context-dependent, and implicated in both self-initiated activity and passive perception. They are active both when a monkey observes a specific action (such as someone grasping a food item) and when the monkey performs the same action, where sameness implies not mere grasping but grasping of the food item.”6
These neurons in monkey brains, and (because evolution is conservative) in ape and human brains as well, are implicated visually and motorically with food, the hand, and with grasping food. Given the differences between monkey and human brains, the human use of a hand’s action, not just for food and grasping it but for representing all of that and other events, is entirely too natural to require a mutation of brain cells to explain it.
The course that language took over a million or so years does resemble closely something in biology—a life cycle that transforms itself, as the butterfly caterpillar’s does. That a visible human movement could metaphorically be the egg out of which the gesture, as movement with rich meaning, emerged fits well the current understanding of the workings of the brain. So too does the differentiation of the unitary gesture into nounlike and verblike meanings. Neuropsychologist Marcel Kinsbourne writes:
Rather than being assembled piecemeal and glued (conjoined, integrated) together, the percept, construct, utterance, or intention gradually differentiates out of the preexisting brain state. Diversity is continually being carved out of the existing unity. The operative question is not “How are the details assembled into the whole?” but rather “How is the whole reshaped to incorporate the details?”...
Experience is not a composite assembled out of its parts. The contrary position—that experience is carved out of a less differentiated whole—gains plausibility. While no truly apt metaphor for how the brain works comes to mind, “crystallizing out” seems more fitting than “assembling together.”7
What Kinsbourne says about experience here applies equally well to syntax, the sentence. Chimpanzee gestures are undifferentiated wholes, and yet creatures only an evolutionary step or two away from us can see that similar gestures may be regularly associated with concepts. Then, in the hands, eyes, and brains of humans, sentences, consisting at least of noun and verb, may be said to “crystallize out” of the gesture...
Like the larvae of species that go through a complete metamorphosis, early human gestures must have been voracious, growing rapidly by representing item after item in the visible world. They helped individuals gather together with mutual understanding and work cooperatively to solve problems that could not be solved individually. At the same time, the ideas represented made social action more complex and more efficient. For example, a mother could easily gesture to a child, “Don’t do it that way, do it like this,” thus accelerating the child’s learning of an important skill. The message, “Not there! Come over here!”—which could have been gesturally expressed more than a million years ago—would likewise affect the success of a whole hunting party.
Visually, kinesthetically, and conceptually, gestures can make the first manifestations of language. The kind of social life implied by the fossil record from one to two million years ago suggests a genuinely human means of communication—an early, not vocally articulated but complete, language. Evidence points to an early emergence of language, but the emergence of speech and the fully modern vocal tract is relatively recent....
The human nervous system functions the way it does, I argue, not because its neurons are repositories for the rules of universal grammar or any other blueprint of the universe, but because it operates with information (electrochemical impulses) supplied by its perceptual systems. The human nervous system also uses information of a richer, more concentrated kind than that delivered directly by perceptual systems. Working with symbols, the brain-equipped, truly human creature took the few symbolic gestures used by earlier-to-evolve species, added copiously to them, and saw within them (by crystallizing out of them) two important categories of meaning: nounlike and verblike. This pioneer human saw also that this can be symbolized separately... As a result, information, knowledge, and truth about the social and natural world can be shared to become cultural tradition—symbolic behavior.
The sequence implied by the butterfly analogy, gesture to language to speech, is only a brief episode in the evolutionary history of semiotic behavior; but it is the one most important to us. It would be a serious error to suppose that symbolic behavior is not part of the biological continuum called evolution...
EMERGING FROM THE COCOON
SIGHT AND SOUND
Four facts suggest that language in visible symbols came before speech:
- Sign language still exists.
- Spoken utterances often require accompanying visible signs in order to be fully understandable.
- Only visible signs have natural links to concepts and syntactic structures.
- All human infants use gestures to communicate before they master the language of their caretakers, whether that is a spoken language or a signed language.
A fifth suggestive fact is that vocal and gestural symbolization are closely related.
Centers in the brain controlling speech production are the same as, or adjacent to, centers controlling manual movements; thus, the former may well have evolved from, or along with, the latter.8 When gestures are defined as intentional, willed productions of complex, interrelated, and coordinated muscle actions, it becomes clear that gestures, most of them invisible, also produce speech.9
In that long-ago era when visible gestures may have transformed hominids’ communication into language, the cultures would have been very different from our modern, speaking cultures, if only because the two kinds of languages are built around very different perception-action systems. Much information that speakers take in by listening, deaf signers take in by looking. This use of different sensory systems for detecting and processing language has a direct effect on brain activity. Brain blood flow monitoring shows that deaf and hearing children of deaf parents process sign language differently, apparently because of the effect of auditory stimulation on cortical organization.10 Although today the brain’s processing of audible and visible primary language symbols can be monitored by scanning techniques, we can only infer the cultural changes that a radical prehistoric change in transmission and reception channels would have caused.
The consequences of changing from signing to speaking may have been revolutionary, but the causes of the change had to be cultural. In the first place, there has never been a total switch of language channels. Gestures, visible signs made by hands for transferring social information, have never disappeared from human use. They remain not only in the primary sign languages of deaf people, and the alternate sign languages preserved by certain Australian, Native American, and African groups, but also in everyday use by speakers everywhere. Many experts perversely label these gestures “nonverbal,” but such labeling manipulates the meaning of verbal the way a thimble rigger manipulates the walnut shells hiding the pea...
Language is a special kind of behavior of the whole organism. It requires the brain’s normal functioning to connect sensory, motor, and cortical networks and the brain’s coordination of the sensory systems and of larger (skeletal-muscular) or smaller (respiratory tract) motor systems. In plain terms, a heart pumps blood and a stomach digests food, but neither a mouth nor a hand nor a brain nor any other single organ makes language. Hence, “oral language” and “manual language” are barbarisms no matter how often they have been used to contrast the languages of hearing and deaf people. Language, though socially indispensable, is not a physiologically vital function. The use of the terms verbal and nonverbal simply begs the question of the relationship between spoken and signed language. [Gregory] Bateson and [Thomas] Sebeok seem to use the adjective verbal to mean “expressed in and by language” rather than “spoken as opposed to written.” They then use the term nonverbal to set off from language anything they consider to be nonlanguage or not expressed in language. Because neither Bateson nor Sebeok fully recognizes that language is a system that may utilize either vocal sounds or gestural actions as its primary symbols, the word verbal as they use it applies to spoken language only.
When the terms language and verbal are recognized as true synonyms, referring to the same system, the phenomena that Bateson and Sebeok refer to are very different from reality. Vocal and gestural semiosis—the use of sounds and visible actions as signs—has always been part of general primate behavior, which includes the behavior of humans. Neither in human nor in animal behavior have visible signs for information transfer completely replaced audible signs, nor vice versa, except in the case of blindness or deafness.
GESTURE AND VOICE TOGETHER
What is more likely than replacement and decay of visible signs is that some million or more years ago, gestural acts became language signs by the steps outlined here. (From the beginning, vocalizations are likely to have accompanied visible language signs. This is certainly true in a child’s progress from being without a language to having a gestural communication system to having adult language.) At a later stage, the familiar vocalizations could begin to be used without the gestures they usually accompanied. This might have come about relatively quickly or over millennia during which meaning was expressed by visible gestures with incidental vocalization. Members of a social unit, hearing the sounds that normally accompanied the gestures, would still grasp the meaning even if the gestures did not appear in full. Likewise, if the eyes strayed away for a moment, or if occasionally the one vocalizing while signing omitted a visible gesture, the message would still get through. This is what sometimes happens with alternate sign language users.
Neither speaking nor gesturing needed to decay or disappear. Use of these two modes of expressing meaning and these two sensory channels for reception have always been part of human life and cultures; but they are used in different ways at different times, in different circumstances, and for different purposes by peoples with different cultures. Vocal and gestural communication are not natural adversaries, and only the invention of writing led to the mistaken assumption that speech alone could be language.
Another way to see this evolutionary change with revolutionary effect is to recognize that language signs have always been gesturally expressed, first and mainly by visible gestures, later primarily by internal gestures producing and modifying sound; but visible expression of language signs has never disappeared. No organs, no skills had to decay, conspicuously or otherwise. The takeover of language sign production (in that large portion of the population that could hear) by a physically evolving vocal tract would have liberated the hands for myriad other tasks. A fully human vocal tract and an auditory system finely tuned to decode increasingly complex vocal signs freed the human visual system for other uses—until the invention of writing co-opted vision for the new skill of literacy. Using the eyes for reading, it should be noted, does not lead to the conspicuous decay of the ability to decode speech....
The change from making language signs out of visible gestures to making language signs with voices might be likened to the shift 150 years or so ago from canals to railroads. Although railroads today carry some of the passenger traffic and much of the freight that was once entirely consigned to canal transport, the canals have not disappeared. Some of them continue to be competitive for carrying certain kinds of freight, and they are increasingly used for recreation.11 Although gestures inside the neck and trunk produce the sounds that form most people’s language signs, visible signs still carry information that is important, even essential, to them. No sudden, cataclysmic event would have been needed to alter what actually changed—the proportion of language signs produced by “inner” and “outer” gestures.
The change in language channel use probably proceeded gradually, beginning with a few individuals but spreading to make a whole signing community into a speech community. Eventually whole populations would have followed the lead. The later part of the Middle Stone Age, perhaps as recently as fifty thousand years before the present, is the most likely time when such a gradual change neared completion. This is the period in which stone tool technology took off, suggesting a major cultural innovation rather than a gradual improvement of older techniques for working stone. Learning to make stone tools would have proceeded more rapidly when the instructions could be given vocally while the hands were busily working with the tools and materials.
But questions remain: Why would a shift from making language signs for the sense of vision to making them for the sense of hearing have begun? Why and how would speech as a surrogate for signing have persisted and spread?
These are not the usual questions asked about speech. Surrounded by speakers and only minimally aware that there are people and whole communities whose language signs are gestured rather than spoken, most people (language scholars with them) have assumed that language was always spoken, and that drum and whistle languages, ancient rock carvings, sign languages, and the like must all be surrogates for speech.
If speech began as a vocal accompaniment to visibly expressed language signs, it would have been quite natural to discover that the message was understood when only the sound was heard. The habitual association of a sound pattern with the missing gesture and its meaning would fill any visual gap. From that beginning, the growth and spread of vocalization and the lessening dependence on movement and vision would have proceeded as a natural cultural change.
This change in proportion within the dual-channel system of communication could have come about simply and naturally. Certain visible signs might have been deliberately left out, but if sounds had usually accompanied the visible signs, those on the receiving end would still receive the complete message. Discovering that spoken sounds could more and more often be acceptable substitutes for the visible signs would have allowed unaccompanied vocal expression to increase in frequency until it became the preferred and usual way to communicate. At that point, the visible signs could be put to other uses—as still occurs with speakers. Trying to account, however, for the existence of primary sign languages under the assumption that language began as speech presents obstacles; for example, why, when even ad hoc gesturing can show naturally and at times with superior accuracy precisely what one means, would a primate species have invented and developed the “arbitrary” and “unmotivated” system of spoken language, and yet still keep on using the natural, gestural, system for certain purposes?
The people who use primary and alternate sign languages are few enough, and their languages so little known, that it has not occurred to many scholars to consider whether visible language signs might have been the origin of words, speech, and syntax. In most human societies, gesture is now used so differently from speech that it seems gestures could never have performed the functions of speech. And yet, though sight and hearing are both important for information transfer in all primate behavior, sight is still the master human sensory system.12
Other primates, to be sure, make no separation between language channels, because there are no language signs to be separated out. General primate communication is both audible and visible. Human interaction is also mixed in mode, except that in historical times written language has provided an obvious though questionable sorting principle. Some may feel comfortable in saying categorically: “This is language, this is verbal; that is not language, that is nonverbal.” To confront this kind of assurance the American poet Carl Sandburg had an apt anecdote; a railroad brakeman once greeted him as he sat down in the smoking car with the familiar conversation opener of the 1920s in question form: “Whaddya know?—” and then after a pause, added, “—for sure?”13...
A DIFFERENCE THAT MAKES A DIFFERENCE
A better start in life for all children, not just deaf children, is a benefit that could accrue if spoken language came to be understood as the changed form of language originally signed. When this hypothesis about language origins is entertained, the direction of scientific research might also change. Language sciences might become more empirical and less formalistic. Psychologists might find new ways to discover what children really know and understand. Psychometrists might come up with new instruments for measuring mental abilities. Cognitive and brain scientists might find that the relationships of vision and movement to knowing, thinking, and using language are closer than they have heretofore imagined. Instead of searching for precise brain areas or particular genes for particular grammatical functions, linguists and psycholinguists might take more notice of current neuroscience, which finds that brains work with broad connectivity, integrating vision, physical movement, and cognition, and finding intricate patterns within existing wholes. Paleontologists’ ability to date and differentiate human and hominid species might also be sharpened by knowing that language could have begun long before speech emerged, and in its visible form could have been productively used and its patterns elaborated, giving them a full range of sentence patterns. Individuals might also benefit, simply from realizing that, when it came to survival skills and innovation, our ancestors back as far as a million years were hardly our inferiors.