Sentence Production

  1. Researching sentence production
  2. What sentence production involves
  3. Garrett's model
  4. An example
  5. Encoding syntax and TAG model
  6. Evidence for first stage of grammatical encoding
  7. Evidence for second stage of grammatical encoding
  8. Feedforward models
  9. Interactive models
  10. Incremental planning
  11. Phonological planning
  12. Scope of planning
  13. Evidence from aphasia

1. Researching sentence production
Sentence production is more difficult to investigate than sentence comprehension. In comprehension, we can vary the input and then look for differences in responses resulting from those variations. With sentence production, on the other hand, it is much more difficult to find out how our thoughts, which are not necessarily temporally organized, are converted into sequentially ordered grammatical speech. Considering the speed with which we speak and the level of fluency achieved, speech production is one of the marvels of human performance.

Despite the difficulty of carrying out speech-production research, investigations at the sentence level in unimpaired speakers have nevertheless led to several informative models. We will look at what they have in common, as well as how the major ones differ. Then we will see what investigations of aphasia have contributed to our understanding of the way the intact brain must operate.

2. What sentence production involves
In brief, speech production commences with the thought the speaker wishes to express; it must then be converted into lexical items (lemmas) retrieved from the person's mental dictionary, called the lexicon. Subsequently, these lemmas are converted into the form in which they will be pronounced, with the syllable and sound information and the stress pattern of each word, adapted for its context. The final rendition bears the grammar and meaning that the speaker chose in order to convey his/her idea. Lastly, the muscle patterns must be specified and co-ordinated with breath outflow to pronounce the utterance.

3. Garrett's model
The model devised by Merrill Garrett (1975, 1982, 1988) is still influential. Using data gathered by others and himself, Garrett proposed a two-stage process to convert the message the speaker wishes to convey into a string of words that the speech organs can pronounce. The first component does what is called "functional encoding." The input that is functionally encoded is in the form of propositions, detailing the participants, whether an action is going on, and who is doing what to whom. The speaker then converts this message into the abstract words that bear the grammatical functions of the sentence, such as subject and object. At this stage there is no information regarding the speech-based (phonological) form of the words to be uttered, or their sequential order.

4. An example
To illustrate this first component, consider the sentence The boy kissed the girl. At this functional level, the speaker has selected the lemmas for boy, kiss, and girl. Lemmas are abstract words indicating their grammatical class, such as noun or verb, but not specifying the particular version, such as kissed versus kiss.

In our example event of a boy kissing a girl, the boy is still the one doing the kissing, yet the sentence can make either participant its subject, as the following two examples show:

(1) The boy kissed the girl. [Subject = the boy]
(2) The girl was kissed by the boy. [Subject = the girl]

Thus, the speaker will need to determine which participant will be the subject of the sentence, and that will commit him/her to a particular syntactic structure for the sentence. This ordering of the words happens in the second component, the positional level. Here, the speaker determines the positions of the content words, boy, kiss, and girl, and retrieves their speech-based or phonological forms. To do this positioning accurately, Garrett proposed that the speaker retrieves what is called a planning frame, which has the grammatical elements (the function word, the, and the inflection, -ed) already in their final places. In addition, there are slots into which the word-forms of the content words boy, kiss, and girl are inserted. Thus, at this level, the speech-based or phonological form of the utterance is evident, along with the surface order of the words to be spoken.
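
To make the planning-frame idea concrete, here is a minimal sketch in Python. The data structures (the frame positions, the slot names, and the mini word-form lexicon) are invented for illustration and are not part of Garrett's formal proposal; they simply show function words and inflections sitting in fixed positions while content word-forms are slotted in.

```python
# Toy positional-level planning frame (illustrative only).
# Grammatical elements occupy fixed positions; content word-forms
# are retrieved separately and inserted into the open slots.
frame = [
    {"function_word": "the"},
    {"slot": "N1"},
    {"slot": "V", "inflection": "ed"},
    {"function_word": "the"},
    {"slot": "N2"},
]

# Hypothetical phonological word-forms for the lemmas selected
# at the functional level.
word_forms = {"N1": "boy", "V": "kiss", "N2": "girl"}

def spell_out(frame, word_forms):
    """Fill each slot with its word-form; grammatical elements stay put."""
    words = []
    for position in frame:
        if "function_word" in position:
            words.append(position["function_word"])
        else:
            stem = word_forms[position["slot"]]
            words.append(stem + position.get("inflection", ""))
    return " ".join(words)

print(spell_out(frame, word_forms))  # -> the boy kissed the girl
```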

5. Encoding syntax and TAG model
Garrett provided no details on how the brain creates the hierarchical relationships among the words in the output of the second stage (such as the close connection between the and boy, then between kissed and the girl, and, above that, between the boy and kissed the girl). This hierarchy is probably easier to grasp in the tree diagram below (Figure 1). While syntactic structure-building happens automatically, it must be a complicated process, and yet it is impressively reliable. Originally, it was thought that the grammatical subject and object were retrieved with the verb at the functional level and formed the foundations of the structure created at the positional level. Recently, however, researchers have begun to venture other ideas. Ferreira (2000) and others, for instance, envisage that already-assembled tree structures (like the tree diagram below) form the building blocks of syntactic structure. Simpler trees can be adjoined in order to achieve complexity, which gives rise to the name of the model, Tree-Adjoining Grammar (the TAG model).

Figure 1. Tree Diagram
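
As a rough illustration of the TAG idea, the sketch below combines small, pre-assembled trees into a sentence structure. Real TAG distinguishes substitution from adjunction and is considerably richer; the nested-tuple representation and the "NP?" substitution marker here are inventions for illustration only.

```python
# An elementary tree for the verb, with open sites for its subject
# and object noun phrases.
kiss_tree = ("S", ("NP?", "subj"), ("VP", ("V", "kissed"), ("NP?", "obj")))

# Already-assembled noun-phrase trees.
boy_np  = ("NP", ("Det", "the"), ("N", "boy"))
girl_np = ("NP", ("Det", "the"), ("N", "girl"))

def substitute(tree, fillers):
    """Replace each ("NP?", label) site with the corresponding NP tree."""
    if isinstance(tree, tuple) and tree[0] == "NP?":
        return fillers[tree[1]]
    if isinstance(tree, tuple):
        return tuple(substitute(t, fillers) if isinstance(t, tuple) else t
                     for t in tree)
    return tree

sentence_tree = substitute(kiss_tree, {"subj": boy_np, "obj": girl_np})
print(sentence_tree)
```

The point is simply that complexity is achieved by combining ready-made tree fragments, rather than by building the structure node by node.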

6. Evidence for first stage of grammatical encoding
The two components, the functional and positional levels, together constitute the grammatical encoding of a sentence. There is much evidence supporting this separation into two components. A major source of that evidence is speech errors. The speech-error data were collected covertly in the field over many years, usually by the researchers themselves, who noted down the errors their colleagues made in spontaneous conversations. The errors in those data differed in type, and yet they fell into consistent groups. Garrett and others (e.g., Fromkin, 1973; Shattuck-Hufnagel, 1979; Levelt, 1989) argued that the groupings were due to breakdowns at similar levels of processing. At the functional level, which is thought to involve the planning of a whole clause, words of similar type (grammatical class) may inadvertently exchange. They do so over a wide distance, often straddling a phrase boundary. An example frequently quoted from Garrett is:

I left the briefcase in my cigar.

Notice that the two words that exchanged, briefcase and cigar, are both nouns, but they came from different phrases, the cigar and in my briefcase. The fact that the words exchange is thought to be due to the lack of phrase organization at this level; only the specification of grammatical function is taking place. Consequently, the error maintains the correct grammatical class but has the nouns swapped around.
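
A toy simulation of such a functional-level exchange might look as follows. The representation of the intended utterance and the error mechanism are illustrative assumptions, not a published model; the one constraint the sketch encodes is that the exchanging words share a grammatical class, even across phrases.

```python
import random

# Intended utterance, with each word tagged for grammatical class.
intended = [("I", "PRON"), ("left", "V"), ("the", "DET"), ("cigar", "N"),
            ("in", "P"), ("my", "DET"), ("briefcase", "N")]

def word_exchange(sentence):
    """Swap two words of the same class, possibly across phrases."""
    nouns = [i for i, (_, cls) in enumerate(sentence) if cls == "N"]
    i, j = random.sample(nouns, 2)
    slip = list(sentence)
    slip[i], slip[j] = slip[j], slip[i]
    return " ".join(word for word, _ in slip)

print(word_exchange(intended))  # -> I left the briefcase in my cigar
```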

7. Evidence for second stage of grammatical encoding
Errors indicative of a breakdown at the positional level are quite different. They fall within a phrase (e.g., spoonerisms, like heft lemisphere for "left hemisphere", Fromkin, 1973) and involve words of different grammatical categories (here, an adjective and a noun). So, it seems that these sound exchanges arise at the point in time when the positions of words are being organized. Another piece of evidence for this positional level is that the grammatical elements in the planning frame do not move but are stranded, joining onto the content words which exchanged earlier on, as shown in the following example: she's already trunked two packs for "packed two trunks" (Garrett, 1975). In this example, the inflections, -ed and -s, were stranded when pack and trunk swapped places in the speech error. More recently, there is evidence that the function words in planning frames can perhaps participate in errors too, thus behaving differently from inflections (which are only stranded). Bock (1989) argued this when she found priming could occur between different function words. (By priming is meant faster times to respond when the stimulus is preceded by a similar item, called the prime.) The prime sentence raises the activation level of the corresponding words in the stimulus sentence. If items prime others like them in category but different in form, presumably they are doing so at a similar level of processing. That level must be more abstract than the planning-frame stage, for the frame contains the forms of words that will subsequently be pronounced.
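
The stranding pattern itself is easy to mimic. In the sketch below (using an invented stem-plus-inflection representation, with the rest of the frame omitted), the content-word stems exchange while each inflection stays in its frame position, reproducing trunked ... packs.

```python
# Stranding sketch: stems swap, inflections stay put.
positions = [("pack", "ed"), ("trunk", "s")]   # intended: packed ... trunks

def strand(positions):
    """Exchange the stems but leave each inflection where it was."""
    stems = [stem for stem, _ in positions]
    stems.reverse()                            # the stems swap places
    return [stem + infl for stem, (_, infl) in zip(stems, positions)]

print(strand(positions))  # -> ['trunked', 'packs'], as in the slip
```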

8. Feedforward models
All models have these two levels of functional and positional encoding. There is a difference of opinion, however, about how they operate together. Garrett's model originally had information feed forward from one stage to the next, with the output of one supplying the input to the next. Levelt's team (Levelt, Roelofs & Meyer, 1999) have elaborated on that model and specified more explicitly what the stages involve. This sort of system is consequently called a feed-forward one, and the lack of feedback is due to information encapsulation, which is what characterizes a modular system: Garrett believed that the different types of information in a modular system (for example, grammatical or phonological) are independently handled at different stages, so that one type cannot influence another.
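
Architecturally, a feed-forward system is simply a one-way pipeline: each stage's output is the next stage's only input, and no stage can reach back upstream. The sketch below is a schematic of that control flow only; the stage bodies are placeholders, not the actual algorithms of Garrett or Levelt and colleagues.

```python
def functional_encoding(message):
    """Select lemmas and assign grammatical functions (placeholder)."""
    return {"subj": "boy", "verb": "kiss", "obj": "girl"}

def positional_encoding(functional):
    """Build the ordered frame and retrieve word-forms (placeholder)."""
    return ["the", functional["subj"], functional["verb"] + "ed",
            "the", functional["obj"]]

def articulate(positional):
    return " ".join(positional)

# Information flows strictly one way; no stage influences an earlier one.
utterance = articulate(positional_encoding(functional_encoding("BOY-KISS-GIRL")))
print(utterance)  # -> the boy kissed the girl
```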

9. Interactive models
Researchers who do not support the idea of modularity have sought to show some interaction between levels. One such approach is the model by Dell (1986) and colleagues, where the levels do interact, resulting in both feed-forward and feed-backward effects. This sort of model is consequently called interactive. What it means in practice is that information at one level, once it is activated, can raise the activation level of information at other levels. Take, for example, how common a word is in the language. The brain accesses the forms of words stored in long-term memory faster if they are more commonly used. This time difference is called a frequency effect. Dell believes that frequency can have an effect at all levels of speech production, not just at the level where specific phonological forms are retrieved from memory. Nevertheless, he still has in his model separate processes for selection of the lexical unit with its grammatical function (the lemma) and retrieval of its corresponding phonological form (the lexeme).
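
In contrast to the one-way pipeline, an interactive system lets activation flow both ways. The fragment below is a toy spreading-activation loop in the spirit of Dell's model: a lemma node and its lexeme node excite each other, and word frequency is implemented, by assumption, as a higher resting activation level. The two-node network and all numbers are invented for illustration; Dell's actual model is a full multi-layer network.

```python
resting = {"lemma:cat": 0.5, "lexeme:/kat/": 0.5}
frequency_boost = {"lemma:cat": 0.2}           # frequent words rest higher
links = [("lemma:cat", "lexeme:/kat/")]        # bidirectional connection

activation = {node: resting[node] + frequency_boost.get(node, 0.0)
              for node in resting}

def spread(activation, links, rate=0.3, steps=3):
    """Each step, every link passes activation in both directions."""
    for _ in range(steps):
        new = dict(activation)
        for a, b in links:
            new[b] += rate * activation[a]     # feed-forward flow
            new[a] += rate * activation[b]     # feedback flow
        activation = new
    return activation

print(spread(activation, links))  # the two nodes reinforce each other
```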

10. Incremental planning
Another aspect upon which all models agree is that the planning that precedes speech production takes place incrementally (Kempen & Hoenkamp, 1987). Our speech is too fluent for our brains to be planning a whole sentence at a time, with each level having to be complete before the next one begins. Rather, what is thought to happen is that incremental fragments are planned in parallel across the different levels. Consequently, the speaker does not necessarily know how his/her sentence will finish when it begins. So the issue for research is: what amount or extent of planning occurs at each level? Researchers talk about the scope of planning, meaning the unit involved, such as a clause, a phrase or one or more words. With the idea of incremental production, it is possible, in fact likely, that the scope of planning differs for the different levels.
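
A cartoon of incremental planning follows, under the simplifying assumptions that fragments are one phrase long and that each level takes one time step: at any moment, different fragments occupy different levels in parallel, so articulation of the first fragment can begin before the last has even been conceived.

```python
fragments = ["the boy", "kissed", "the girl"]
levels = ["message", "grammatical", "phonological"]  # earliest to latest

# Stagger the fragments through the levels, one time step apart.
for t in range(len(fragments) + len(levels) - 1):
    busy = {level: fragments[t - lag]
            for lag, level in enumerate(levels)
            if 0 <= t - lag < len(fragments)}
    print(f"t={t}: {busy}")
# At t=2, for example, fragment 3 is at the message level while
# fragment 1 is already being phonologically encoded.
```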

11. Phonological planning
The amount of phonological planning was demonstrated in the first use of moving pictures as a research paradigm, by Levelt and Maassen in 1981. The participants viewed three pictures on a computer monitor, in which one or both moved relative to the remainder, and had to describe the change. Levelt and Maassen first established that the participants preferred to describe the movement of two pictures as "the circle and the square move up", rather than as "the circle moves up and the square moves up", even though it took longer to prepare the conjoined phrase. What is relevant to phonological planning is that varying the difficulty of the name had an impact only on the first, not the second, name of the double-noun subject. The conclusion was that retrieval of the phonological form seemed to precede speech by only one content word. Because of the way sounds exchange between words in speech errors, and the way pronunciation is adapted to its context, one word is probably an underestimate. Phonological encoding of words seems to take place sequentially (e.g., Meyer, 1990, 1991; Wheeldon & Levelt, 1995); so the number of words planned ahead of speech would have to be limited, perhaps to only a couple of words in advance.
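
One way to picture such limited lookahead is as a small buffer between phonological encoding and articulation. In the sketch below, the two-word buffer size is an assumption loosely matching the "couple of words" estimate above; nothing about the encoding process itself is modelled.

```python
from collections import deque

words = ["the", "circle", "and", "the", "square", "move", "up"]
LOOKAHEAD = 2   # assumed buffer size, not an established constant

buffer = deque()
for word in words:
    buffer.append(f"/{word}/")   # encode one word-form at a time
    if len(buffer) > LOOKAHEAD:  # articulation lags encoding by
        print("articulating:", buffer.popleft())  # at most two words
while buffer:                    # flush the tail of the utterance
    print("articulating:", buffer.popleft())
```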

12. Scope of planning
Information on pauses (Beattie, 1980) and speech errors (Garrett, 1975), as well as experiments using picture description, as above, indicates that grammatical encoding in unimpaired speakers takes in a whole clause. Phrase-level planning is not ruled out by the data, however (Smith & Wheeldon, 1994). The moving pictures paradigm was used again recently by Smith and Wheeldon (1999) to try to settle the confusion over the clause- versus phrase-level scope of grammatical encoding. After controlling for perceptual influences and aspects of speech rhythm, they found that the lemma-retrieval processes were happening in parallel fully one phrase into an utterance (both nouns retrieved together), and had begun on the second phrase. Thus, a larger unit of clause-level planning was going on at the message level and a smaller unit of phrase-level planning at the functional level. Syntactic planning seems to be clause-level, following Ferreira's (1991) study. Ferreira (2000) now concludes that planning may differ in extent depending upon the demands of the task, being least demanding in spontaneous speech (hence more extensive planning) and most demanding in timed experiments (hence planning restricted in extent, in a trade-off with reaction time). It is Ferreira's (2000) belief, also, that when there is no pressure on peak performance, retrieval of the verb and its dependent structure (subject and object, say) is the extent of grammatical encoding.

13. Evidence from aphasia

13.1 Scope of planning
Martin and Freedman (2001) have adapted the moving pictures paradigm for use with aphasic patients to explore their ability to hold onto word-level semantic information during planning. Their items began with either a simple or a complex noun phrase, as in "The ball moves above the tree and the finger" and "The ball and the tree move above the finger", to equate length and content. Their aphasic patient, ML, has a specific sort of short-term memory deficit in which he has trouble retaining in working memory the word-level (lexical-semantic) information before it is integrated into sentence meaning. He has no difficulty understanding the meaning of those elements, nor with handling the grammar. Age-matched controls had an even bigger voice-onset delay for producing initial complex phrases, compared to simple ones, than Smith and Wheeldon's (1999) student participants. ML's effect, however, was twenty times as large as the controls' average and eleven times as large as the slowest control's. He was also error-prone. Thus, he had great difficulty producing complex phrases compared to simple ones if they began the sentence, even though he has normal reaction times to visual stimuli and normal naming ability for the pictures. Another patient, with a short-term memory deficit for speech-sound information rather than lexical meaning, showed a similar pattern to controls, without the inordinately long onset time for complex initial phrases. Thus, phonological STM deficits do not seem to impinge upon the phrase planning that this moving pictures task seems to tap.

Another way in which aphasic data are helping to clarify the scope of planning at different levels of speech production is in picture description. Patients with lexical-semantic STM deficits have greater difficulty producing descriptions with several adjectives before the noun, compared to the same words following the noun. In the first situation, preceding the noun (e.g., The rusty, old, red pail ...), the adjectives have to be held in working memory until the noun to which they attach is retrieved. Patients with lexical-semantic STM deficits experience greater difficulty retaining the information in this unintegrated form (resembling a list) than in the case where the adjectives follow the noun (e.g., The pail is rusty, red and old) and can modify it immediately. The same disadvantage applies to strings of nouns which precede a verb, as a complex noun-phrase subject; for example, The pitcher, the vase and the mirror cracked ... It is easier for these patients to produce ... cracked the pitcher, the vase and the mirror, even though the same numbers and classes of words are used in the two versions. Even more stark is the difference in their production of phrases with adjectives preceding the noun, compared to naming the words on their own. Despite having no difficulty naming the pictures as single words, they struggle to produce double adjectives in front of the noun to form a phrase; for example, long, curly hair (Martin & Freedman, 2001). Yet patients with a STM deficit that is phonologically based produce these phrases normally. So, when the semantic load is heavy, patients with a lexical-semantic STM deficit are adversely affected. It is not a problem with producing the words themselves, but with producing them when they have to be held briefly in memory before integration into the phrase or sentence. What has been learnt from these patient data is that there must be a role for such a memory system in producing sentences with a lot of content. In other words, not just speech sounds need to be held temporarily in memory, but also the meaning content of the words.
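
The memory-load idea behind these findings can be caricatured with a crude metric: the peak number of content words that must be held unintegrated before their head word arrives. The function below is a deliberate simplification invented for illustration, not Martin and Freedman's analysis.

```python
def peak_unintegrated(words, head):
    """Peak count of words buffered before the head word arrives."""
    load, peak, head_seen = 0, 0, False
    for word in words:
        if word == head:
            head_seen = True     # later words integrate immediately
        elif not head_seen:
            load += 1            # another word held, unintegrated
            peak = max(peak, load)
    return peak

# Prenominal adjectives impose a higher peak load than postnominal ones.
print(peak_unintegrated(["rusty", "old", "red", "pail"], "pail"))  # -> 3
print(peak_unintegrated(["pail", "rusty", "red", "old"], "pail"))  # -> 0
```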

13.2 Two stages of lexical retrieval
Another contribution of patient data is to our understanding of the role of pauses in sentence production. The patients used for this sort of research had a type of aphasia called Wernicke's aphasia. These patients produce speech that is full of meaningless jargon words. Yet they speak fairly fluently, especially compared to the agrammatic, Broca's aphasia patient you heard earlier.

Because of the fluency of these patients' speech, it was thought that they did not have any problem with producing speech sounds. Yet they do have a marked problem retrieving the sound-based form of English words. They either retrieve a made-up word that bears no resemblance to English, or they retrieve some of the information, so that the resemblance is noticeable. Consequently, these patients were good candidates for finding out whether pauses in their speech were used for retrieval of phonological forms. Brian Butterworth (1979) is a researcher who has contributed much of this information. He found that pauses seemed to occur more often in front of words that were in error than in front of appropriate words. These longer pauses were attributed to the time taken to search the phonological output lexicon for the word form. How close the word produced was to the target was a measure of how successful the search of the lexicon had been. Patients with Wernicke's-type aphasia in which the words are neologistic, not like any English word, may still keep appropriate inflections on words. So, the root was unsuccessfully retrieved in the search of the lexicon for the content word, but the inflection came with the planning frame during the grammatical encoding of the sentence, as you may recall from Garrett's model. Thus, these stranding errors converge with the evidence from slips of the tongue to support the separation of the retrieval of content words from that of the planning frame containing, at least, the inflections.

There are many more instances of aphasia research's contribution to our understanding of sentence production. If you pursue this subject matter further, some groundwork in linguistics would be an asset.