An Introduction to Applied Linguistics


Norbert Schmitt (ed.), An Introduction to Applied Linguistics (2010, Routledge)

*‘Register’ is the term we are using to describe varieties of texts that are defined by situational characteristics (for example, spoken versus written, edited versus online production). Registers can be described at various levels of specificity. For example, spoken language versus written language constitute two broadly defined registers. A subcategory of the register of written language is the register of academic textbooks. It is also possible to further divide the category of academic textbooks according to discipline (such as biology, business, education, art history, etc.) or by level (undergraduate, graduate, freshman, sophomore, etc.).

Table 6.1 The 50 most frequent words in the Michigan Corpus of Academic Spoken English (MICASE)

 N  Word    Frequency     N  Word    Frequency
 1  THE        68,036    26  BE           8874
 2  AND        41,091    27  THEY         8799
 3  OF         35,053    28  ON           8650
 4  YOU        34,986    29  ARE          8596
 5  THAT       34,085    30  IF           8440
 6  TO         33,029    31  YEAH         8292
 7  A          32,236    32  WAS          8179
 8  I          31,483    33  JUST         7970
 9  IS         23,535    34  DO           7675
10  IN         23,255    35  NOT          7638
11  IT         21,883    36  OR           7488
12  SO         17,669    37  THAT’S       7042
13  THIS       17,110    38  ABOUT        7014
14  UM         15,346    39  RIGHT        6980
15  UH         14,859    40  WITH         6726
16  HAVE       11,590    41  CAN          6350
17  IT’S       11,560    42  AT           6312
18  WE         11,383    43  AS           6229
19  WHAT       11,236    44  THERE        5991
20  LIKE       11,037    45  THINK        5796
21  BUT        10,402    46  DON’T        5650
22  KNOW       10,000    47  XX*          5646
23  FOR          9282    48  THEN         5443
24  ONE          9267    49  ALL          5289
25  OKAY         9250    50  TWO          4937

*Note: (xx) is the convention used to indicate unintelligible speech.


Corpus Linguistics
Table 6.2 Words with a frequency of 50 in MICASE

N     Word             Frequency
2039  ABSOLUTE         50
2040  BECOMING         50
2041  CAUSED           50
2042  CHARACTERISTIC   50
2043  CLASSROOM        50
2044  CONSISTENT       50
2045  CORE             50
2046  CURVES           50
2047  DAILY            50
2048  DESCRIPTION      50
2049  DETECT           50
2050  DISSERTATION     50
2051  EXECUTION        50
2052  EXPOSED          50
2053  FIGURED          50
2054  GARDEN           50
2055  GRAVITY          50
2056  HABITAT          50
2057  OPENING          50
2058  PAGES            50
2059  PHRASE           50
2060  PRESENTED        50
2061  RAISED           50
2062  RANDOMLY         50
2063  REGIONS          50
2064  REVELATION       50
2065  SELECTION        50
2066  SHORTER          50
2067  SHUTTLE          50
2068  SPLIT            50
2069  SURVEY           50
2070  TAIL             50
2071  THEORETICAL      50
2072  TRAITS           50
2073  TUMOR            50
2074  WHOA             50


Word lists derived from corpora can be useful for vocabulary instruction and 
test development. For example, a word list from an appropriate corpus could be 
used to select words occurring within a specified target frequency range (say, 
five to ten times per million words) for inclusion in a course syllabus or pool 
of test items. Similarly, a teacher trying to decide which modal verbs to teach, 
and in what sequence, could consult a word list from one or more corpora to find 
the relative frequencies of the modals.
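The frequency-range selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build the MICASE lists: the function names (tokenize, frequency_list, words_in_range), the simple regular-expression tokenizer, and the toy sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a word is a run of letters, optionally with internal
    apostrophes (straight or curly), so "don't" stays one token.
    """
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def frequency_list(tokens):
    """Return (word, raw count, count per million tokens), most frequent first."""
    counts = Counter(tokens)
    total = len(tokens)
    return [(word, count, count * 1_000_000 / total)
            for word, count in counts.most_common()]

def words_in_range(freq_list, low, high):
    """Select words whose normalized frequency lies between low and high per million."""
    return [word for word, _, per_million in freq_list
            if low <= per_million <= high]

# Toy text standing in for a corpus (hypothetical data).
sample = "the cat saw the dog and the dog saw the bird"
table = frequency_list(tokenize(sample))
print(words_in_range(table, 150_000, 200_000))
```

With a real corpus of a million words or more, the same call with bounds of 5 and 10 would return candidates in the 'five to ten times per million' band mentioned above. Normalizing to a per-million rate is what makes counts comparable across corpora of different sizes.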
Beyond frequency lists, concordancing packages can provide information about 
lexical co-occurrence patterns. To generate a concordance listing showing these 
patterns, a target word or phrase must first be selected. The program then 
searches the texts in the corpus and lists each occurrence of the target word 
in context. This display, referred to as a ‘key word in context’ (KWIC) display, 
may then be used to explore various uses or senses of the target word. Figure 6.1 
shows a screenshot of a KWIC for the target word like from a small corpus of 
children’s spoken language.
The top portion of the screen display provides context for the occurrence of like 
that is highlighted in the lower portion of the screen. The size of the windows and 
the amount of context can be adjusted to suit users’ needs. This small KWIC 
display of like shows that the students (fifth-graders) engaged in informal 
conversations were primarily using like as a verb, and that it was often preceded 
by a personal pronoun and followed by an infinitive (for example, we like to talk, 
we like to walk, I don’t like to listen). Of course, this small display does not 
show all of the occurrences of like; other uses do occur in the corpus.
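A KWIC listing of the kind just described can be approximated with a short Python sketch. The function name kwic, the width parameter, and the sample sentence (assembled from the example phrases quoted above) are illustrative assumptions; real concordancing packages offer far more control over sorting and display.

```python
def kwic(tokens, target, width=4):
    """Return one (left context, keyword, right context) triple per occurrence.

    `width` is the number of tokens of context kept on each side.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample built from the phrases cited in the text.
text = "we like to talk and we like to walk but I don't like to listen"
hits = kwic(text.split(), "like", width=3)

# Right-align the left context so the keyword lines up in one column.
for left, key, right in hits:
    print(f"{left:>20}  {key}  {right}")
```

Aligning the keyword in a single column is what lets the eye pick out recurring neighbours, such as the personal pronoun to the left and the infinitive marker to the right.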
A concordance program can also provide information about words that tend 
to occur together in the corpus. For example, we could discover which words 
most frequently occur just to the right or just to the left of a particular target 
word, or even within two or three words to the left or right of the target word. 
Words that commonly occur with or in the vicinity of a target word (that is, with 
greater probability than random chance) are called ‘collocates’, and the resulting 
sequences or sets of words are called ‘collocations’. An analysis of collocations 
provides important information about grammatical and semantic patterns 
of use for individual lexical items (see Sinclair, 1991, for more information on 
collocations).
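Counting the words that fall within a fixed span of a target word can be sketched as follows. The functions collocates and association_ratio, the window parameter, and the toy token list are assumptions for illustration. The observed-to-expected ratio used here is only one simple way to ask whether a word occurs near the target more often than chance would predict; it is not a measure the chapter itself prescribes.

```python
from collections import Counter

def collocates(tokens, target, window=2):
    """Count words occurring within `window` tokens to the left or right of target."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

def association_ratio(tokens, target, window=2):
    """Observed count near the target divided by the count expected by chance.

    Expected count (assumption): the word's overall relative frequency times
    the number of context slots examined. A ratio above 1 suggests the word
    occurs near the target more often than a random distribution would give.
    """
    overall = Counter(tokens)
    n = len(tokens)
    observed = collocates(tokens, target, window)
    slots = sum(observed.values())
    return {word: obs / (overall[word] / n * slots)
            for word, obs in observed.items()}

# Toy token sequence (hypothetical data, far too small for real conclusions).
sample_tokens = ("strong tea and strong coffee but weak tea "
                 "and powerful computer").split()
near_tea = collocates(sample_tokens, "tea", window=1)
ratios = association_ratio(sample_tokens, "tea", window=1)
print(near_tea.most_common())
```

Setting window to 1 looks only at the words immediately to the left and right of the target; widening it to 2 or 3 corresponds to the larger spans mentioned above.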
Through corpus analyses we can discover patterns of use that previously 
went unnoticed. Words and grammatical structures that seem 
synonymous often have strong patterns of association or preferences for use 
with certain structures. For example, the nearly synonymous verbs begin and start 
have the same grammatical potential. That is, they can be used with the same 
variety of clause elements (for example, transitive, intransitive). Yet from corpus-
based investigations we have learned that start has a strong preference for an 
intransitive pattern, in particular in academic prose (Biber, Conrad and Reppen, 
1998). A detailed example of nearly synonymous words is provided later in this 
chapter in the section on ‘Examples of Corpus-based Classroom Activities’ and in 
the ‘Hands-on Activity’ at the end of this chapter.
Lexical phrases, or lexical bundles, are another area of collocational study that 
has come to light through corpus linguistics. Like collocations, these lexical 
phrases or bundles are patterns that occur with a greater than random frequency 
(see Chapter 1, An Overview of Applied Linguistics, for an example). The Longman 
Grammar of Spoken and Written English (Biber et al., 1999) provides a good discussion 

