Daniel A. Nkemleke, Department of English, Ecole Normale Supérieure, University of Yaoundé I
The study of language based on examples of “real-life” language use, collected, stored and processed via computer
Facilitated by the advent of computer technology (1960s)
Latin: corpus (body): body of text, any collection
Before 1940s/1950s: “early corpus linguistics”, corpus-based methodology (“primitive corpora”?)
Between 1960s and 1980s: a minority of linguists continued corpus-based work (Quirk: SEU; Francis & Kučera: Brown Corpus; Svartvik: London-Lund Corpus)
Computer technology: major support for CL
First African corpus: 1989 (ICE-East Africa) (Schmied 1989)
Second African corpus: 1992 CCE (Tiamajou 1993) / Nigeria??
“Thirty years ago when this research started it was considered impossible to process texts of several million words in length. Twenty years ago it was considered marginally possible but lunatic. Ten years ago it was considered quite possible but still lunatic. Today it is very popular” (Thomas/Short 1996: 4)
L1 Corpora
Brown Corpus of American English
Lancaster-Oslo/Bergen Corpus (LOB)
London-Lund Corpus
British National Corpus (BNC)
Birmingham Corpus of British English
L2 Corpora
ICE-East Africa (Kenya & Tanzania)
Corpus of Cameroon English
Corpus of Nigerian English ??
Kolhapur Corpus of Indian English
Multinational Corpus Project
International Corpus of English (ICE)
1. Sampling & representativeness
Attempts to construct a “representative” sample corpus which maximally represents the variety
Aim: a picture as accurate and reasonable as possible of a language population
2. Finite size
Body of a finite number of words, e.g. 1,000,000
Figure determined at the beginning of the project
Monitor corpus: constant addition of texts
3. Machine-readable form
Past: reference to printed text
Few in book form (e.g. original London-Lund)
Occasionally other forms of media (microfiche, recordings)
4. Standard reference
Tacitly a corpus constitutes a standard reference
Presupposition: wide availability to other researchers
Direct comparison of results with other varieties
Began in 1992 with the collaboration of two British universities (Birmingham/Liverpool)
Assistance of the British Council in Yaoundé
Target of a million words reached in 1994
Data used for classroom activities/research since then
→ Goal: further development (tagging) of the database (TU-Chemnitz)
Provide authentic data for the description of the main features and problems inherent in the variety of English which is written in Cameroon
Provide a source of authentic material for English language teaching/learning in Cameroon
Serve as a database for comparative studies on CamE in relation to other varieties of English
Dialogues
1. Conversations
2. Phone calls
3. Broadcast discussions
4. Classroom lessons
6. Parliamentary debates
7. Legal cross-examination
8. Business transactions
13 possible ways in which a corpus may be useful
1. Corpora as a source of empirical data
2. Corpora in language teaching and learning
3. Corpora in lexical studies
5. Corpora in speech research
6. Corpora and semantic studies
7. Corpora in pragmatic and discourse studies
8. Corpora in sociolinguistic studies
9. Corpora and stylistic studies
10. Corpora in historical linguistics
11. Corpora in dialectology and variational studies
12. Corpora in psycholinguistics
Linguists can make more objective statements on language use in the variety, in comparison with other varieties
Nkemleke/Mbangwana (2001), Nkemleke (2003), Nkemleke (2004a, 2004b), Nkemleke (2005), Nkemleke (2006), Nkemleke (2007a, 2007b), Nkemleke (fc: 2008a, 2008b, 2008c), Schmied/Nkemleke (fc: 2008a, 2008b)
A number of post-graduate projects in ENS/Faculty
CCE data used for classroom activities over the years
Support teachers’ classroom explanation
Learners as researchers
Data-driven learning
Critical look at existing language teaching material
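Data-driven learning usually means giving learners concordance lines to examine, so that they can work out patterns of usage for themselves. A minimal sketch of a keyword-in-context (KWIC) display is given here for illustration only: the function name `kwic`, the `width` parameter and the sample sentence are assumptions and are not taken from the CCE or any of its tools.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Hypothetical helper for illustration; `width` is the number of
    characters of context kept on each side of the keyword.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample text, standing in for a corpus file.
sample = "The corpus is finite. A monitor corpus grows. Each corpus is sampled."
for line in kwic(sample, "corpus", width=15):
    print(line)
```

Because the keyword is aligned in a single column, learners can scan what typically comes before and after it, which is the kind of classroom use the points above describe.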
CCE data used for studies on aspects of Cameroon English usage, e.g. Hans-Georg Wolf used data from the corpus in his book English in Cameroon, published in 2001 by Mouton de Gruyter (Berlin/New York).
Keep informed about new words, changing meanings
Call up word combinations, co-occurring words
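Calling up co-occurring words amounts to counting which words appear within a few positions of a chosen node word. The sketch below is one simple way to do this, under stated assumptions: the function name `collocates`, the window size and the sample sentences are invented for illustration, and the tokenizer is deliberately crude (lower-cased alphabetic strings, with the window allowed to cross sentence boundaries).

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count words occurring within `window` tokens of `node`.

    Hypothetical helper for illustration; a real study would use a
    proper tokenizer and usually a statistical association measure.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Invented sample text, standing in for a corpus file.
sample = "He made a decision. She made a mistake. They made a decision quickly."
for word, freq in collocates(sample, "made", window=2).most_common(3):
    print(word, freq)
```

Raw counts favour very frequent words such as articles, so in practice the counts are often compared against each word's overall frequency in the corpus before any conclusion is drawn.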
ICE-Cameroon is ongoing
Future possibility of more specialized corpora