Lecture 9: The story of corpus linguistics, from past to future


Date: 27.04.2020

  • Corpus Linguistics
  • Prof. Djumabaeva J.

OUTLINE

  • History of corpus linguistics
  • Revisiting old friends: computational linguistics
  • The textually mediated world: the humanities and social sciences

History of corpus linguistics

  • Dr. Djumabaeva J., English Philology Department, NUU, Tashkent

Will there be a distinctive third phase in the ongoing development of corpus linguistics?

  • Corpus methods are fully embedded in the day-to-day practice of functionalist linguistics, sociolinguistics, discourse analysis and so on
  • An important role for corpus specialists lies in methodology: the construction and annotation of corpora, the development of new tools and new procedures, the expansion of the conceptual bases of the methodology and other such issues

What do we mean by “corpus linguist”?

  • A ‘corpus linguist’ may be someone who uses corpus data in their research
  • Or it may be a researcher into the methodology itself, especially one who develops new methods and enables other linguists to apply them

Two other directions of development for corpus linguistics


Revisiting old friends: computational linguistics

  • What is computational linguistics?
  • At the most theoretical level: developing computational models of the language system
  • At the practical level: developing software
  • … software may be developed to analyse language input, as in speech recognition, or syntactic or semantic parsing; or to produce a language output. In cases such as machine translation – automatic conversion of text in one human language to another (see Somers 2003 or Nirenburg et al. 2003 for an overview) – processing of both language input and language output is involved.

Extracting information from a text or texts

  • entity extraction
  • text mining (see Feldman and Sanger 2007)
  • data mining – the identification of patterns and extraction of information across very large datasets
  • biomedical text mining (see Cohen 2010) – that is, extracting information from large collections of text (usually academic papers) on biology or medicine
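To make the idea of entity extraction concrete, here is a minimal sketch in Python. It is not how production systems work: it assumes, purely for illustration, that any run of capitalised words is a candidate named entity. The pattern, the function name and the sample sentence are all invented for this example.

```python
import re

# Naive heuristic (an assumption for illustration only): a run of one or
# more capitalised words is treated as a candidate named entity.
CANDIDATE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def candidate_entities(text):
    """Return every run of capitalised words found in the text."""
    return CANDIDATE.findall(text)


# Invented sample sentence; it starts in lower case on purpose, because
# the heuristic would wrongly pick up any sentence-initial capital.
sample = ("researchers at Lancaster University compared results "
          "with Oxford University Press data")
print(candidate_entities(sample))
```

A heuristic this simple misfires on sentence-initial words and misses entities written in lower case, which is one reason real systems rely on annotated corpora and statistical models instead.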

Corpus linguistics vs computational linguistics

  • Corpus linguistics is ultimately about finding out about the nature and usage of language. While computational linguistics may also be concerned with modelling the nature of language computationally, it is in addition focused on solving technical problems involving language (McEnery, Corpus Linguistics, 2007)
  • Computational linguistics is an ‘old friend’ of corpus linguistics, in that they have been and continue to be linked (not least, perhaps, by the somewhat inaccurate perception of those outside the fields of a greater similarity between them than actually exists); but it is a friendship which needs to be renewed and reinvigorated if both sides are to get the most out of the link.

Sentiment analysis

  • Also known as opinion mining (see Liu 2010)
  • The main aim of sentiment analysis may be characterised roughly as the automatic identification of what a writer feels about the topic of the text they are writing (or, alternatively, their opinion of that topic).
  • The basic task is to decide whether the text expresses a positive or a negative opinion.
  • Pragmatic implicature complicates this: it allows a negative opinion to be conveyed without any straightforward negative expressions such as bad, awful, very poor, I hate X and so on.
  • The key point for our purposes is that sentiment analysis has had little or no impact on the field of corpus linguistics, in spite of some fairly obvious uses for it in discourse analysis and pragmatics.
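The difficulty that implicature poses can be illustrated with a toy word-counting classifier. This is only a sketch under stated assumptions: the two word lists are invented for the example and are not a published sentiment lexicon, and real sentiment analysis systems are considerably more sophisticated.

```python
import re

# Toy opinion lexicon: these word lists are illustrative assumptions,
# not a published resource.
POSITIVE = {"good", "great", "excellent", "love", "wonderful"}
NEGATIVE = {"bad", "awful", "poor", "hate", "terrible"}


def polarity(text):
    """Label a text by counting positive and negative lexicon words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = (sum(t in POSITIVE for t in tokens)
             - sum(t in NEGATIVE for t in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented example sentences.
explicit = "The service was awful and I hate the food."
implied = "I would only recommend this restaurant to my enemies."

print(polarity(explicit))  # explicit negative words are counted
print(polarity(implied))   # no lexicon word occurs, so the opinion is missed
```

The second sentence is clearly negative to a human reader, yet it contains no word from the negative list, so a lexicon count alone cannot detect the opinion it implies.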

The textually mediated world: the humanities and social sciences

  • Humanities research exploiting corpus tools and resources is a subset of the field of humanities computing, or digital humanities as it is often known nowadays (see McCarty 2005). Digital humanities research includes the development and exploitation of many forms of database, not just corpora. For example, work has been done to create databases of images (Bailey 2010) and of archaeological objects (Heath 2010).

EEBO (Early English Books Online)


Recommended sources

  • Bailey, C. 2010. ‘Introduction: making knowledge visual’, in C. Bailey and H. Gardiner (eds.) Revisualizing Visual Culture, pp. 1–11. Farnham: Ashgate.
  • Cohen, K. B. 2010. ‘BioNLP: biomedical text mining’, in N. Indurkhya and F. J. Damerau (eds.) Handbook of Natural Language Processing (second edition). Boca Raton, FL: CRC Press.
  • Feldman, R. and Sanger, J. 2007. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press.
  • Heath, S. 2010. ‘Diversity and reuse of digital resources for ancient Mediterranean material culture’, in G. Bodard and S. Mahony (eds.) Digital Research in the Study of Classical Antiquity, pp. 35–52. Farnham: Ashgate.
  • Liu, B. 2010. ‘Sentiment analysis and subjectivity’, in N. Indurkhya and F. J. Damerau (eds.) Handbook of Natural Language Processing (second edition), pp. 626–66. Boca Raton, FL: CRC Press.
  • McCarty, W. 2005. Humanities Computing. Basingstoke: Palgrave Macmillan.
  • Maclagan, M., Davis, B. and Lunsford, R. 2008. ‘Fixed expressions, extenders and metonymy in the speech of people with Alzheimer’s disease’, in S. Granger and F. Meunier (eds.) Phraseology: An Interdisciplinary Perspective, pp. 175–90. Amsterdam: John Benjamins.
  • Nirenburg, S., Somers, H. and Wilks, Y. (eds.) 2003. Readings in Machine Translation. Cambridge, MA: MIT Press.
  • Somers, H. (ed.) 2003. Computers and Translation: A Translator’s Guide. Amsterdam: John Benjamins.
  • Thank you very much
  • If you have questions, feel free to contact me
  • Telegram: +998909624440
  • Email: djumabaevajamila@gmail.com
