
Parliament Interpreting Corpus), an electronic parallel corpus of source and target
speeches in Italian, English and Spanish (Monti et al., 2005; Bendazzoli &
Sandrelli, 2005). The plenary sittings of the European Parliament were chosen as
source material in this corpus because they show a high level of homogeneity, in
that all speeches are produced in the same formal setting (Marzocchi & Zucchetto,
1997). EPIC allows both parallel and comparable analyses and contains about
180,000 words. It is available online to the whole interpreting community, which
helps to share knowledge of interpreting and even to enhance its teaching.
As for DIRSI, i.e. Directionality in Simultaneous Interpreting (Bendazzoli &
Sandrelli, 2009), it includes interpreters’ output into both their native language
and their foreign working language. This corpus was compiled from audio recordings
of international conferences on health-related subjects held in Italy between
2005 and 2008 and includes recordings from different sessions (i.e. opening
statements, presentations, and closing sessions). The language pair is English
and Italian, and five professional interpreters agreed to contribute data.
Debates were excluded from the corpus because of their degree of interactivity.
The creation of this corpus was made possible by the experience previously
gained with the EPIC project.

Corpus-based interpreting studies

It is important to add that interpreting corpora created ad hoc by individual
researchers for manual analysis are still used today, and complement the realm of
CIS. The notion of corpus in interpreting studies was clearly linked from the
outset to empirical research based on authentic data (i.e. from real-life
interpreting assignments) and, because of the difficulty of compiling electronic
corpora, that notion still applies to data sets that continue to be analysed
manually rather than electronically.
In the following paragraphs, I will describe the main characteristics of interpreting
corpora (i.e. interpreting mode and setting, corpus size, languages and data
accessibility) that have recently been compiled or are currently being compiled.
The first CIS projects focused on professional simultaneous interpreting
performed in conference settings. Two specific sources of data have been
predominant, namely TV broadcasting and the European Parliament (EP). At the
EP, for instance, source speeches are interpreted simultaneously into as many as
23 languages (sometimes through relay interpreting) and these speeches can be
used for research purposes (Bendazzoli, 2010). Other settings have also been
explored, such as festivals, medical conferences, and football press conferences.
Simultaneous interpreting is nevertheless not the only interpreting mode that can
be analysed. Asian research centres focus more on consecutive interpreting
because of data accessibility, their data source being televised press conferences
of Chinese political representatives (Wang, 2015). More recent projects also focus
on short consecutive interpreting in community settings or on dialogue interpreting.
It is also important to note that efforts are being made to develop sign-language
corpora despite the difficulty of collecting data owing to anonymity issues (see
Metzger & Roy, 2011).
Compared with the spoken part of the British National Corpus (10 million
words), interpreting corpora are quite small (Dembry & Love, 2015). Projects
based on EP data have yet to reach the size of general reference corpora, and it
might just be a matter of time and labour. However, it is hoped that CIS
projects will develop in international organisations other than the EP to diversify
interpreting settings. This is already being considered in some organisations such
as the European Commission (Spinollo, 2018; Scardulla, 2016). Even though current
projects carried out in Asia are expected to generate fairly large resources,
very large corpora are unlikely to be compiled in the near future.
In terms of languages used, we can see a wide range of language combinations,
which confirms one of the “special challenges” (Setton 2011: 68) of CIS:
multilingualism. English is represented in many studies, but it is encouraging
to see that non-European languages such as Hebrew, Japanese or Chinese are
represented as well.
The last feature of interpreting corpora I would like to focus on is data accessibility,
which has always been an issue in CIS. In the oldest studies mentioned in
Setton’s (2011) overview (see Appendix 1), transcripts were rarely made available
and sound files were recorded on tape and not in digital form. Transcript files were
thus hard to access (Diriker, 2004) and the analysis had to be carried out manually.
In the same period, some studies were, however, based on machine-readable
corpora (Cencini, 2002; Fumagalli, 2000) but the transcripts were “not available for
outside use” (Setton 2011: 40). In some other cases, transcripts are available on
CDs (e.g. Vuorikoski, 2004; Monacelli, 2009) or on the web, as is the case for the
two corpora I mentioned earlier, EPIC and DIRSI. In the future, data accessibility
should be facilitated, at least within the research community.
