
Conference Review | A Review of the 9th International Corpus Linguistics Conference, 2017 (University of Birmingham)

2017-08-27, 语言学通讯 (Linguistics Newsletter)




The 9th International Corpus Linguistics Conference 





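The 9th International Corpus Linguistics Conference was held at the University of Birmingham, UK, in July 2017. The conference takes place every two years, and leading British centres of corpus linguistics, including Lancaster University, the University of Liverpool and the University of Birmingham, have all hosted it in the past. The plenary speakers this year were Professor Susan Hunston (University of Birmingham, UK), Dr Andrew Hardie (Lancaster University, UK), Mike Scott (Aston University, UK), Professor Christian Mair (University of Freiburg, Germany), Professor Susan Conrad (Portland State University, USA) and Professor Dan McIntyre (University of Huddersfield, UK).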






Susan Hunston

Talk title: Corpus Linguistics in 2017: a personal view


This paper offers a personal reflection on the development of Corpus Linguistics over the last couple of decades. This time period has seen a massive increase in the amount of research in this field, while at the same time reflecting three persistent preoccupations: the primacy of comparison; the primacy of lexis; and an ongoing questioning of the relationship between theory and methodology. I shall suggest that the major changes and developments in the field can be summarised in terms of five ‘turns’: the quantitative; the cognitive; the modality; the specialisation; and the paradigmatic. The paper will discuss each of these in turn and will offer some speculations about the future.






Andrew Hardie

Talk title: Exploratory analysis of word frequencies across corpus texts: towards a critical contrast of approaches


A recent trend in corpus linguistics is the adoption of Latent Dirichlet Allocation (LDA), already widely used by digital humanists (Blevins, 2010; Underwood, 2012), as a method for exploratory corpus analysis. LDA is a machine-learning approach to inducing structure in the content of a corpus based solely on word occurrence across texts or documents as data objects; it is one of a range of approaches usually, if potentially misleadingly, dubbed topic modelling. However, adopting this approach to the many-dimensional data of word frequency comes with a high price tag in terms of knowledge that the system ignores or makes nontransparent. The question this raises is whether that price tag is justified. Various advantages have been asserted for LDA, albeit not without caveats (see Blei, 2012 for a selection of both). All such advantages notwithstanding, LDA has at least three substantive disadvantages. First, it is non-deterministic: randomisation is central to the algorithm. This is problematic from the perspective of scientific replicability for reasons too obvious to belabour. Second, its operation is opaque: the relationship between the underlying distribution data and the resulting statistical model is nontransparent to the analyst. Third, the theory of text generation underpinning the LDA algorithm is dubiously compatible with linguistic understandings of text, topic and discourse. Moreover, although the lack of linguistic knowledge used in the construction of the model is presented as an advantage of LDA, it is equally characterisable as a disadvantage: the field of corpus analysis has invested much effort in the creation of precisely the knowledge resources which LDA is lauded for not requiring. What exactly does our acceptance of these disadvantages buy us? In examining this issue, we must venture comparisons with the exploratory multivariate analysis approaches that are longer-established in corpus linguistics (cf. Biber, 1988, 1989).
Using example data drawn from the FLOB corpus, I will compare and contrast outcomes of different analytic procedures including LDA models and alternative approaches, with two questions in mind. First, to what extent are these outcomes compatible with one another? Second, to what extent are they transparently interpretable in linguistically meaningful terms?
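To make the first of these objections concrete, here is a minimal, hypothetical sketch of a collapsed Gibbs sampler for LDA in plain Python (standard library only). It is not the implementation or the FLOB data Hardie used: the toy four-document corpus, the function name `lda_gibbs` and the hyperparameter values (`alpha`, `beta`, `iterations`) are illustrative assumptions. Topics are initialised at random and resampled from their conditional distribution, so the seeded generator is the only source of randomness: a fixed seed reproduces a run exactly, while different seeds will typically assign topics differently to the same tokens.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, seed, iterations=50, alpha=0.1, beta=0.01):
    """Toy collapsed Gibbs sampler for LDA (illustrative only).

    Returns the final topic assignment of every token as a tuple of tuples.
    """
    rng = random.Random(seed)  # the only source of randomness in this sketch
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [Counter() for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Random initialisation: every token gets a uniformly random topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Randomised step: sample a new topic in proportion to the weights.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return tuple(tuple(z_d) for z_d in z)


# Hypothetical toy corpus mixing "politics" and "football" vocabulary.
docs = [
    "vote party election vote minister".split(),
    "goal match team goal player".split(),
    "election team vote match party".split(),
    "minister player party goal election".split(),
]

runs = {seed: lda_gibbs(docs, n_topics=2, seed=seed) for seed in range(5)}
print(runs[0] == lda_gibbs(docs, n_topics=2, seed=0))  # True: same seed, same run
print(len(set(runs.values())), "distinct outcomes across 5 seeds")
```

Fixing the seed makes an individual run repeatable, but the resulting model still depends on which seed happened to be chosen.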







Mike Scott

Talk title: News Downloads and Aboutness


Many of us are using LexisNexis, Factiva or other online sources, often in order to study a specific topic within such overall fields as gender studies, journalism, history, sociology, medicine, psychology and law. Downloads supplied by an online search engine raise a number of issues: the choice of search terms, duplicate articles, repeated sections within articles, online comments and discussion, and disparities in formatting. But the main aim of the presentation is to focus on the problem of relevance: many of the articles retrieved may mention the desired topic only incidentally. The main aboutness of such articles doesn’t really include the topic but concerns another, quite different one. For example, an article returned by a search on Brexit (Guardian, 12 January 2017) concentrates on problems in the UK’s National Health Service, contrasting these problems incidentally with the “theoretical risks of Brexit” and claiming that deficiencies in the Health Service are very obvious to ordinary voters. Its aboutness does include Brexit, but at a very minor level. The question we will be considering, then, is how to filter aboutness so as to reduce unwanted dross. There are various aspects of relevance to identify in order to find ways of filtering out irrelevance. One is identifying carefully what we are really seeking in the first place, since almost any topic, such as climate change, austerity or Brexit, has numerous aspects (legal, social, geographical etc.), some of which are more central (within the field of knowledge) than others (gardening, hill-walking, DIY). Once it is clear which aspect of our topic is wanted, means have to be found to get rid of the others. Easier said than done!

Christian Mair

Talk title: Downsizing and upgrading: Why we need more spoken, more multilingual and more nonstandard corpora


Today, students of English (and a few other mostly European languages) are privileged in that they can rely on extremely rich corpus-linguistic working environments. In a brief review of 50 years’ corpus-linguistic research, I will demonstrate how the availability of increasingly large corpora and increasingly sophisticated tools for analysis has left a profound mark on the discipline of linguistics. Traditional descriptive work can now be carried out to higher empirical standards. More importantly, new areas of linguistic inquiry have been opened up to rigorous empirical investigation, and corpus-based research has given a general boost to usage-based theoretical frameworks of all kinds. As I will show, however, the story of the past fifty years has not been one of undiluted progress and success. It seems that a “conspiracy” of technological and ideological factors has favoured the creation of large monolingual standard written corpora. Data which does not fit this template tends to be made to conform to it. For example, much corpus-based work on spoken English is based on transcriptions rather than the original audio or audiovisual recordings. Similarly, complex multilingual realities tend to be simplified in corpus compilation, for example by annotating code-switches into other languages as “extra-corpus material.” Today, corpus technology and corpus-linguistic theorising have advanced to such an extent that these biases can and should be redressed. In the digital textual universe in which the humanities and social sciences are all operating today, the classic definition of the corpus, as a usually digital database compiled by linguists for the purposes of linguistic analysis, has become increasingly difficult to uphold, and corpus linguistics will sooner or later merge with the digital humanities movement.
A kind of corpus linguistics that emphasises spoken, multilingual and nonstandard data more than has been the case in the past will make a richer contribution to this development.






Susan Conrad

Talk title: From a Plate of Spaghetti to a Cable-stayed Bridge: Increasing the Impact of Corpus Linguistics in Disciplinary Education


In the 1980s, John Sinclair was instrumental in showing the profound impact corpus linguistics could have on our understanding of language. Now, ten years after his death, I want to urge corpus linguists to think again about having an impact – this time on fields that most people don't associate with language study, such as engineering. Why does an engineer need corpus linguistics? How can corpus-based studies improve engineering education? What does it take to move from language descriptions to applications that encourage changes in what people do? What challenges face corpus linguists in working with professionals who don’t “speak linguistics”? These are the general questions I will address, using my work in the Civil Engineering Writing Project as a concrete example. Begun in 2009, the Civil Engineering Writing Project is a corpus-based project that addresses a long-standing problem in engineering education: students' lack of preparation for writing in the workplace. Despite decades of discussion, there had been almost no empirical investigation of the problem in the United States. I immediately saw the role corpus linguistics could play in defining the problem, informing teaching materials, and assessing improvements. The project materials have now been piloted at four universities, with significant improvements in students’ writing. My talk will include examples of the corpus-based analyses of words and grammar that helped us understand the gaps between student and practitioner writing. The analyses have, for example, clarified the highly controversial areas of passive voice and first person pronoun use, and highlighted the importance of clausal simplicity and certain word choice issues. They demonstrate that language choices are fundamental to effective engineering. However, the linguistic analyses have also become intertwined with techniques that are less typical in corpus studies. 
We maintain ongoing collaborations with professionals in the community, to mine their context expertise and get their help interpreting the linguistic findings. We interview students to gain insight into reasons behind their language patterns – insights that no amount of corpus analysis can reveal. We have made additions to the research methodology to include judgments of writing effectiveness, a transition from description to evaluation that is necessary for an applied project. And we are constantly seeking new ways of turning corpus analyses into information and practice that engineers value. Although the additional techniques increase the complexity of the project, I argue in this talk that expanding corpus research in these ways can make it more useful in more disciplines. I will reflect on the successes and the continuing challenges of the project. How exactly the plate of spaghetti and the cable-stayed bridge figure in – well, that will become clear in the talk.






Dan McIntyre

Talk title: Just what is corpus stylistics?


Over a relatively short period of time, corpus linguistic methods have been embraced by a wide range of sub-disciplines of linguistics (and, more recently, by other disciplines entirely). Corpus linguistics has had a transformative effect on such areas as historical linguistics, child language acquisition, and critical discourse analysis, to name but a few. In stylistics, corpus methods are increasingly being adopted, not least because of the influential work of corpus linguists such as Stubbs (2005) and Mahlberg (2013). Indeed, such is the popularity of the corpus approach in stylistics that it is now common to see the term corpus stylistics used to describe any stylistic work that utilises corpus methods. This adoption of corpus as a premodifier to designate a particular type of stylistics is unusual when compared against the practices of other sub-disciplines that use corpus methods. So just what is corpus stylistics and how, if at all, does it differ from corpus linguistics? My talk aims to offer answers to these questions by exploring how stylisticians have used corpora in their work. I begin with an overview of research in corpus stylistics before going on to consider issues with the presuppositions inherent in some definitions of the term. I then discuss topics in stylistics that have benefitted particularly from corpus methods. These include the analysis of speech and thought presentation (e.g. Semino et al., 1997; Semino & Short, 2004), where corpora have enabled the discovery of quantitative as well as semantic norms. Following this, I consider the washback effects that corpus linguistics has had on methodological practices in stylistics. I illustrate some of these by introducing a software tool called Worldbuilder, developed by linguists and computer scientists at the University of Huddersfield to provide a means of improving the systematicity of cognitive stylistic analyses that utilise Text World Theory (Werth, 1999).
I suggest that the incorporation of basic principles from corpus linguistics, such as data sampling and annotation, is improving methodological and analytical practice in stylistics. Finally, having outlined the impact of corpus linguistics on stylistics, I consider what stylistics has to offer to corpus linguistics. I suggest that foregrounding theory, arguably the cornerstone of stylistics, offers valuable analytical insight when connected to notions of statistical salience.




