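交换机 | The British Library is recruiting a fellow for a digital humanities project
Hosted by the British Library
Application deadline: 2 November 2021
* This introduction is a partial translation; for the complete call for applications, see the English original below or click “阅读原文” (Read the original) at the end of the post to visit the source page.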
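The British Library’s Digital Research Team is recruiting a new member. The fellow will use new digital tools and techniques to explore possible solutions for automatically transcribing historical Chinese manuscripts. The post will focus on material from Dunhuang (China), which is part of the Stein collection. As part of the British Library’s digitisation activities, the Lotus Sutra Manuscripts Digitisation Project is conserving and digitising this collection so that the holdings in the Library’s care are accessible to all. The digitised content will be accessible through the International Dunhuang Project (IDP) platform.
The post offers a unique professional development opportunity, particularly suited to candidates at the early or middle stage of their careers who have an interest in, and knowledge of, cultural heritage and digital humanities. The British Library is keen to build a long-term partnership with the successful applicant’s home institution and is happy to actively enable and encourage opportunities for dialogue with digital humanities networks in the UK. For the full candidate requirements and eligibility criteria, see below.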
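KEY RESPONSIBILITIES
Develop an in-depth understanding of the content digitised as part of the Lotus Sutra Manuscripts Digitisation Project
Research existing digitised materials on the IDP website and identify the different scripts and the challenges they pose for text recognition tools
Identify key stakeholders and research existing market solutions, tools and methods for Chinese optical character recognition (OCR) and handwritten text recognition (HTR)
Train text recognition systems on IDP materials, then evaluate and compare the results
Work with relevant British Library colleagues to raise awareness of the Library’s Stein and other Central Asian collections and of other digitised content available on the IDP platform, and promote their research potential in machine-readable form (for example through text mining and data visualisation)
Develop the Library’s engagement in a global network working with Chinese OCR/HTR systems, and foster relationships with the fellow’s home institution and Chinese digital humanities research communities, with a view to laying the groundwork for future collaboration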
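DELIVERABLES
A recommended platform, software or tool for the Library to use in processing digitised materials available on the IDP platform
A report on the types of texts and scripts, and the potential challenges, that OCR/HTR tools may face with digitised collection items on the IDP website, including an overview of the systems tested and their results
A suggested operational workflow for producing, proofreading and correcting transcriptions and feeding them into the Library’s strategic systems
Promoting the project internally and externally, including posts on the British Library’s Digital Scholarship, Asian and African Collections and IDP blogs, use of the Library’s other social media platforms, and a presentation to Library staff on the project, its aims and its outcomes
Taking part in the Library’s workshop/conference that concludes the Lotus Sutra Manuscripts Digitisation Project
Becoming an active member of a network of scholars and professionals exploring OCR/HTR solutions for historical Chinese documents, and building long-term working relationships
Sharing experience within UK, Chinese and global digital humanities networks, laying the foundations for future inter-regional collaboration
Taking part in other related activities of the Digital Research Team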
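CANDIDATE REQUIREMENTS
A degree in a relevant subject, such as digital humanities, computer science and/or cultural history
Command of Chinese, ideally with the ability to read and recognise multiple variants of historical Chinese scripts and calligraphic styles
Excellent English listening, speaking, reading and writing skills
Familiarity with OCR/HTR systems
A clear understanding of the tools and methods used in digital humanities research, such as text and data mining, named entity recognition, data modelling and linking, and data visualisation
An interest in archival material, library collections and digitisation
Excellent writing skills and experience of networking and partnership building
Applicants must be resident in their home country at the time of application.
Note: This post is open only to applicants from mainland China.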
Automating the recognition of historical Chinese handwritten texts
Hosted by the British Library
Open for applications until 2 November 2021
This fellowship sits within the British Library’s Digital Research Team. It will engage with new digital tools and techniques in order to explore possible solutions to automate the transcription of historical Chinese handwritten texts. The fellowship will focus on material from Dunhuang (China), part of the Stein collection, which is being conserved and digitised through the Lotus Sutra Manuscripts Digitisation Project as part of the digitisation activities conducted by the British Library to make the collections under its custodianship accessible to all. The digitised content will be accessible through the International Dunhuang Project (IDP) platform.
This fellowship offers a unique professional development opportunity which would be particularly suited to candidates at the early or middle stages of their careers and with interests in and knowledge of both cultural heritage and digital humanities. The British Library is keen to explore opportunities for long-term partnership with the successful applicant’s home institution and will be happy to actively enable and encourage opportunities for dialogue with digital humanities networks in the UK. See below for full candidate requirements and eligibility criteria.
The Stein Collection
The British Library’s Stein collection, gathered by Aurel Stein in the early 20th century, is one of the most outstanding collections of manuscripts and printed books from China and Central Asia. It is of immense historical and cultural significance, containing over 45,000 items written on paper, wood and other materials in many languages, such as Chinese, Tibetan, Sanskrit, Tangut, Khotanese, Kuchean, Sogdian, Uighur, Turkic and Mongolian. It notably holds some of the most important surviving Buddhist texts, such as the famous printed copy of the Diamond Sutra from the Dunhuang Library Cave dated to 868 AD.
The International Dunhuang Project
Established by the British Library in 1994, the International Dunhuang Project is an international collaborative programme including institutions from Europe, Asia and the US holding collections related to Dunhuang and other Silk Road sites. All partners aim to conserve, catalogue and digitise manuscripts, printed texts, paintings, textiles and artefacts under their custodianship and make them freely available online on a web platform. The National Library of China and the Dunhuang Academy, in China, are amongst the project’s key contributors. As part of this effort, and thanks to the generous support of a number of institutions and foundations, a large number of manuscripts from the Stein collection have been digitised and images have been made available on the IDP website (over 170,000 to date).
Project scope and objectives
Building upon this vast and well-curated digitised resource, the Library’s Digital Research Team aims to promote the collection, enhance its searchability, and actively engage with innovative research using its data, through methods such as text mining and data visualisations. As part of this work, members of the Digital Research Team are engaging closely with the development of Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) systems for non-Western scripts.
The Chevening Fellow will contribute to these efforts. They will research the current landscape of Chinese handwritten text recognition – looking into methods, challenges, tools and software. They will test our material with existing tools and demonstrate digital research opportunities arising from the availability of texts in machine-readable format.
The Library’s ongoing Lotus Sutra Manuscripts Digitisation Project aims to conserve, catalogue and digitise nearly 800 Lotus Sutra manuscripts from Dunhuang in the Chinese language. This corpus of texts constitutes an ideal test case: not only because the Lotus Sutra is one of the main Buddhist scriptures and the canonical edition has already been transcribed, but also because the manuscripts present minor variations, such as variant characters, handwriting and scribal errors. The fellow could therefore use the project’s digitised content as a starting point to examine approaches, opportunities and possible solutions to automate the transcription of our Chinese historical collections.
Key Responsibilities
To develop an in-depth understanding of the content digitised as part of the Lotus Sutra Manuscripts Digitisation Project
To research existing digitised materials available on the IDP website and identify different scripts and challenges for text recognition tools
To identify key stakeholders and research existing market solutions, tools and methods for Chinese OCR/HTR
To train text recognition systems with IDP materials, evaluate and compare results
In collaboration with the relevant British Library colleagues, to increase awareness of the Stein and other Central Asian collections at the British Library and other digitised content available on the IDP platform, and to promote their research potential when in machine-readable format, e.g. text mining and data visualisation
To develop the Library’s engagement in a global network working with Chinese OCR/HTR systems and foster relationships with the Fellow’s home institution and Chinese Digital Humanities research communities, with a view to forming the basis for future partnerships and collaborations
Deliverables
A recommended platform, software or tool for the Library to work with using digitised materials available on the IDP platform
A report on the types of texts, scripts and potential challenges that OCR/HTR tools may face with digitised collection items available on the IDP website, including an overview of tested systems and outcomes
A suggested operational workflow to produce, proofread, correct and feed transcriptions into Library strategic systems
Promoting the project internally and externally, including posts on the British Library’s Digital Scholarship, Asian and African Collections and IDP blogs, using other British Library social media platforms, and giving a talk for Library staff members about the project, its aims and outcomes
Contributing to the Library’s workshop/conference concluding the Lotus Sutra Manuscripts Digitisation Project
Becoming an active member of a network of scholars and professionals exploring OCR/HTR solutions for historical Chinese documents, fostering longer-term working relationships
Exchanging experiences and lessons learnt within UK, Chinese and global DH networks, laying the foundations for future inter-regional collaborations
Participating in other related activities of the Digital Research Team
Candidate requirements
Degree in a relevant subject, e.g. digital humanities, computer science and/or cultural history
Knowledge of Chinese language, ideally with the ability to read/recognise several variants of historical Chinese scripts and calligraphic styles
Excellent written and spoken English
Familiarity with OCR/HTR systems
Demonstrable knowledge of tools and methods useful for digital humanities research, e.g. text and data mining, Named Entity Recognition, data modelling and linking, and data visualisation
Interest in archival material, library collections and digitisation
Excellent writing skills and experience of networking and partnership building
Individuals must be resident in their home country at the time of making their application.
Note: This fellowship is only open to applicants from China.
Development Opportunities
Staff-level access to unique British Library collections and research resources, including access to staff training opportunities
Staff-level access to the Digital Scholarship Training Programme courses, workshops, talks and reading group
Opportunity to network and exchange ideas with digital scholarship staff, East Asia section curators and the wider Asian & African Collections department and other colleagues across the Library, as well as externally within the UK and wider professional DH communities
Opportunity to gain experience in disseminating project outcomes and engaging different audiences through various communication channels
Opportunity to become familiar with the activities of the International Dunhuang Project and the work of the Endangered Archives Programme, which has helped digitise manuscripts and archival material in and around China
Opportunity to enhance spoken and written English through work practice and collaboration with colleagues
Editor-in-chief / 徐力恒
Managing editor / 傅春妍
Art editor / 傅春妍