
星跃计划 | MSR Asia–MSR Redmond Joint Research Program Now Recruiting

微软学术合作 2021-04-28


Microsoft Research Asia and Microsoft Research Redmond jointly launch the "星跃计划"


The program aims to give outstanding candidates the opportunity to work with research teams from Microsoft's two major global labs on real, cutting-edge problems. You will do impactful research in an international, diverse, and inclusive research environment, under the guidance of top researchers!


The first batch of cross-lab joint research projects covers natural language processing, data intelligence, computer systems and networking, intelligent cloud, and more. The projects are: High-performance Distributed Deep Learning, Intelligent Data Cleansing, Intelligent Power-Aware Virtual Machine Allocation, Neuro-Symbolic Semantic Parsing for Data Science, and Next-Gen Large Pretrained Language Models.


The "星跃计划" will help you leap across the ocean and explore more possibilities in research!


Program Highlights


  • Conduct research under the guidance of top researchers at both MSR Asia and MSR Redmond, and engage deeply with researchers from different research backgrounds

  • Focus on real, cutting-edge problems from industry, and work toward results with impact on both academia and industry

  • Through offline and online collaboration, experience the international, open research atmosphere and the diverse, inclusive culture of Microsoft's two major research labs

Eligibility


  • Current undergraduate, master's, or PhD students; deferred or gap-year students

  • Able to work full-time in China for 6–12 months

  • See the project descriptions below for each project's detailed requirements


What are you waiting for?

Come find the project that fits you!


High-performance 

Distributed Deep Learning




Parallel and distributed systems are the solution to the ever-increasing complexity of deep learning training. However, existing solutions still leave efficiency and scalability on the table, missing optimization opportunities across the varied environments found at industrial scale.

 

In this project, we'll work with scientists at the forefront of systems and networking research, leveraging world-leading platforms to solve systems and networking problems in parallel and distributed deep learning. The current project team members, from both the MSR Asia and MSR Redmond labs, have rich experience contributing to both industry and the academic community, transferring innovations into production systems and publishing at top conferences.
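A central primitive behind this line of work is all-reduce, which aggregates gradients across workers in data-parallel training. The sketch below is an illustration written for this post (not project code): it simulates the classic ring all-reduce schedule in pure Python, whereas real systems run it over NCCL or MPI.

```python
def ring_allreduce(grads):
    """Simulate ring all-reduce: every worker ends with the element-wise sum.

    grads: one equal-length gradient list per worker.
    """
    n = len(grads)
    data = [list(g) for g in grads]
    size = len(data[0])
    # Partition each vector into n chunks; chunk c travels around the ring.
    bounds = [(c * size // n, (c + 1) * size // n) for c in range(n)]

    # Phase 1, reduce-scatter: in each of n-1 steps, worker w sends chunk
    # (w - step) % n to neighbour (w + 1) % n, which accumulates it. After
    # this phase, worker w holds the complete sum of chunk (w + 1) % n.
    for step in range(n - 1):
        for w in range(n):
            lo, hi = bounds[(w - step) % n]
            dst = (w + 1) % n
            for i in range(lo, hi):
                data[dst][i] += data[w][i]

    # Phase 2, all-gather: each worker forwards its completed chunk around
    # the ring; receivers overwrite rather than accumulate.
    for step in range(n - 1):
        for w in range(n):
            lo, hi = bounds[(w + 1 - step) % n]
            dst = (w + 1) % n
            data[dst][lo:hi] = data[w][lo:hi]
    return data
```

Both phases move only 1/n of the vector per worker per step, which is why the ring schedule keeps bandwidth use nearly optimal as the worker count grows.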


Research Areas 


System and Networking, MSR Asia

https://www.microsoft.com/en-us/research/group/systems-and-networking-research-group-asia/


Research in Software Engineering, MSR Redmond

https://www.microsoft.com/en-us/research/group/research-software-engineering-rise/

Qualifications


  • Major in computer science, electrical engineering, or equivalent field 

  • Solid knowledge of data structures and algorithms 

  • Proficiency in Python, C/C++, or other programming languages; familiarity with Linux and development on the Linux platform 

  • Good communication and presentation skills 

  • Good English reading and writing skills; able to implement systems based on academic papers in English and to write English documentation 


Candidates with the following backgrounds are preferred: 


  • Familiarity with deep learning systems (e.g., PyTorch, TensorFlow), GPU programming, and networking 

  • Familiarity with communication libraries such as NCCL and MPI implementations such as OpenMPI and MVAPICH 

  • Rich knowledge of machine learning and machine learning models 

  • Familiarity with engineering processes is a strong plus 

  • Active on GitHub; has used or contributed to well-known open-source projects 


Intelligent Data Cleansing 




Tabular data such as Excel spreadsheets and databases are among the most important assets in large enterprises today, yet they are often plagued with data quality issues. Intelligent data cleansing focuses on novel ways to detect and fix data quality issues in tabular data, which can assist the large class of less-technical and non-technical users in enterprises. 


We are interested in a variety of topics in this area, including data-driven and intelligent techniques to detect data quality issues and suggest possible fixes, leveraging inferred constraints and statistical properties based on existing data assets and software artifacts.
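To make the idea of inferred constraints concrete, here is a toy illustration (invented for this post, not project code): infer the dominant value type of a column from the data itself, then flag every cell that violates it. The actual research uses far richer statistical and learned signals.

```python
def detect_quality_issues(column):
    """Flag cells that violate the column's inferred dominant type.

    Returns a list of (row_index, value, issue) tuples.
    """
    def parse_kind(v):
        s = str(v).strip()
        if s == "":
            return "missing"
        try:
            int(s)
            return "int"
        except ValueError:
            pass
        try:
            float(s)
            return "float"
        except ValueError:
            return "str"

    kinds = [parse_kind(v) for v in column]
    non_missing = [k for k in kinds if k != "missing"]
    if not non_missing:                      # entirely empty column
        return [(i, v, "missing value") for i, v in enumerate(column)]
    # The inferred "constraint": the majority type of the column.
    dominant = max(set(non_missing), key=non_missing.count)
    issues = []
    for i, (v, k) in enumerate(zip(column, kinds)):
        if k == "missing":
            issues.append((i, v, "missing value"))
        elif k != dominant and not (dominant == "float" and k == "int"):
            issues.append((i, v, f"expected {dominant}, got {k}"))
    return issues
```

For a mostly-numeric column like `["1.5", "2.0", "three", "", "4.25"]`, the stray string and the empty cell are flagged while the integer-compatible values pass.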


Research Areas 


Data, Knowledge, and Intelligence (DKI), MSR Asia

https://www.microsoft.com/en-us/research/group/data-knowledge-intelligence/


Data Management, Exploration and Mining (DMX), MSR Redmond

https://www.microsoft.com/en-us/research/group/data-management-exploration-and-mining-dmx


Qualifications


  • Graduate-level students in Computer Science or related STEM fields; PhD students are preferred 

  • Students with research backgrounds in databases, data mining, statistics, software engineering, and visualization are preferred


Intelligent Power-Aware

 Virtual Machine Allocation




As one of the world's leading cloud service providers, Microsoft Azure manages tens of millions of virtual machines every day. Within such a large-scale cloud system, how to efficiently allocate virtual machines to servers is critical and has been a hot research topic for years. Previously, teams from MSR Asia and MSR Redmond have made significant contributions in this area that resulted in production impact and academic papers at top-tier conferences (e.g., IJCAI, AAAI, OSDI, NSDI). In this project we intend to unite the strengths of MSR Asia and MSR Redmond to perform forward-looking, collaborative research on power management in datacenters, including power-aware virtual machine allocation. The project involves developing power prediction models that leverage state-of-the-art machine learning methods, as well as building efficient and reliable allocation systems in large-scale distributed environments.
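As a deliberately simplified illustration of power-aware placement (written for this post; the production allocator and its prediction models are far more sophisticated), the sketch below places one VM using a best-fit heuristic over predicted per-server power draws, subject to a per-server power cap.

```python
def allocate_vm(servers, vm_power, capacity):
    """Pick a server for a new VM under a power cap.

    servers:  current predicted power draw (watts) of each server.
    vm_power: predicted draw of the VM being placed; in practice this
              number would come from a learned power-prediction model.
    capacity: per-server power cap; a server that would exceed it is
              ineligible.
    Returns the chosen server index, or None if no server fits.

    Heuristic: best fit by resulting load. Among eligible servers, choose
    the one that ends up most loaded; packing tightly keeps headroom free
    on the remaining servers for future large VMs.
    """
    best = None
    for i, load in enumerate(servers):
        new_load = load + vm_power
        if new_load <= capacity:
            if best is None or new_load > servers[best] + vm_power:
                best = i
    return best
```

With servers drawing `[300, 150, 280]` W, a 60 W VM, and a 350 W cap, server 0 is ineligible (360 W) and server 2 wins over server 1 because 340 W packs tighter than 210 W.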


Research Areas


Data, Knowledge, and Intelligence (DKI), MSR Asia

https://www.microsoft.com/en-us/research/group/data-knowledge-intelligence/


Systems, MSR Redmond

https://www.microsoft.com/en-us/research/group/systems-research-group-redmond/


Qualifications


  • Currently enrolled in a graduate program in computer science or equivalent field 

  • Good research track record in related areas 

  • Able to carry out research tasks with high quality 

  • Good communication and presentation skills in written and oral English  

  • Knowledge and experience in machine learning, data mining and data analytics are preferred 

  • Familiarity with AIOps or AI for systems is a strong plus 


Neuro-Symbolic Semantic Parsing

 for Data Science 




Our cross-lab, inter-disciplinary research team develops AI technology for interactive coding assistance for data science, data analytics, and business process automation. It allows the user to specify their data processing intent in the middle of their workflow using a combination of natural language, input-output examples, and multi-modal UX – and translates that intent into the desired source code. The underlying AI technology integrates our state-of-the-art research in program synthesis, semantic parsing, and structure-grounded natural language understanding. It has the potential to improve productivity of millions of data scientists and software developers, as well as establish new scientific milestones for deep learning over structured data, grounded language understanding, and neuro-symbolic AI. 

 

The research project involves collecting and establishing a novel benchmark dataset for data science program generation, developing novel neuro-symbolic semantic parsing models to tackle this challenge, adapting large-scale pretrained language models to new domains and knowledge bases, as well as publishing in top-tier AI/NLP conferences. We expect the benchmark dataset and the new models to be used in academia as well as in Microsoft products. 
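A minimal way to see the "symbolic" half of neuro-symbolic parsing is a grammar that maps utterances to small programs and then generates code from those programs; in the research itself, a neural model replaces the hand-written patterns while still emitting the same kind of symbolic programs. The rules and program shapes below are invented purely for illustration.

```python
import re

# A toy symbolic grammar: each rule maps an utterance pattern to a small
# program (a tuple-encoded AST). A learned semantic parser would replace
# this pattern matching but keep the symbolic program space.
RULES = [
    (re.compile(r"average (\w+) by (\w+)"),
     lambda m: ("groupby_agg", m.group(2), m.group(1), "mean")),
    (re.compile(r"rows where (\w+) (>|<|==) (\d+)"),
     lambda m: ("filter", m.group(1), m.group(2), int(m.group(3)))),
]

def parse(utterance):
    """Parse an utterance into a symbolic program, or None if out of scope."""
    for pattern, build in RULES:
        m = pattern.search(utterance.lower())
        if m:
            return build(m)
    return None

def to_pandas(program):
    """Deterministically generate pandas source code from a program."""
    op = program[0]
    if op == "groupby_agg":
        _, key, col, agg = program
        return f"df.groupby('{key}')['{col}'].{agg}()"
    if op == "filter":
        _, col, cmp, val = program
        return f"df[df['{col}'] {cmp} {val}]"
    raise ValueError(f"unknown op: {op}")
```

Separating intent parsing from code generation is the key design point: the neural component only has to predict well-formed programs, and the symbolic generator guarantees the emitted code is syntactically valid.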


Research Areas


Natural Language Computing, MSR Asia

https://www.microsoft.com/en-us/research/group/natural-language-computing


Neuro-Symbolic Learning, MSR Redmond


Qualifications


  • Masters or Ph.D. students, majoring in computer science or equivalent areas 

  • Background in deep NLP, semantic parsing, sequence-to-sequence learning, and Transformers is required

  • Experience with PyTorch and HuggingFace Transformers 

  • Fluent English speaking, listening, and writing skills 

  • Background in deep learning over structured data (graphs/trees/programs) and program synthesis preferred 

  • Students with papers published at top-tier AI/NLP conferences are preferred 


Next-Gen Large 

Pretrained Language Models 




The goal of this project is to develop game-changing techniques for next-gen large pre-trained language models, including 

(1) Beyond UniLM/InfoXLM: novel pre-training frameworks and self-supervised tasks for monolingual and multilingual pre-training to support language understanding, generation and translation tasks;


(2) Beyond Transformers: new model architectures and optimization algorithms for improving training effectiveness and efficiency of extremely large language models; 


(3) Knowledge Fusion: new modeling frameworks to fuse massive pre-compiled knowledge into pre-trained models; 


(4) Lifelong Self-supervised Learning: mechanisms and algorithms for lifelong (incremental) pre-training. This project extends our existing research and aims to advance SOTA on NLP and AI in general. 
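As a small illustration of the self-supervised tasks mentioned above (illustrative only, not the project's actual pre-training pipeline), the sketch below builds one masked-language-modeling example: mask a fraction of tokens and keep the originals as prediction targets.

```python
import random

def make_mlm_example(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Build one masked-language-modeling training example.

    Returns (inputs, labels): inputs is the token list with roughly
    mask_prob of positions replaced by mask_token; labels holds the
    original token at masked positions and None elsewhere (positions
    the training loss ignores).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    if tokens and not any(l is not None for l in labels):
        # Guarantee at least one prediction target per example.
        i = rng.randrange(len(tokens))
        labels[i] = tokens[i]
        inputs[i] = mask_token
    return inputs, labels
```

The model is then trained to recover the original token at every masked position, which is the self-supervised signal that lets pre-training scale to unlabeled text.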


Research Areas


Natural Language Computing, MSR Asia

https://www.microsoft.com/en-us/research/group/natural-language-computing


Deep Learning, MSR Redmond

https://www.microsoft.com/en-us/research/group/deep-learning-group


Qualifications


  • Major in computer science or equivalent areas 

  • One or more years of research experience in deep learning for NLP, CV, or related areas 

  • Experience with open-source tools such as PyTorch, TensorFlow, etc. 

  • Background knowledge of language model pre-training is preferred 

  • Track record of publications in related top conferences (e.g., ACL, EMNLP, NAACL, ICML, NeurIPS, ICLR) is preferred 

  • Excellent communication and writing skills 


How to Apply

Eligible applicants, please fill out the application form below:

https://jinshuju.net/f/VHUgv6



