
星跃计划 | MSR Asia–MSR Redmond Joint Research Program Now Recruiting

微软学术合作 2021-04-28


Microsoft Research Asia and Microsoft Research Redmond jointly launch the "星跃计划"


The program aims to give outstanding candidates the opportunity to work with research teams from Microsoft's two major global labs on real, cutting-edge problems. You will do impactful research in an international, diverse, and inclusive research environment, under the guidance of top researchers!


The first batch of cross-lab joint research projects covers natural language processing, data intelligence, computer systems and networking, intelligent cloud, and more. The projects are: High-performance Distributed Deep Learning, Intelligent Data Cleansing, Intelligent Power-Aware Virtual Machine Allocation, Neuro-Symbolic Semantic Parsing for Data Science, and Next-Gen Large Pretrained Language Models.


The "星跃计划" will help you leap across the ocean and explore more possibilities in research!


Program Highlights


  • Conduct research under the guidance of top researchers at both MSR Asia and MSR Redmond, and engage deeply with researchers from different research backgrounds

  • Focus on real, cutting-edge problems from industry, and work toward results with impact on both academia and industry

  • Through offline and online collaboration, experience the international, open research atmosphere and the diverse, inclusive culture of Microsoft's two major research labs

Eligibility


  • Current undergraduate, master's, or PhD students; deferred or gap-year students

  • Able to work full-time in China for 6–12 months

  • See the project descriptions below for each project's detailed requirements


What are you waiting for?

Come find the project that fits you!


High-performance 

Distributed Deep Learning




Parallel and distributed systems are the solution to the ever-increasing complexity of deep learning training. However, existing solutions still leave efficiency and scalability on the table, missing optimization opportunities across the varied environments found at industrial scale.

 

In this project, we'll work with scientists at the forefront of systems and networking research, leveraging world-leading platforms to solve systems and networking problems in parallel and distributed deep learning. The current project team members, from both the MSR Asia and MSR Redmond labs, have rich experience contributing to both industry and the academic community, transferring innovations into production systems and publishing at top conferences.
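A central primitive behind this line of work is all-reduce, which aggregates gradients across workers in data-parallel training. The sketch below is an illustration written for this post (not project code): it simulates the classic ring all-reduce schedule in pure Python, whereas real systems run it over NCCL or MPI.

```python
def ring_allreduce(grads):
    """Simulate ring all-reduce: every worker ends with the element-wise sum.

    grads: one equal-length gradient list per worker.
    """
    n = len(grads)
    data = [list(g) for g in grads]
    size = len(data[0])
    # Partition each vector into n chunks; chunk c travels around the ring.
    bounds = [(c * size // n, (c + 1) * size // n) for c in range(n)]

    # Phase 1, reduce-scatter: in each of n-1 steps, worker w sends chunk
    # (w - step) % n to neighbour (w + 1) % n, which accumulates it. After
    # this phase, worker w holds the complete sum of chunk (w + 1) % n.
    for step in range(n - 1):
        for w in range(n):
            lo, hi = bounds[(w - step) % n]
            dst = (w + 1) % n
            for i in range(lo, hi):
                data[dst][i] += data[w][i]

    # Phase 2, all-gather: each worker forwards its completed chunk around
    # the ring; receivers overwrite rather than accumulate.
    for step in range(n - 1):
        for w in range(n):
            lo, hi = bounds[(w + 1 - step) % n]
            dst = (w + 1) % n
            data[dst][lo:hi] = data[w][lo:hi]
    return data
```

Both phases move only 1/n of the vector per worker per step, which is why the ring schedule keeps bandwidth use nearly optimal as the worker count grows.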


Research Areas 


System and Networking, MSR Asia

https://www.microsoft.com/en-us/research/group/systems-and-networking-research-group-asia/


Research in Software Engineering, MSR Redmond

https://www.microsoft.com/en-us/research/group/research-software-engineering-rise/

Qualifications


  • Major in computer science, electrical engineering, or equivalent field 

  • Solid knowledge of data structures and algorithms 

  • Proficiency in Python, C/C++, or other programming languages; familiarity with Linux and development on the Linux platform 

  • Good communication and presentation skills 

  • Good English reading and writing skills; able to implement systems based on academic papers in English and to write English documentation 


Candidates with the following backgrounds are preferred: 


  • Familiarity with deep learning systems (e.g., PyTorch, TensorFlow), GPU programming, and networking 

  • Familiarity with communication libraries such as NCCL and MPI implementations such as OpenMPI and MVAPICH 

  • Rich knowledge of machine learning and machine learning models 

  • Familiarity with engineering processes is a strong plus 

  • Active on GitHub; has used or contributed to well-known open-source projects 


Intelligent Data Cleansing 




Tabular data such as Excel spreadsheets and databases are among the most important assets in large enterprises today, yet they are often plagued with data quality issues. Intelligent data cleansing focuses on novel ways to detect and fix data quality issues in tabular data, which can assist the large class of less-technical and non-technical users in enterprises. 


We are interested in a variety of topics in this area, including data-driven and intelligent techniques to detect data quality issues and suggest possible fixes, leveraging inferred constraints and statistical properties based on existing data assets and software artifacts.
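To make the idea of inferred constraints concrete, here is a toy illustration (invented for this post, not project code): infer the dominant value type of a column from the data itself, then flag every cell that violates it. The actual research uses far richer statistical and learned signals.

```python
def detect_quality_issues(column):
    """Flag cells that violate the column's inferred dominant type.

    Returns a list of (row_index, value, issue) tuples.
    """
    def parse_kind(v):
        s = str(v).strip()
        if s == "":
            return "missing"
        try:
            int(s)
            return "int"
        except ValueError:
            pass
        try:
            float(s)
            return "float"
        except ValueError:
            return "str"

    kinds = [parse_kind(v) for v in column]
    non_missing = [k for k in kinds if k != "missing"]
    if not non_missing:                      # entirely empty column
        return [(i, v, "missing value") for i, v in enumerate(column)]
    # The inferred "constraint": the majority type of the column.
    dominant = max(set(non_missing), key=non_missing.count)
    issues = []
    for i, (v, k) in enumerate(zip(column, kinds)):
        if k == "missing":
            issues.append((i, v, "missing value"))
        elif k != dominant and not (dominant == "float" and k == "int"):
            issues.append((i, v, f"expected {dominant}, got {k}"))
    return issues
```

For a mostly-numeric column like `["1.5", "2.0", "three", "", "4.25"]`, the stray string and the empty cell are flagged while the integer-compatible values pass.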


Research Areas 


Data, Knowledge, and Intelligence (DKI), MSR Asia

https://www.microsoft.com/en-us/research/group/data-knowledge-intelligence/


Data Management, Exploration and Mining (DMX), MSR Redmond

https://www.microsoft.com/en-us/research/group/data-management-exploration-and-mining-dmx


Qualifications


  • Graduate-level students in Computer Science or related STEM fields; PhD students are preferred 

  • Students with research backgrounds in databases, data mining, statistics, software engineering, and visualization are preferred


Intelligent Power-Aware

 Virtual Machine Allocation




As one of the world's leading cloud service providers, Microsoft Azure manages tens of millions of virtual machines every day. Within such a large-scale cloud system, how to efficiently allocate virtual machines to servers is critical and has been a hot research topic for years. Previously, teams from MSR Asia and MSR Redmond have made significant contributions in this area that resulted in production impact and academic papers at top-tier conferences (e.g., IJCAI, AAAI, OSDI, NSDI). In this project we intend to unite the strengths of MSR Asia and MSR Redmond to perform forward-looking, collaborative research on power management in datacenters, including power-aware virtual machine allocation. The project involves developing power prediction models that leverage state-of-the-art machine learning methods, as well as building efficient and reliable allocation systems in large-scale distributed environments.
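As a deliberately simplified illustration of power-aware placement (written for this post; the production allocator and its prediction models are far more sophisticated), the sketch below places one VM using a best-fit heuristic over predicted per-server power draws, subject to a per-server power cap.

```python
def allocate_vm(servers, vm_power, capacity):
    """Pick a server for a new VM under a power cap.

    servers:  current predicted power draw (watts) of each server.
    vm_power: predicted draw of the VM being placed; in practice this
              number would come from a learned power-prediction model.
    capacity: per-server power cap; a server that would exceed it is
              ineligible.
    Returns the chosen server index, or None if no server fits.

    Heuristic: best fit by resulting load. Among eligible servers, choose
    the one that ends up most loaded; packing tightly keeps headroom free
    on the remaining servers for future large VMs.
    """
    best = None
    for i, load in enumerate(servers):
        new_load = load + vm_power
        if new_load <= capacity:
            if best is None or new_load > servers[best] + vm_power:
                best = i
    return best
```

With servers drawing `[300, 150, 280]` W, a 60 W VM, and a 350 W cap, server 0 is ineligible (360 W) and server 2 wins over server 1 because 340 W packs tighter than 210 W.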


Research Areas


Data, Knowledge, and Intelligence (DKI), MSR Asia

https://www.microsoft.com/en-us/research/group/data-knowledge-intelligence/


Systems, MSR Redmond

https://www.microsoft.com/en-us/research/group/systems-research-group-redmond/


Qualifications


  • Currently enrolled in a graduate program in computer science or equivalent field 

  • Good research track record in related areas 

  • Able to carry out research tasks with high quality 

  • Good communication and presentation skills in written and oral English  

  • Knowledge and experience in machine learning, data mining and data analytics are preferred 

  • Familiarity with AIOps or AI for systems is a strong plus 


Neuro-Symbolic Semantic Parsing

 for Data Science 




Our cross-lab, inter-disciplinary research team develops AI technology for interactive coding assistance for data science, data analytics, and business process automation. It allows the user to specify their data processing intent in the middle of their workflow using a combination of natural language, input-output examples, and multi-modal UX – and translates that intent into the desired source code. The underlying AI technology integrates our state-of-the-art research in program synthesis, semantic parsing, and structure-grounded natural language understanding. It has the potential to improve productivity of millions of data scientists and software developers, as well as establish new scientific milestones for deep learning over structured data, grounded language understanding, and neuro-symbolic AI. 

 

The research project involves collecting and establishing a novel benchmark dataset for data science program generation, developing novel neuro-symbolic semantic parsing models to tackle this challenge, adapting large-scale pretrained language models to new domains and knowledge bases, as well as publishing in top-tier AI/NLP conferences. We expect the benchmark dataset and the new models to be used in academia as well as in Microsoft products. 
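A minimal way to see the "symbolic" half of neuro-symbolic parsing is a grammar that maps utterances to small programs and then generates code from those programs; in the research itself, a neural model replaces the hand-written patterns while still emitting the same kind of symbolic programs. The rules and program shapes below are invented purely for illustration.

```python
import re

# A toy symbolic grammar: each rule maps an utterance pattern to a small
# program (a tuple-encoded AST). A learned semantic parser would replace
# this pattern matching but keep the symbolic program space.
RULES = [
    (re.compile(r"average (\w+) by (\w+)"),
     lambda m: ("groupby_agg", m.group(2), m.group(1), "mean")),
    (re.compile(r"rows where (\w+) (>|<|==) (\d+)"),
     lambda m: ("filter", m.group(1), m.group(2), int(m.group(3)))),
]

def parse(utterance):
    """Parse an utterance into a symbolic program, or None if out of scope."""
    for pattern, build in RULES:
        m = pattern.search(utterance.lower())
        if m:
            return build(m)
    return None

def to_pandas(program):
    """Deterministically generate pandas source code from a program."""
    op = program[0]
    if op == "groupby_agg":
        _, key, col, agg = program
        return f"df.groupby('{key}')['{col}'].{agg}()"
    if op == "filter":
        _, col, cmp, val = program
        return f"df[df['{col}'] {cmp} {val}]"
    raise ValueError(f"unknown op: {op}")
```

Separating intent parsing from code generation is the key design point: the neural component only has to predict well-formed programs, and the symbolic generator guarantees the emitted code is syntactically valid.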


Research Areas


Natural Language Computing, MSR Asia

https://www.microsoft.com/en-us/research/group/natural-language-computing


Neuro-Symbolic Learning, MSR Redmond


Qualifications


  • Masters or Ph.D. students, majoring in computer science or equivalent areas 

  • Background in deep NLP, semantic parsing, sequence-to-sequence learning, and Transformers is required

  • Experience with PyTorch and HuggingFace Transformers 

  • Fluent English speaking, listening, and writing skills 

  • Background in deep learning over structured data (graphs/trees/programs) and program synthesis preferred 

  • Students with papers published at top-tier AI/NLP conferences are preferred 


Next-Gen Large 

Pretrained Language Models 




The goal of this project is to develop game-changing techniques for next-gen large pre-trained language models, including 

(1) Beyond UniLM/InfoXLM: novel pre-training frameworks and self-supervised tasks for monolingual and multilingual pre-training to support language understanding, generation and translation tasks;


(2) Beyond Transformers: new model architectures and optimization algorithms for improving training effectiveness and efficiency of extremely large language models; 


(3) Knowledge Fusion: new modeling frameworks to fuse massive pre-compiled knowledge into pre-trained models; 


(4) Lifelong Self-supervised Learning: mechanisms and algorithms for lifelong (incremental) pre-training. This project extends our existing research and aims to advance SOTA on NLP and AI in general. 
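As a small illustration of the self-supervised tasks mentioned above (illustrative only, not the project's actual pre-training pipeline), the sketch below builds one masked-language-modeling example: mask a fraction of tokens and keep the originals as prediction targets.

```python
import random

def make_mlm_example(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Build one masked-language-modeling training example.

    Returns (inputs, labels): inputs is the token list with roughly
    mask_prob of positions replaced by mask_token; labels holds the
    original token at masked positions and None elsewhere (positions
    the training loss ignores).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    if tokens and not any(l is not None for l in labels):
        # Guarantee at least one prediction target per example.
        i = rng.randrange(len(tokens))
        labels[i] = tokens[i]
        inputs[i] = mask_token
    return inputs, labels
```

The model is then trained to recover the original token at every masked position, which is the self-supervised signal that lets pre-training scale to unlabeled text.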


Research Areas


Natural Language Computing, MSR Asia

https://www.microsoft.com/en-us/research/group/natural-language-computing


Deep Learning, MSR Redmond

https://www.microsoft.com/en-us/research/group/deep-learning-group


Qualifications


  • Major in computer science or equivalent areas 

  • One or more years of research experience in deep learning for NLP, CV, or related areas 

  • Experience with open-source tools such as PyTorch, TensorFlow, etc. 

  • Background knowledge of language model pre-training is preferred 

  • Track record of publications in related top conferences (e.g., ACL, EMNLP, NAACL, ICML, NeurIPS, ICLR) is preferred 

  • Excellent communication and writing skills 


How to Apply

Eligible applicants, please fill out the application form below:

https://jinshuju.net/f/VHUgv6



