MIR Call for Papers | Special Issue on Multi-Modal Representation Learning
Machine Intelligence Research
The Machine Intelligence Research (MIR) special issue "Multi-Modal Representation Learning" is now openly soliciting original manuscripts; the submission deadline is July 1, 2023. Contributions are welcome!
Special Issue on Multi-Modal Representation Learning
About the Special Issue
The past decade has witnessed the impressive and steady development of single-modal (e.g., vision, language) AI technologies in several fields, thanks to the emergence of deep learning. Less studied, however, is multi-modal AI – commonly considered the next generation of AI – which utilizes complementary context concealed in different-modality inputs to improve performance.
One typical example of multi-modal AI is the contrastive language–image pre-training (CLIP) model, which has recently demonstrated strong generalization ability in learning visual concepts under the supervision of natural language. CLIP can be applied to a broad range of vision-language tasks, such as vision-language retrieval, human-machine interaction, and visual question answering. Similarly, other recent approaches, such as multi-modal BERT variants, also tend to focus on only two modalities, e.g., vision and language.
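To make the contrastive pre-training idea behind CLIP concrete, the following minimal sketch (not part of the call itself) shows a symmetric image-text contrastive loss in PyTorch; the embedding dimension, batch size, and the temperature value are illustrative assumptions, and the random tensors stand in for real encoder outputs.

# Illustrative sketch only: a CLIP-style symmetric contrastive loss between
# a batch of image embeddings and text embeddings.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image-text pairs share the same batch index."""
    image_emb = F.normalize(image_emb, dim=-1)        # unit-length image features
    text_emb = F.normalize(text_emb, dim=-1)          # unit-length text features
    logits = image_emb @ text_emb.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(logits.size(0))            # i-th image matches i-th text
    loss_i2t = F.cross_entropy(logits, targets)       # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)   # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random features standing in for encoder outputs.
images = torch.randn(8, 512)
texts = torch.randn(8, 512)
print(clip_contrastive_loss(images, texts).item())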
By contrast, humans naturally learn from multiple modalities (i.e., sight, hearing, touch, smell, and taste), even when some are incomplete or missing, to form a global concept. Thus, in addition to the two popular modalities, other types of data, such as depth, infrared information, events (captured by event cameras), audio, and user interaction, are also important for multi-modal learning in real-world scenes (e.g., contactless virtual social networks). Further, to address the inefficiencies that still exist in multi-modal representation learning, algorithms should (1) consider human attention mechanisms, (2) address missing modalities, (3) guarantee the privacy of data from certain modalities, and (4) learn from a limited number of training samples, as humans do.
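As a simple illustration of point (2) above, one common trick is training-time "modality dropout": randomly hiding a modality so the fusion network learns to cope when that input is absent at test time. The sketch below is only an example under assumed modality names and a plain averaging fusion; it is not a method prescribed by this call.

# Illustrative sketch only: modality dropout for missing-modality robustness.
import torch
import torch.nn as nn

class DropoutFusion(nn.Module):
    def __init__(self, dims, hidden=256):
        super().__init__()
        # One small encoder per modality (e.g., "rgb", "depth", "audio").
        self.encoders = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})

    def forward(self, inputs, p_drop=0.3):
        feats = []
        for name, x in inputs.items():
            if self.training and torch.rand(1).item() < p_drop:
                continue  # randomly hide this modality during training
            feats.append(torch.relu(self.encoders[name](x)))
        if not feats:  # ensure at least one modality reaches the fusion step
            name, x = next(iter(inputs.items()))
            feats.append(torch.relu(self.encoders[name](x)))
        return torch.stack(feats).mean(dim=0)  # average the available modalities

model = DropoutFusion({"rgb": 512, "depth": 128, "audio": 64})
batch = {"rgb": torch.randn(4, 512), "depth": torch.randn(4, 128), "audio": torch.randn(4, 64)}
print(model(batch).shape)  # torch.Size([4, 256])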
Our goal for this special issue is to bring together smart solutions for robust representation learning in multi-modal scenes. We are interested in works related to theoretical, algorithmic, metric, and dataset advances, as well as new applications. This special issue will provide a timely collection of highly novel and original ideas for the broader communities, e.g., computer vision, image processing, natural language processing, and pattern analysis and machine intelligence.
Scope (including but not limited to)
Topics of interest include, but are not limited to:
1) Theoretical aspects of robust multi-modal learning models;
2) Efficient multi-modal representation architectures, including human attention mechanisms for specific modalities;
3) Novel multi-modal representation models for image (RGB-D, RGB-T) and video domains;
4) Generative models for multi-modal networks;
5) Multi-modal representation models under point-cloud, 3D, 360°, and 4D scenes;
6) Multi-modal models under different levels of supervision, e.g., fully-/semi-/self-/unsupervised learning;
7) Uncertainty techniques for multi-modal learning;
8) Multi-modal learning combining visual data with audio, text, events, and tactile senses;
9) Novel metrics for multi-modal representation learning;
10) Large-scale datasets specific to multi-modal learning. Data should be publicly available without requiring access permission from the PI, and any related code should be open source;
11) Multi-modal representation learning designs for low-level vision tasks, e.g., image restoration, saliency detection, edge detection, interactive image segmentation, and medical image segmentation;
12) SLAM techniques for multi-modal learning;
13) Lightweight and general backbone designs for multi-modal representation;
14) Applications in AR/VR, autonomous driving, robotics, and social good, such as human interaction and ecosystems;
15) Federated learning models for multi-modal representation;
16) Innovative learning strategies that can exploit imperfect/incomplete/synthesized labels for multi-modal representation;
17) Out-of-distribution models.
Submission Guidelines
Submission deadline: July 1, 2023
Submission site (now open):
https://mc03.manuscriptcentral.com/mir
When submitting, please select in the system:
“Step 6 Details & Comments: Special Issue and Special Section---Special Issue on Multi-Modal Representation Learning”.
Guest Editors
Deng-Ping Fan (*primary contact), Researcher, Computer Vision Lab, ETH Zurich, Switzerland. denfan@ethz.ch.
Nick Barnes, Professor (Former leader of CSIRO Computer Vision), Australian National University, Australia. nick.barnes@anu.edu.au.
Ming-Ming Cheng, Professor (TPAMI AE), Nankai University, China. cmm@nankai.edu.cn.
Luc Van Gool, Professor (Head of Toyota Lab TRACE), ETH Zurich, Switzerland. vangool@vision.ee.ethz.ch.
Acknowledgements (Assistant Researchers)
We would like to thank the following scholars for their help (e.g., reviewing, organizing) and suggestions on this special issue:
Zongwei Zhou, Postdoc, Johns Hopkins University, USA. zzhou82@jh.edu.
Mingchen Zhuge, Ph.D. Student, KAUST AI Initiative. mingchen.zhuge@kaust.edu.sa.
Ge-Peng Ji, Ph.D. Student, ANU. gepeng.ji@anu.edu.au.
About Machine Intelligence Research
Machine Intelligence Research (MIR, formerly International Journal of Automation and Computing) is sponsored by the Institute of Automation, Chinese Academy of Sciences, and has been published under its current title since 2022. Rooted in China and facing the world, MIR serves national strategic needs by publishing the latest original research papers, surveys, and commentaries in machine intelligence, comprehensively reporting fundamental theory and cutting-edge research results in the field, promoting international academic exchange and disciplinary development, and supporting national progress in artificial intelligence. The journal has been selected for the "China Science and Technology Journal Excellence Action Plan" and is indexed in ESCI, EI, Scopus, the China Science and Technology Core Journals list, CSCD, and other databases.