MIR Call for Papers | Special Issue on Multi-Modal Representation Learning
Machine Intelligence Research
The Machine Intelligence Research (MIR) special issue "Multi-Modal Representation Learning" is now openly soliciting original manuscripts; the submission deadline is July 1, 2023. Contributions are welcome!
Special Issue on Multi-Modal Representation Learning
About the Special Issue
The past decade has witnessed the impressive and steady development of single-modal (e.g., vision, language) AI technologies in several fields, thanks to the emergence of deep learning. Less studied, however, is multi-modal AI – commonly considered the next generation of AI – which utilizes complementary context concealed in different-modality inputs to improve performance.
One typical example of multi-modal AI is the contrastive language–image pre-training (CLIP) model, which has recently demonstrated strong generalization ability in learning visual concepts under the supervision of natural language. CLIP can be applied to a broad range of vision-language tasks, such as vision-language retrieval, human-machine interaction, and visual question answering. Similarly, other recent approaches, such as multi-modal BERT variants, also tend to focus on only two modalities, e.g., vision and language.
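To make the contrastive pre-training idea behind CLIP concrete, the following minimal sketch (not part of the call itself) shows a symmetric image-text contrastive loss in PyTorch; the embedding dimension, batch size, and the temperature value are illustrative assumptions, and the random tensors stand in for real encoder outputs.

# Illustrative sketch only: a CLIP-style symmetric contrastive loss between
# a batch of image embeddings and text embeddings.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image-text pairs share the same batch index."""
    image_emb = F.normalize(image_emb, dim=-1)        # unit-length image features
    text_emb = F.normalize(text_emb, dim=-1)          # unit-length text features
    logits = image_emb @ text_emb.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(logits.size(0))            # i-th image matches i-th text
    loss_i2t = F.cross_entropy(logits, targets)       # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)   # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random features standing in for encoder outputs.
images = torch.randn(8, 512)
texts = torch.randn(8, 512)
print(clip_contrastive_loss(images, texts).item())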
By contrast, humans naturally learn from multiple modalities (i.e., sight, hearing, touch, smell, and taste), even when some are incomplete or missing, to form a global concept. Thus, in addition to the two popular modalities, other types of data, such as depth, infrared information, events (captured by event cameras), audio, and user interaction, are also important for multi-modal learning in real-world scenes (e.g., contactless virtual social networks). Further, to address the inefficiencies that still exist in multi-modal representation learning, algorithms should (1) consider human attention mechanisms, (2) address missing modalities, (3) guarantee the privacy of data from certain modalities, and (4) learn from a limited number of training samples, as humans do.
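As a simple illustration of point (2) above, one common trick is training-time "modality dropout": randomly hiding a modality so the fusion network learns to cope when that input is absent at test time. The sketch below is only an example under assumed modality names and a plain averaging fusion; it is not a method prescribed by this call.

# Illustrative sketch only: modality dropout for missing-modality robustness.
import torch
import torch.nn as nn

class DropoutFusion(nn.Module):
    def __init__(self, dims, hidden=256):
        super().__init__()
        # One small encoder per modality (e.g., "rgb", "depth", "audio").
        self.encoders = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})

    def forward(self, inputs, p_drop=0.3):
        feats = []
        for name, x in inputs.items():
            if self.training and torch.rand(1).item() < p_drop:
                continue  # randomly hide this modality during training
            feats.append(torch.relu(self.encoders[name](x)))
        if not feats:  # ensure at least one modality reaches the fusion step
            name, x = next(iter(inputs.items()))
            feats.append(torch.relu(self.encoders[name](x)))
        return torch.stack(feats).mean(dim=0)  # average the available modalities

model = DropoutFusion({"rgb": 512, "depth": 128, "audio": 64})
batch = {"rgb": torch.randn(4, 512), "depth": torch.randn(4, 128), "audio": torch.randn(4, 64)}
print(model(batch).shape)  # torch.Size([4, 256])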
Our goal for this special issue is to bring together smart solutions for robust representation learning in multi-modal scenes. We are interested in works related to theoretical, algorithmic, metric, and dataset advances, as well as new applications. This special issue will provide a timely collection of highly novel and original ideas for the broader communities, e.g., computer vision, image processing, natural language processing, and pattern analysis and machine intelligence.
Scope (including but not limited to)
Topics of interest include, but are not limited to:
1) Theoretical aspects of robust multi-modal learning models;
2) Efficient multi-modal representation architectures, including human attention mechanisms for specific modalities;
3) Novel multi-modal representation models for image (RGB-D, RGB-T) and video domains;
4) Generative models for multi-modal networks;
5) Multi-modal representation models under point-cloud, 3D, 360°, and 4D scenes;
6) Multi-modal models under different levels of supervision, e.g., fully-/semi-/self-/unsupervised learning;
7) Uncertainty techniques for multi-modal learning;
8) Multi-modal learning combining visual data with audio, text, events, and tactile senses;
9) Novel metrics for multi-modal representation learning;
10) Large-scale datasets specific to multi-modal learning. Data should be publicly available without requiring access permission from the PI, and any related code should be open source;
11) Multi-modal representation learning designs for low-level vision tasks, e.g., image restoration, saliency detection, edge detection, interactive image segmentation, and medical image segmentation;
12) SLAM techniques for multi-modal learning;
13) Lightweight and general backbone designs for multi-modal representation;
14) Applications in AR/VR, autonomous driving, robotics, and social good, such as human interaction and ecosystems;
15) Federated learning models for multi-modal representation;
16) Innovative learning strategies that can exploit imperfect/incomplete/synthesized labels for multi-modal representation;
17) Out-of-distribution models.
Submission Guidelines
Submission deadline: July 1, 2023
Submission site (now open):
https://mc03.manuscriptcentral.com/mir
When submitting, please select in the system:
“Step 6 Details & Comments: Special Issue and Special Section---Special Issue on Multi-Modal Representation Learning”.
Guest Editors
Deng-Ping Fan (*primary contact), Researcher, Computer Vision Lab, ETH Zurich, Switzerland. denfan@ethz.ch.
Nick Barnes, Professor (Former leader of CSIRO Computer Vision), Australian National University, Australia. nick.barnes@anu.edu.au.
Ming-Ming Cheng, Professor (TPAMI AE), Nankai University, China. cmm@nankai.edu.cn.
Luc Van Gool, Professor (Head of Toyota Lab TRACE), ETH Zurich, Switzerland. vangool@vision.ee.ethz.ch.
Acknowledgements (Assistant Researchers)
We would like to thank the following scholars for their help (e.g., reviewing, organizing) and suggestions on this special issue:
Zongwei Zhou, Postdoc, Johns Hopkins University, USA. zzhou82@jh.edu.
Mingchen Zhuge, Ph.D. Student, KAUST AI Initiative. mingchen.zhuge@kaust.edu.sa.
Ge-Peng Ji, Ph.D. Student, ANU. gepeng.ji@anu.edu.au.
About Machine Intelligence Research
Machine Intelligence Research (MIR, formerly International Journal of Automation and Computing) is sponsored by the Institute of Automation, Chinese Academy of Sciences, and has been published under its current title since 2022. Rooted in China and facing the world, MIR serves national strategic needs by publishing the latest original research papers, surveys, and commentaries in machine intelligence, comprehensively reporting fundamental theory and cutting-edge research results in the field, promoting international academic exchange and disciplinary development, and supporting national progress in artificial intelligence. The journal has been selected for the "China Science and Technology Journal Excellence Action Plan" and is indexed in ESCI, EI, Scopus, the China Science and Technology Core Journals list, CSCD, and other databases.