Efficient Methods for Multimodal Models (EMM)
in conjunction with ICPR 2026


Overview

Welcome to our Efficient Methods for Multimodal Models (EMM) workshop.

This workshop will bring together researchers working on efficient inference, training, and fine-tuning methods, as well as model architectures and applications, for multimodal models and tasks. With extensive applications in image and video understanding, as well as in image, video, and text generation, multimodal models have become increasingly prominent and transformative in pattern recognition and computer vision. As their parameter counts grow exponentially, there is an urgent need to investigate efficient learning and deployment methods for these models and tasks. This workshop will provide a platform for researchers and practitioners to collaborate on solutions, ultimately improving the efficiency and broadening the application of multimodal models.

The topics will include efficient inference methods and model architecture design for multimodal models, e.g., compression, quantization, distillation, and efficient architectures, as well as efficient learning methods, e.g., training and fine-tuning methods for multimodal tasks. We will also explore related applications, such as efficient multimodal generative and editing models, practical multimodal applications, and the deployment of multimodal models on low-power devices.
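
As a purely illustrative example of the kind of efficiency method in scope, the sketch below shows naive symmetric 8-bit post-training weight quantization of a single weight matrix in PyTorch. It is a toy sketch under simplified assumptions (per-tensor scale, no calibration data), not a reference implementation, and all names in it are made up for illustration.

    import torch

    def quantize_weight_int8(w: torch.Tensor):
        """Symmetric per-tensor int8 quantization of a weight matrix.

        Returns the int8 tensor plus the scale needed to dequantize.
        """
        scale = w.abs().max() / 127.0  # map the largest magnitude to 127
        q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.to(torch.float32) * scale

    # Illustrative usage: quantize one randomly initialized projection matrix.
    w = torch.randn(4096, 4096)
    q, s = quantize_weight_int8(w)
    err = (dequantize(q, s) - w).abs().mean().item()
    print(f"int8 storage (4x smaller than fp32), mean abs error: {err:.5f}")

Practical systems typically refine this with per-channel scales and activation calibration; methods along these lines are squarely within the workshop's scope.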

Topics

Relevant topics include, but are not limited to:
  • Compression, quantization, conditional compute, pruning, and distillation of multimodal models
  • Efficient sampling of multimodal diffusion models, e.g., step distillation and consistency models
  • Efficient training/fine-tuning of multimodal models, e.g., low-rank adaptation (see the sketch after this list)
  • Efficient LLMs/LVLMs/MLLMs for multimodal tasks, e.g., token pruning and merging
  • Imitation learning and reinforcement learning
  • Efficient multi-/cross-modal learning
  • Efficient multimodal applications, e.g., drone vision and autonomous driving
  • Efficient multimodal generative and editing models and sensors, e.g., for vision, language, audio, and 3D objects
  • Efficient self-/un-/weakly-supervised learning for multimodal data
  • Efficient image, video, and audio synthesis from multimodal data
  • Deploying multimodal models on low-power devices, e.g., smartphones
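
To make the low-rank adaptation item concrete, here is a minimal, hypothetical LoRA-style layer in PyTorch: the pretrained weight is frozen and only a rank-r update is trained. This is a simplified sketch (no dropout, no weight merging at inference), and every name in it is illustrative rather than taken from any particular library.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen linear layer plus a trainable low-rank update (W + B A).

        Only A and B are trained, so trainable parameters drop from
        d_out * d_in to r * (d_in + d_out).
        """
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as identity update
            self.scaling = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

    # Illustrative usage on a single 1024x1024 projection layer.
    layer = LoRALinear(nn.Linear(1024, 1024), r=8)
    y = layer(torch.randn(2, 1024))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(f"trainable params: {trainable}")  # 16,384 vs. ~1.05M for full tuning

Because only A and B receive gradients, gradient and optimizer-state memory shrink by the same factor, which is the core efficiency argument behind adapter-style fine-tuning of large multimodal models.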

Instructions for Authors

Submission Guidelines

Authors are invited to submit original contributions that have not been previously published and are not under review elsewhere. All manuscripts must be written in English and prepared using the official conference templates. Submissions will be evaluated through a single-blind peer-review process. More details on the submission process can be found on the official ICPR 2026 conference website: https://icpr2026.org/instructions.html.

Page Limits

Maximum paper length: 15 pages, including references, figures, and tables.

Submission System

Papers must be submitted through the Microsoft CMT submission system, which is used to manage the peer-review process.

Registration Requirement

For each accepted paper, at least one author must register for the conference and present the work. Registration details can be found on the official ICPR 2026 conference website.

Acknowledgement

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.

Organizers

TBD

Contact

Yanjing Li (Email: yanjingli@buaa.edu.cn)