Efficient Methods for Multimodal Models (EMM)
in conjunction with ICPR 2026


Overview

Welcome to our Efficient Methods for Multimodal Models (EMM) workshop.

This workshop will bring together researchers working on efficient inference, training, and fine-tuning methods, as well as model architectures and applications, for multimodal models and tasks. With extensive applications in image and video understanding, as well as in image, video, and text generation, multimodal models have become increasingly prominent and transformative in pattern recognition and computer vision. As their parameter counts grow exponentially, there is an urgent need to investigate efficient learning and deployment methods for these models and tasks. This workshop will provide a platform for researchers and practitioners to collaborate on solutions, ultimately improving the efficiency and broadening the application of multimodal models.

The topics will include efficient inference methods and model architecture design for multimodal models, e.g., compression, quantization, distillation, and efficient architectures, as well as efficient learning methods, e.g., training and fine-tuning methods for multimodal tasks. We will also explore related applications, such as efficient multimodal generative and editing models, practical multimodal applications, and the deployment of multimodal models on low-power devices.
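
As a purely illustrative example of the kind of efficiency method in scope, the sketch below shows naive symmetric 8-bit post-training weight quantization of a single weight matrix in PyTorch. It is a toy sketch under simplified assumptions (per-tensor scale, no calibration data), not a reference implementation, and all names in it are made up for illustration.

    import torch

    def quantize_weight_int8(w: torch.Tensor):
        """Symmetric per-tensor int8 quantization of a weight matrix.

        Returns the int8 tensor plus the scale needed to dequantize.
        """
        scale = w.abs().max() / 127.0  # map the largest magnitude to 127
        q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.to(torch.float32) * scale

    # Illustrative usage: quantize one randomly initialized projection matrix.
    w = torch.randn(4096, 4096)
    q, s = quantize_weight_int8(w)
    err = (dequantize(q, s) - w).abs().mean().item()
    print(f"int8 storage (4x smaller than fp32), mean abs error: {err:.5f}")

Practical systems typically refine this with per-channel scales and activation calibration; methods along these lines are squarely within the workshop's scope.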

Topics

Relevant topics include, but are not limited to:
  • Compression, quantization, conditional compute, pruning, and distillation of multimodal models
  • Efficient sampling of multimodal diffusion models, e.g., step distillation and consistency models
  • Efficient training/fine-tuning of multimodal models, e.g., low-rank adaptation (see the sketch after this list)
  • Efficient LLMs/LVLMs/MLLMs for multimodal tasks, e.g., token pruning and merging
  • Imitation learning and reinforcement learning
  • Efficient multi-/cross-modal learning
  • Efficient multimodal applications, e.g., drone vision and autonomous driving
  • Efficient multimodal generative and editing models and sensors, e.g., for vision, language, audio, and 3D objects
  • Efficient self-/un-/weakly-supervised learning for multimodal data
  • Efficient image, video, and audio synthesis from multimodal data
  • Deploying multimodal models on low-power devices, e.g., smartphones
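
To make the low-rank adaptation item concrete, here is a minimal, hypothetical LoRA-style layer in PyTorch: the pretrained weight is frozen and only a rank-r update is trained. This is a simplified sketch (no dropout, no weight merging at inference), and every name in it is illustrative rather than taken from any particular library.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen linear layer plus a trainable low-rank update (W + B A).

        Only A and B are trained, so trainable parameters drop from
        d_out * d_in to r * (d_in + d_out).
        """
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as identity update
            self.scaling = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

    # Illustrative usage on a single 1024x1024 projection layer.
    layer = LoRALinear(nn.Linear(1024, 1024), r=8)
    y = layer(torch.randn(2, 1024))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(f"trainable params: {trainable}")  # 16,384 vs. ~1.05M for full tuning

Because only A and B receive gradients, gradient and optimizer-state memory shrink by the same factor, which is the core efficiency argument behind adapter-style fine-tuning of large multimodal models.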

Instructions for Authors

Submission Guidelines

Authors are invited to submit original contributions that have not been previously published and are not under review elsewhere. All manuscripts must be written in English and prepared using the official conference templates. Submissions will be evaluated through a single-blind peer-review process. More details on the submission process can be found on the official ICPR 2026 conference website: https://icpr2026.org/instructions.html.

Page Limits

Maximum paper length: 15 pages, including references, figures, and tables.

Submission System

Papers must be submitted through the Microsoft CMT submission system, which is used to manage the peer-review process.

Registration Requirement

For each accepted paper, at least one author must register for the conference and present the work. Registration details can be found on the official ICPR 2026 conference website.

Acknowledgement

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.

Organizers

TBD

Contact

Yanjing Li (Email: yanjingli@buaa.edu.cn)