MISP-Meeting Corpus
Introduction
The MISP-Meeting corpus focuses on the meeting scenario: 4–8 attendees sit around an 8-microphone array and a panoramic camera, placed next to each other on the table in a standard meeting room, and hold a natural conversation. In addition, each participant wears a headset microphone synchronized with a Zoom F8N recorder to share a common clock. This recording setup yields three kinds of audio-visual data: near-field mono speech for each speaker (*-F8N.zip), far-field 8-channel speech (*-CSOBx3.zip), and 360-degree panoramic video (*-PSCx3.zip).
The far-field 8-channel speech records not only each participant's speech but also background sounds such as clicking, keyboard typing, doors opening and closing, and fan noise. In contrast, the near-field mono speech reduces interference from unwanted sources and maintains a signal-to-noise ratio (SNR) greater than 15 dB. The panoramic camera captures the entire meeting room, including each participant's facial expressions, body movements, and visual focus of attention, providing multimodal cues for analysis.
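As a rough illustration of the archive naming described above, the following sketch sorts downloaded archives by modality. The three suffixes come from the corpus description; the function name, the modality labels, and the example file names are hypothetical and may not match the real archive names.

```python
from pathlib import PurePath

# Suffixes are taken from the corpus description; the labels are our own wording.
SUFFIX_TO_MODALITY = {
    "-F8N.zip": "near-field mono audio",
    "-CSOBx3.zip": "far-field 8-channel audio",
    "-PSCx3.zip": "360-degree panoramic video",
}


def modality_of(archive_name: str) -> str:
    """Return the modality implied by an archive's suffix, or 'unknown'."""
    name = PurePath(archive_name).name  # ignore any directory part
    for suffix, modality in SUFFIX_TO_MODALITY.items():
        if name.endswith(suffix):
            return modality
    return "unknown"


# Hypothetical file names, used only to exercise the function.
for example in ["train-F8N.zip", "dev-CSOBx3.zip", "eval-PSCx3.zip"]:
    print(example, "->", modality_of(example))
```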
| Set | Train | Dev | Eval | Total |
| --- | --- | --- | --- | --- |
| Duration (h) | 118.80 | 3.24 | 3.11 | 125.15 |
| Room | 15 | 4 | 4 | 23 |
| Participant | 233 | 25 | 28 | 286 |
| -Male | 115 | 13 | 14 | 142 |
| -Female | 118 | 12 | 14 | 144 |
| Avg. Duration (s) | 2832.37 | 1943.51 | 1863.23 | 2763.98 |
| Avg. Length | 13085.13 | 7557.83 | 7624.00 | 12680.65 |
| Avg. Turns | 463.33 | 118.83 | 321.67 | 445.43 |
| Avg. Speakers | 5.57 | 5.00 | 5.50 | 5.55 |
If you find this corpus useful in your research, please cite the following paper:
MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization
@inproceedings{chen2025misp,
title = "{MISP-Meeting}: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization",
author = "Chen, Hang and Yang, Chao-Han Huck and Gu, Jia-Chen and Siniscalchi, Sabato Marco and Du, Jun",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2025",
publisher = "Association for Computational Linguistics",
pages = "1--14"}
Downloads
This dataset is available under license. By using the corpus, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to use the corpus, and you must immediately cease using the corpus.
Updated AVSR corpus of MISP2021 challenge
Introduction
The updated Audio-Visual Speech Recognition (AVSR) corpus of the MISP2021 challenge is a large-scale Chinese conversational audio-visual corpus comprising 141 hours of audio and video data collected by far-, middle-, and near-field microphones and far- and middle-field cameras in 34 real home TV rooms. It is the first distant multi-microphone conversational Chinese audio-visual corpus recorded in the home TV scenario, in which several people chat in Chinese while watching TV and interacting with a smart speaker/TV in a living room.
Building on the corpus released for the MISP2021 challenge, this update corrects asynchronous samples in the training and development sets and adds data to the evaluation set to increase its diversity.
    
If you find this corpus useful in your research, please cite the following papers:    
    
If you already downloaded the corpus during the MISP2021 challenge and want to update it, please re-download the zip files tagged [Update] and [New].
License
This dataset is available under license. By using the corpus, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to use the corpus, and you must immediately cease using the corpus.
    
Downloads
Updated AVWWS database of MISP2021 challenge
Introduction
The updated audio-visual wake word spotting (AVWWS) database of the MISP2021 challenge contains audio and video data recorded in a range of scenarios with near-, mid-, and far-field microphone arrays and cameras, and is intended as a shared, publicly available database for wake word spotting (WWS). The wake word is "Xiao T Xiao T". A sample is positive if it contains the wake word and negative otherwise; each sample contains at most one wake word. The data is divided into three subsets: Train, Development, and Evaluation. The split is speaker- and room-independent. Some noise data are also provided.
To simplify data transfer, the audio and video data are packed into compressed archives named according to their content. Extract the downloaded zip files to prepare the data directories. For more information about the directory structure, please refer to https://mispchallenge.github.io/mispchallenge2021/task1_data.html.
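Speaker and room independence means that no speaker or room should occur in more than one subset. A minimal sketch of such a check is shown below; the metadata layout (dictionaries with `speaker` and `room` keys) and all IDs are hypothetical and are not taken from the database's actual file lists.

```python
def shared_ids(split_a, split_b, key):
    """Return the IDs under `key` (e.g. 'speaker' or 'room') found in both splits."""
    return {sample[key] for sample in split_a} & {sample[key] for sample in split_b}


# Hypothetical per-sample metadata, for illustration only.
train = [
    {"speaker": "S001", "room": "R01"},
    {"speaker": "S002", "room": "R01"},
]
dev = [
    {"speaker": "S101", "room": "R07"},
]

# An empty set for both keys means the two splits are independent.
for key in ("speaker", "room"):
    print(key, "overlap:", shared_ids(train, dev, key))
```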
    
Papers that make use of this database should cite [1] and [2]:
    
License
This dataset is available under license. By using the corpus, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to use the corpus, and you must immediately cease using the corpus.
If you already downloaded the corpus during the MISP2021 challenge and want to update it, please check the training and development sets against the latest file list [Train_dev_file_list.zip], and re-download the evaluation set.
Downloads
AVSD&AVDR corpus of MISP2022 challenge
Introduction
In the MISP 2021 challenge, we released a large multi-microphone conversational audio-visual corpus. In follow-up work, we resolved authorization and storage issues so that the updated AVWWS and AVSR corpora of the MISP 2021 challenge could be fully released to all researchers.
For the MISP 2022 challenge, the training set is based on the updated MISP2021 AVSR corpus described above, supplemented with RTTM/timestamp directories. The new development set is selected from the development and evaluation sets of the updated MISP2021 AVSR corpus. In addition, we will release a new evaluation set that shares no speakers with the other sets.
    
License
This dataset is available under license. By using the corpus, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to use the corpus, and you must immediately cease using the corpus.
Downloads
AVTSE corpus of MISP 2023 challenge
Introduction
In the MISP 2021 challenge, we released a large multi-microphone conversational audio-visual corpus. In follow-up work, we resolved authorization and storage issues so that the updated AVWWS and AVSR corpora of the MISP 2021 challenge could be fully released to all researchers. In the MISP 2022 challenge, we released a new development set and a new evaluation set.
For the MISP 2023 challenge, we focus on the audio-visual target speaker extraction (AVTSE) task. The training set is based on the updated MISP2021 AVSR corpus described above, and the development set is based on the development set of the MISP 2022 AVSD&AVDR corpus. In addition, we will add the middle-field videos to the development set. For the evaluation set, we will add some new sessions that focus on female dialogue scenarios.
    
License
This dataset is available under license. By using the corpus, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to use the corpus, and you must immediately cease using the corpus.
Downloads
—Training set
Please refer to the "Updated AVSR Corpus of MISP2021 Challenge" above for the download link of the training set. If you have already downloaded the training set, there is no need to repeat the download. We suggest that participants use near-field audio to simulate far-field audio to ensure complete alignment. We will provide a simulation solution in the baseline, and participants can also use different simulation methods or propose more innovative methods to solve this problem.
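The baseline's own simulation method is not described here. As one common approach, and only as a sketch under stated assumptions, near-field speech can be convolved with a room impulse response (RIR) and mixed with noise at a chosen SNR, which keeps the simulated far-field signal sample-aligned with its near-field source. The function name, the synthetic RIR, the noise, and the 10 dB target below are all illustrative assumptions.

```python
import numpy as np


def simulate_far_field(near, rir, noise, snr_db):
    """Reverberate near-field speech with an RIR and add noise at a target SNR (dB)."""
    if len(noise) < len(near):
        raise ValueError("noise must be at least as long as the near-field signal")
    # Truncate the convolution so the output keeps the near-field length.
    reverberant = np.convolve(near, rir)[: len(near)]
    noise = noise[: len(near)]
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise


# Synthetic stand-ins for real recordings (assumptions, not corpus data).
rng = np.random.default_rng(0)
near = rng.standard_normal(16000)           # 1 s of "speech" at 16 kHz
decay = np.exp(-np.arange(800) / 200.0)     # exponentially decaying envelope
rir = rng.standard_normal(800) * decay      # toy room impulse response
rir[0] = 1.0                                # direct path
noise = rng.standard_normal(16000)          # toy background noise

far = simulate_far_field(near, rir, noise, snr_db=10.0)
print(far.shape)
```

In practice the RIR and the noise would come from measured or simulated rooms and from the provided noise data rather than from random numbers.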
—Development set
For the far-field audio and the transcription of the development set, please refer to the "AVSD&AVDR corpus of MISP2022 challenge" above for the download link. In addition, we provide the middle-field video and the lip detection results. The download links for these two are as follows.
Wake word lipreading corpus of ChatCLR challenge
Introduction
For the wake word lipreading task of the ChatCLR challenge, the training and development sets are based on the far-field videos of the updated MISP2021 AVWWS dataset. For the evaluation set, we will release new sessions that include words with lip shapes similar to those of the wake word, "Xiao T Xiao T", to increase the difficulty.
    
License
This dataset is available under license. By using the corpus, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to use the corpus, and you must immediately cease using the corpus.
Downloads
—Training set & development set
Please refer to the far-field videos of the training and development sets in the "Updated AVWWS database of MISP2021 challenge" above for the download link. If you have already downloaded the training and development sets, there is no need to repeat the download.
Target speaker lipreading corpus of ChatCLR challenge
Introduction
For the target speaker lipreading task of the ChatCLR challenge, we use the far-field videos from the training and development sets of the MISP2021 AVSR dataset. Other official open-source video datasets can also be used. The development and evaluation sets contain 12 participants, whose videos are also included in the training set. Each speaker has approximately 30 minutes of data. Two-thirds of each speaker's data make up the development set, and the remaining third makes up the evaluation set.
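The text does not say how each speaker's data is divided. One possible way to obtain a roughly two-thirds / one-third split is to walk through a speaker's utterances in order and assign them to the development set until two-thirds of the total duration is reached. The sketch below assumes that rule; the function name, utterance IDs, and durations are hypothetical.

```python
def split_speaker_utterances(utterances, dev_fraction=2 / 3):
    """Split one speaker's (utt_id, duration_seconds) list into dev and eval parts.

    Utterances go to dev while the accumulated dev duration is below
    dev_fraction of the speaker's total duration; the rest go to eval.
    """
    total = sum(duration for _, duration in utterances)
    target = dev_fraction * total
    dev_set, eval_set, accumulated = [], [], 0.0
    for utt_id, duration in utterances:
        if accumulated < target:
            dev_set.append(utt_id)
            accumulated += duration
        else:
            eval_set.append(utt_id)
    return dev_set, eval_set


# Hypothetical speaker with 1800 s (30 minutes) of data.
utterances = [("u1", 400), ("u2", 350), ("u3", 300), ("u4", 250), ("u5", 300), ("u6", 200)]
dev_ids, eval_ids = split_speaker_utterances(utterances)
print(dev_ids, eval_ids)
```

Because utterances are kept whole, the resulting proportions are only approximately two-thirds and one-third.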
License
This dataset is available under license. By using the corpus, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to use the corpus, and you must immediately cease using the corpus.
Downloads
—Training set
Please refer to the far-field videos of the training and development sets in the "Updated AVSR corpus of MISP2021 challenge" above for the download link. If you have already downloaded the training set, there is no need to repeat the download.