BOOM: Beyond Only One Modality KIT's Multimodal Multilingual Lecture Companion Paper • 2512.02817 • Published Dec 2, 2025
Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 Paper • 2406.16777 • Published Jun 24, 2024
OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion Paper • 2512.00234 • Published Nov 28, 2025
KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025 Paper • 2505.13036 • Published May 19, 2025
ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition Paper • 2506.04635 • Published Jun 5, 2025
Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging Paper • 1908.02404 • Published Aug 7, 2019
Improving Vietnamese Named Entity Recognition from Speech Using Word Capitalization and Punctuation Recovery Models Paper • 2010.00198 • Published Oct 1, 2020
Quality-Aware Decoding: Unifying Quality Estimation and Decoding Paper • 2502.08561 • Published Feb 12, 2025
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models Paper • 2411.18152 • Published Nov 27, 2024
Convoifilter: A case study of doing cocktail party speech recognition Paper • 2308.11380 • Published Aug 22, 2023
LibriS2S: A German-English Speech-to-Speech Translation Corpus Paper • 2204.10593 • Published Apr 22, 2022