
Mini-Gemini:

Mining the Potential of Multi-modality Vision Language Models

The Chinese University of Hong Kong

Updates: Mini-Gemini is coming! We release the paper, code, data, models, and demo for Mini-Gemini.

Abstract

In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation. To enhance visual tokens, we propose to utilize an additional visual encoder for high-resolution refinement without increasing the visual token count. We further construct a high-quality dataset that promotes precise image comprehension and reasoning-based generation, expanding the operational scope of current VLMs. In general, Mini-Gemini further mines the potential of VLMs and empowers current frameworks with image understanding, reasoning, and generation simultaneously. Mini-Gemini supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B. It achieves leading performance on several zero-shot benchmarks and even surpasses well-developed private models.



Model

The framework of Mini-Gemini is conceptually simple: dual vision encoders provide a low-resolution visual embedding and high-resolution candidates; patch info mining performs patch-level mining between high-resolution regions and low-resolution visual queries; and an LLM marries text with images for both comprehension and generation at the same time.
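To make the patch-level mining step more concrete, here is a minimal NumPy sketch of one plausible reading of it. It is not the released implementation. It assumes that each low-resolution token acts as a query over the m x m window of high-resolution features covering the same image region, that the mining is plain scaled dot-product attention, and that the result is added back to the query. The function name `patch_info_mining`, the tensor layouts, and the omission of any learned projections are all illustrative assumptions.

```python
import numpy as np


def patch_info_mining(low_res, high_res):
    """Hypothetical sketch: refine low-res tokens with high-res detail.

    low_res:  (h, w, c) array of low-resolution visual queries.
    high_res: (h*m, w*m, c) array of high-resolution candidates.
    Returns an (h*w, c) array, so the visual token count is unchanged.
    """
    h, w, c = low_res.shape
    H, W, _ = high_res.shape
    assert H % h == 0 and W % w == 0 and H // h == W // w
    m = H // h

    # Group the high-res map into one m*m window per low-res token.
    windows = (
        high_res.reshape(h, m, w, m, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(h * w, m * m, c)
    )
    queries = low_res.reshape(h * w, 1, c)

    # Scaled dot-product attention restricted to each token's own window.
    scores = queries @ windows.transpose(0, 2, 1) / np.sqrt(c)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of high-res candidates, added back to the query (residual).
    mined = (weights @ windows).squeeze(1)
    return low_res.reshape(h * w, c) + mined


rng = np.random.default_rng(0)
low = rng.standard_normal((2, 3, 4))    # 6 low-res tokens
high = rng.standard_normal((4, 6, 4))   # 2x denser high-res map
tokens = patch_info_mining(low, high)
print(tokens.shape)
```

Under these assumptions the output has one row per low-resolution token regardless of how dense the high-resolution map is, which is the sense in which high-resolution refinement can leave the visual token count unchanged.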

BibTeX


@article{li2024minigemini,
  title={Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models},
  author={Li, Yanwei and Zhang, Yuechen and Wang, Chengyao and Zhong, Zhisheng and Chen, Yixin and Chu, Ruihang and Liu, Shaoteng and Jia, Jiaya},
  journal={arXiv preprint arXiv:2403.18814},
  year={2024}
}
  

Acknowledgement

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
