
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

The Chinese University of Hong Kong

Updates: Mini-Gemini is coming! We release the paper, code, data, models, and demo for Mini-Gemini.

Abstract

In this work, we introduce Mini-Gemini, a simple and effective framework for enhancing multi-modality Vision Language Models (VLMs). Although advances in VLMs have enabled basic visual dialog and reasoning, a performance gap persists compared to advanced models such as GPT-4 and Gemini. We aim to narrow this gap by mining the potential of VLMs for better performance and an any-to-any workflow along three aspects: high-resolution visual tokens, high-quality data, and VLM-guided generation. To enhance visual tokens, we propose using an additional visual encoder for high-resolution refinement without increasing the visual token count. We further construct a high-quality dataset that promotes precise image comprehension and reasoning-based generation, expanding the operational scope of current VLMs. Overall, Mini-Gemini further mines the potential of VLMs and empowers current frameworks with image understanding, reasoning, and generation simultaneously. Mini-Gemini supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B parameters. It achieves leading performance on several zero-shot benchmarks and even surpasses established private models.



Model

The framework of Mini-Gemini is conceptually simple: dual vision encoders provide low-resolution visual embeddings and high-resolution candidates; patch info mining is proposed to conduct patch-level mining between high-resolution regions and low-resolution visual queries; and an LLM is used to marry text with images for both comprehension and generation at the same time.
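Patch info mining can be pictured as a per-region cross-attention: each low-resolution token acts as a query over the high-resolution features covering the same region, and the output keeps exactly one token per query, so the visual token count does not grow. The sketch below is a simplified illustration under our own assumptions, not the authors' implementation. The function name `patch_info_mining` and its list-based inputs are hypothetical; grouping the high-resolution candidates into one group of M×M features per low-resolution token is assumed; the learned projections and any MLP of the real model are replaced by identity maps; and the 1/sqrt(C) scaling and residual connection are assumptions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patch_info_mining(low_res: List[Vector], high_res: List[List[Vector]]) -> List[Vector]:
    """Hypothetical sketch: refine each low-res token with its high-res sub-patches.

    low_res:  N tokens, each a C-dim vector (used as queries).
    high_res: N groups; group i holds the (assumed) M*M high-res features that
              cover the same image region as low_res[i] (used as keys and values).
    Returns N refined tokens, so the visual token count is unchanged.
    """
    if len(low_res) != len(high_res):
        raise ValueError("need one high-res candidate group per low-res token")
    if not low_res:
        return []
    scale = 1.0 / math.sqrt(len(low_res[0]))  # scaled dot-product (assumption)
    refined = []
    for query, candidates in zip(low_res, high_res):
        # attention weights of this query over its own region's candidates only
        weights = softmax([dot(query, k) * scale for k in candidates])
        # weighted sum of the candidates (identity maps: values equal keys here)
        mined = [sum(w * v[c] for w, v in zip(weights, candidates))
                 for c in range(len(query))]
        # residual connection keeps the original low-res information (assumption)
        refined.append([q + m for q, m in zip(query, mined)])
    return refined


# Usage: N = 2 low-res tokens with C = 2, each with M*M = 4 high-res candidates.
low = [[1.0, 0.0], [0.0, 1.0]]
high = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
    [[0.0, 2.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]],
]
out = patch_info_mining(low, high)
print(len(out))  # 2: same number of tokens as the low-res input
```

Restricting each query to its own region keeps the cost linear in the number of low-resolution tokens, which is one plausible reading of "patch-level" mining; a full cross-attention over all high-resolution features would also keep the token count fixed but at a higher cost.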

BibTeX


@article{li2024minigemini,
  title={Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models},
  author={Li, Yanwei and Zhang, Yuechen and Wang, Chengyao and Zhong, Zhisheng and Chen, Yixin and Chu, Ruihang and Liu, Shaoteng and Jia, Jiaya},
  journal={arXiv preprint arXiv:2403.18814},
  year={2024}
}
  

Acknowledgement

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
