Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.


Magi, The Manga Whisperer

Table of Contents

  1. Magiv1
  2. Magiv2
  3. Datasets

Magiv1

(Figure: Magi v1 teaser)

v1 Usage

from transformers import AutoModel
import numpy as np
from PIL import Image
import torch
import os

images = [
    "path_to_image1.jpg",
    "path_to_image2.png",
]

def read_image_as_np_array(image_path):
    with open(image_path, "rb") as file:
        image = Image.open(file).convert("L").convert("RGB")
        image = np.array(image)
    return image

images = [read_image_as_np_array(image) for image in images]

model = AutoModel.from_pretrained("ragavsachdeva/magi", trust_remote_code=True).cuda()
with torch.no_grad():
    results = model.predict_detections_and_associations(images)
    text_bboxes_for_all_images = [x["texts"] for x in results]
    ocr_results = model.predict_ocr(images, text_bboxes_for_all_images)

for i in range(len(images)):
    model.visualise_single_image_prediction(images[i], results[i], filename=f"image_{i}.png")
    model.generate_transcript_for_single_image(results[i], ocr_results[i], filename=f"transcript_{i}.txt")
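
As a quick way to inspect the raw outputs, the sketch below pairs each detected text box with its OCR string. It is not part of the documented API; it only assumes what the snippet above already relies on, namely that results[i]["texts"] and ocr_results[i] are aligned index-wise (one OCR string per detected text box).

# Minimal sketch: dump each detected text box alongside its OCR string.
# Assumes results[i]["texts"] and ocr_results[i] are aligned index-wise.
for i, (page_result, page_ocr) in enumerate(zip(results, ocr_results)):
    print(f"--- image {i} ---")
    for box, text in zip(page_result["texts"], page_ocr):
        print(f"{box}: {text}")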

Magiv2

(Figure: Magiv2)

v2 Usage

from PIL import Image
import numpy as np
from transformers import AutoModel
import torch

model = AutoModel.from_pretrained("ragavsachdeva/magiv2", trust_remote_code=True).cuda().eval()


def read_image(path_to_image):
    with open(path_to_image, "rb") as file:
        image = Image.open(file).convert("L").convert("RGB")
        image = np.array(image)
    return image

chapter_pages = ["page1.png", "page2.png", "page3.png" ...]
character_bank = {
    "images": ["char1.png", "char2.png", "char3.png", "char4.png" ...],
    "names": ["Luffy", "Sanji", "Zoro", "Ussop" ...]
}

chapter_pages = [read_image(x) for x in chapter_pages]
character_bank["images"] = [read_image(x) for x in character_bank["images"]]

with torch.no_grad():
    per_page_results = model.do_chapter_wide_prediction(chapter_pages, character_bank, use_tqdm=True, do_ocr=True)

transcript = []
for i, (image, page_result) in enumerate(zip(chapter_pages, per_page_results)):
    model.visualise_single_image_prediction(image, page_result, f"page_{i}.png")
    # Map each text-box index to the name of its predicted speaker.
    speaker_name = {
        text_idx: page_result["character_names"][char_idx] for text_idx, char_idx in page_result["text_character_associations"]
    }
    for j in range(len(page_result["ocr"])):
        # Keep only text predicted to be essential.
        if not page_result["is_essential_text"][j]:
            continue
        name = speaker_name.get(j, "unsure")
        transcript.append(f"<{name}>: {page_result['ocr'][j]}")
with open("transcript.txt", "w") as fh:
    for line in transcript:
        fh.write(line + "\n")
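
The character bank above is just a dictionary of reference crops and names, so it can also be assembled programmatically. The helper below is an illustrative sketch rather than part of the repository: it assumes one reference image per character, stored in a single folder and named after that character (e.g. Luffy.png), and it reuses the read_image helper defined above.

import os

def build_character_bank(directory):
    # One reference image per character; the character's name is taken from the filename.
    paths = sorted(
        os.path.join(directory, f)
        for f in os.listdir(directory)
        if f.lower().endswith((".png", ".jpg", ".jpeg"))
    )
    return {
        "images": [read_image(p) for p in paths],
        "names": [os.path.splitext(os.path.basename(p))[0] for p in paths],
    }

# character_bank = build_character_bank("character_refs")  # hypothetical folder name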

Datasets

Disclaimer: In adherence to copyright regulations, we are unable to publicly distribute the manga images that we've collected. The test images, however, are available freely, publicly and officially on Manga Plus by Shueisha.

Other notes

  • Request to download the Manga109 dataset here.
  • Download a large-scale dataset from Mangadex using this tool.
  • The Manga109 test splits are available here: detection, character clustering. Be careful: some background characters share the same label even though they are not the same character.

License and Citation

The provided models and datasets are available for academic research purposes only.

@InProceedings{magiv1,
    author    = {Sachdeva, Ragav and Zisserman, Andrew},
    title     = {The Manga Whisperer: Automatically Generating Transcriptions for Comics},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {12967-12976}
}
@misc{magiv2,
      author={Ragav Sachdeva and Gyungin Shin and Andrew Zisserman},
      title={Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names}, 
      year={2024},
      eprint={2408.00298},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.00298}, 
}
