Evaluation Accuracy Comparison.
Our dedicated evaluation model achieves substantially higher evaluation accuracy on every test point than Qwen2.5-VL-72B, the VLM commonly used for offline evaluation.
@article{UniGenBench++,
title={UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation},
author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Bu, Jiazi and Zhou, Yujie and Xin, Yi and He, Junjun and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},
journal={arXiv preprint arXiv:2510.18701},
year={2025}
}