Evaluation Accuracy Comparison.
Our dedicated evaluation model demonstrates a significant improvement in evaluation accuracy across all test points compared to the commonly used offline evaluation VLM, Qwen2.5-VL-72b.
@article{UniGenBench++,
title={UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation},
author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Bu, Jiazi and Zhou, Yujie and Xin, Yi and He, Junjun and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and others},
journal={arXiv preprint arXiv:2510.18701},
year={2025}
}