⚔️ IDEA-Bench ⚔️ : How Far are Generative Models from Professional Designing?

📜 Rules

Choose the better result from two anonymous models.
Click "New Round" to start a new round.
After you vote, the model names are revealed and your selection cannot be changed.

⚠️ Data Collection Consent

Your votes will be collected for research purposes only.
By using this service, you agree to the collection of your votes for research purposes.
Your data will be anonymized and will not be used for commercial purposes.

🏆 Arena Elo

Find out which model is best at professional-level image processing tasks! You are welcome to upload your own model's generation results.

👇 Vote now!

name	description	creator	upload time
ChatDiT	A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers.	Tongyi Lab	2024-12-23 15:49
GPT-4o + FLUX.1 [dev]	A new open-source image generation model developed by Black Forest Labs. Use GPT-4o for prompt rephrasing.	Black Forest Labs	2024-12-23 15:50
GPT-4o + Stable Diffusion 3 Medium	A Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. Use GPT-4o for prompt rephrasing.	Stability AI	2024-12-24 15:39
GPT-4o + PixArt-Sigma	PixArt-Sigma consists of pure transformer blocks for latent diffusion: It can directly generate 1024px, 2K and 4K images from text prompts within a single sampling process. Use GPT-4o for prompt rephrasing.	Huawei Noah's Ark Lab	2024-12-24 15:39
GPT-4o + DALLE-3	DALL-E 3 is the newest text-to-image generation model from OpenAI. Use GPT-4o for prompt rephrasing.	OpenAI	2024-12-24 15:39
GPT-4o + Emu2	A generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences with a unified autoregressive objective. Use GPT-4o for prompt rephrasing.	BAAI	2024-12-24 15:39
GPT-4o + OmniGen	OmniGen is a unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, Identity-Preserving Generation, and image-conditioned generation. Use GPT-4o for prompt rephrasing.	BAAI	2024-12-24 15:39

Input Images

Model A

Model B

🏆 IDEA-Bench Leaderboard

| Code | Dataset | Page |

Total #models: 7 (anonymous). Total #votes: 160. Last updated: 2025-02-17 19:29:44 PST. (Note: Only anonymous votes are considered here. Check the full leaderboard for all votes.)

Rank	🤖 Model	⭐ Arena Elo	📊 95% CI	🗳️ Votes	Organization	License
1	GPT-4o + FLUX.1 [dev]	1059	+56/-52	76	Black Forest Labs	FLUX.1 [dev] Non-Commercial License
2	GPT-4o + Stable Diffusion 3 Medium	1037	+76/-68	37	Stability AI	Stability AI Community License
3	GPT-4o + PixArt-Sigma	1037	+58/-82	46	Huawei Noah's Ark Lab	CreativeML Open RAIL++-M License
4	ChatDiT	1033	+84/-78	65	Tongyi Lab	MIT License
5	GPT-4o + DALLE-3	1024	+89/-75	30	OpenAI	OpenAI Terms of Use
6	GPT-4o + Emu2	911	+82/-96	33	BAAI	Apache License 2.0
7	GPT-4o + OmniGen	899	+84/-86	33	BAAI	MIT License

Total #models: 7 (full: anonymous + open). Total #votes: 208. Last updated: 2025-02-17 19:36:44 PST.

Rank	🤖 Model	⭐ Arena Elo (anony)	⭐ Arena Elo (full)	🗳️ Votes	Organization	License
1	GPT-4o + FLUX.1 [dev]	1059	1051	92	Black Forest Labs	FLUX.1 [dev] Non-Commercial License
2	GPT-4o + PixArt-Sigma	1037	1015	33	Huawei Noah's Ark Lab	CreativeML Open RAIL++-M License
3	GPT-4o + Stable Diffusion 3 Medium	1037	1013	42	Stability AI	Stability AI Community License
4	GPT-4o + DALLE-3	1024	1006	40	OpenAI	OpenAI Terms of Use
5	ChatDiT	1033	1005	114	Tongyi Lab	MIT License
6	GPT-4o + Emu2	911	890	52	BAAI	Apache License 2.0
7	GPT-4o + OmniGen	899	868	43	BAAI	MIT License

We are still collecting votes and adding models. The ranking will be updated frequently. Please stay tuned!
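For readers unfamiliar with how an Arena Elo column and its 95% CI can be derived from pairwise votes, here is a minimal sketch. It is not this leaderboard's actual code: the function names, the battle tuple layout, and the constants (K = 4, scale 400, base 10, initial rating 1000) are assumptions chosen for illustration, and the interval is a simple percentile bootstrap over resampled battles.

```python
import random
from collections import defaultdict


def compute_elo(battles, k=4.0, scale=400.0, base=10.0, init=1000.0):
    """Online Elo over (model_a, model_b, winner) tuples.

    winner is "a", "b", or "tie". All constants are illustrative defaults.
    """
    ratings = defaultdict(lambda: init)
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model A under the logistic Elo model.
        expected_a = 1.0 / (1.0 + base ** ((rb - ra) / scale))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Zero-sum update: what A gains, B loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)


def bootstrap_elo(battles, rounds=1000, seed=0):
    """Resample battles with replacement and summarize each model's rating.

    Returns {model: (median, lower_2.5%, upper_97.5%)}.
    """
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(rounds):
        resampled = rng.choices(battles, k=len(battles))
        for model, rating in compute_elo(resampled).items():
            samples[model].append(rating)

    summary = {}
    for model, values in samples.items():
        values.sort()
        last = len(values) - 1
        lower = values[int(round(0.025 * last))]
        upper = values[int(round(0.975 * last))]
        median = values[len(values) // 2]
        summary[model] = (median, lower, upper)
    return summary


if __name__ == "__main__":
    # Made-up votes between placeholder models, not data from this arena.
    toy_battles = [
        ("model_x", "model_y", "a"),
        ("model_y", "model_z", "tie"),
        ("model_x", "model_z", "a"),
        ("model_z", "model_y", "b"),
    ]
    for name, (mid, lo, hi) in sorted(bootstrap_elo(toy_battles, rounds=200).items()):
        print(f"{name}: {mid:.0f} (+{hi - mid:.0f}/-{mid - lo:.0f})")
```

With few votes per model, as in the tables above, the resampled ratings spread widely, which is consistent with the broad intervals reported there.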

Figure 1: Fraction of Model A Wins for All Non-tied A vs. B Battles

Figure 2: Battle Count for Each Combination of Models (without Ties)

Figure 3: Bootstrap of Elo Estimates (1000 Rounds of Random Sampling)

Figure 4: Average Win Rate Against All Other Models (Assuming Uniform Sampling and No Ties)
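The quantities named in Figures 1 and 4 can be sketched as follows. This is one plausible reading of the captions, not the site's confirmed implementation: ties are dropped, a pairwise win fraction is computed for every ordered pair of models, and each model's average win rate is the unweighted mean over its opponents (the "uniform sampling" assumption). The function names and the battle tuple layout are hypothetical.

```python
from collections import defaultdict


def win_fractions(battles):
    """Fraction of non-tied battles each model wins against each opponent.

    battles holds (model_a, model_b, winner) tuples with winner in
    {"a", "b", "tie"}. Returns {(model, opponent): fraction}.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for model_a, model_b, winner in battles:
        if winner == "tie":
            continue  # ties are excluded, as the captions state
        totals[(model_a, model_b)] += 1
        totals[(model_b, model_a)] += 1
        if winner == "a":
            wins[(model_a, model_b)] += 1
        else:
            wins[(model_b, model_a)] += 1
    return {pair: wins[pair] / count for pair, count in totals.items()}


def average_win_rate(battles):
    """Mean win fraction per model, weighting every opponent equally."""
    per_model = defaultdict(list)
    for (model, _opponent), fraction in win_fractions(battles).items():
        per_model[model].append(fraction)
    return {model: sum(vals) / len(vals) for model, vals in per_model.items()}


if __name__ == "__main__":
    # Made-up votes between placeholder models, not data from this arena.
    toy_battles = [
        ("X", "Y", "a"),
        ("X", "Y", "b"),
        ("Y", "X", "b"),
        ("X", "Z", "a"),
        ("Y", "Z", "tie"),
    ]
    for model, rate in sorted(average_win_rate(toy_battles).items()):
        print(f"{model}: {rate:.3f}")
```

Averaging per opponent rather than per battle keeps a frequently sampled pairing from dominating a model's score.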

Acknowledgment

Our codebase is built upon FastChat and GenAI-Arena.

About Us

This is a project from Tongyi Lab.

Contributors:

Chen Liang, Lianghua Huang, Jingwu Fang, Huanzhang Dou, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Junge Zhang, Xin Zhao, Yu Liu.

Contact:

Email: [email protected] (Chen Liang)

Sponsorship

We are always looking for sponsorship to support the arena project over the long term. Please contact us if you are interested in supporting this project.
