No Humans Required

Abstract

We introduce No Humans Required (NHR), a fully automated pipeline for generating high-fidelity, pixel-perfect image editing sequences. The pipeline produces NHR-Edit, a new dataset built for training and rigorously evaluating image editing models.

NHR addresses limitations of traditional image editing datasets, in particular the biases and inefficiencies of manual annotation. Our method combines state-of-the-art Vision-Language Models (VLMs), text-to-image (Text2Image) generators, Large Language Models (LLMs), and other AI models with a set of heuristics. These models are used both to synthesize diverse image content programmatically and to filter the generated data stringently. The automated approach produces a wide range of complex and varied editing scenarios.

The NHR Pipeline: Unleashing Automated Excellence

The NHR-Edit dataset is generated through a sophisticated, multi-stage pipeline designed for optimal efficiency and quality:

Starting Point Generation: To obtain high-quality input images, we generate them with Flux1.schnell, a state-of-the-art text-to-image (T2I) model, using OpenAI O3 as the LLM. This yields detailed, stylistically varied images that offer many elements to edit in later steps.
Automated Sequence Continuance: At its core, the NHR pipeline uses LLMs to generate editing instructions from the initial prompt. These instructions, ranging from subtle adjustments to complex compositional changes, are then executed by image editing models, creating the "edited" versions of the original images. This yields a rich variety of editing tasks with corresponding ground-truth results.
Intelligent Filtration and Quality Control: A critical component of NHR is its filtration system. This stage employs a suite of SOTA models to assess the quality, accuracy, and coherence of the generated image-editing pairs. The models verify that each edit accurately reflects its instruction, check visual fidelity, and flag artifacts or inconsistencies. Because this quality control is automated, the dataset avoids the errors and subjective interpretations that human annotation introduces.
Scalability and Diversity: By removing human intervention, the NHR pipeline offers unprecedented scalability. It can continuously generate new data, allowing for the creation of vast and diverse datasets tailored to specific research needs. This eliminates the bottleneck of manual annotation, accelerating the development and training of more robust and versatile image editing AI.
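
The stages above can be sketched as a single generate, edit, and filter loop. This is only an illustrative outline, not the released implementation: the `Triplet` type, the four callables (`generate_image`, `propose_edit`, `apply_edit`, `judge`), and the `min_score` threshold are all hypothetical names introduced here, and images are represented by plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Triplet:
    """One editing example: source image, instruction, edited image."""
    source: str       # stand-in for an image (e.g. a path or ID)
    instruction: str  # natural-language edit instruction
    edited: str       # stand-in for the edited image


def mine_triplets(
    prompts: Iterable[str],
    generate_image: Callable[[str], str],    # hypothetical T2I wrapper
    propose_edit: Callable[[str], str],      # hypothetical LLM wrapper
    apply_edit: Callable[[str, str], str],   # hypothetical editor wrapper
    judge: Callable[[Triplet], float],       # hypothetical validator, higher is better
    min_score: float = 0.8,                  # illustrative threshold, not from the paper
) -> Iterator[Triplet]:
    """Yield only the triplets that the automated judge accepts."""
    for prompt in prompts:
        source = generate_image(prompt)           # starting point generation
        instruction = propose_edit(prompt)        # instruction from the initial prompt
        edited = apply_edit(source, instruction)  # execute the edit
        candidate = Triplet(source, instruction, edited)
        if judge(candidate) >= min_score:         # automated filtration
            yield candidate
```

Because every stage is passed in as a callable, the loop can be exercised with simple stubs before any real model is attached, and no step requires a human in the loop.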

Proposed NoHumansRequired framework scheme.

Autonomous Dataset Generation Pipeline

A significant contribution of this work is the fully automatic pipeline for dataset generation, which ensures a scalable and high-quality source of data for training and evaluation. This pipeline involves the following steps:

Prompt Generation with LLM: A large language model is used to automatically generate diverse and complex input prompts for image editing tasks.
Initial Image Generation with Flux: An initial image is generated based on the LLM-generated prompt using the Flux1.dev model.
Image Editing with In-house DiT: The generated image is then edited by a proprietary model according to specific instructions, making use of its instruction-based editing capabilities.
Quality Assessment with Qwen Model: The edited images are assessed for pixel-perfect accuracy, instruction following, and aesthetics using the Qwen model, ensuring high standards for the dataset.
Strong Augmentations: For the obtained triplets, we apply inversion and bootstrap composition operations to expand the number of available edits.
Backward Consistency Filter: After multiplying the number of samples, we filter the augmented triplets by inversion or composition quality, and also filter the origin of each augmented triplet, to remove remaining errors.
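
The augmentation and backward consistency steps can be illustrated with a small sketch. It rests on assumptions that the text does not spell out: inversion is taken to mean swapping source and edited images under a reversed instruction, composition is taken to mean chaining two edits that share an intermediate image, and a failed inverted triplet is assumed to discard its origin as well. The helpers `invert_instruction` and `judge` and the `min_score` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Triplet:
    source: str       # stand-in for the source image
    instruction: str  # edit instruction
    edited: str       # stand-in for the edited image


def invert(t: Triplet, invert_instruction: Callable[[str], str]) -> Triplet:
    """Inversion: the edited image becomes the source and vice versa."""
    return Triplet(t.edited, invert_instruction(t.instruction), t.source)


def compose(first: Triplet, second: Triplet) -> Triplet:
    """Composition: chain two edits that share an intermediate image."""
    if first.edited != second.source:
        raise ValueError("second edit must start from the first edit's result")
    joined = f"{first.instruction}; then {second.instruction}"
    return Triplet(first.source, joined, second.edited)


def backward_consistency_filter(
    originals: Iterable[Triplet],
    invert_instruction: Callable[[str], str],  # hypothetical LLM wrapper
    judge: Callable[[Triplet], float],         # hypothetical validator
    min_score: float = 0.8,                    # illustrative threshold
) -> List[Triplet]:
    """Keep an original and its inversion only if the inversion passes.

    Assumption: a low-quality inverted triplet signals a problem in the
    original, so both are dropped together.
    """
    kept: List[Triplet] = []
    for t in originals:
        inverted = invert(t, invert_instruction)
        if judge(inverted) >= min_score:
            kept.extend([t, inverted])
    return kept
```

The same pattern would apply to composed triplets: score the composite, and drop its constituent triplets when the composite fails.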

This automated approach allows for the creation of a vast and diverse dataset, crucial for developing and continuously improving robust models for Image-to-Image tasks.

General category group distribution.

Miscellaneous operations distribution.

Composite operations distribution (logarithmic scale).

Image style distribution; 'standard' stands for images with no explicit style.

Bagel-NHR-Edit

We release BAGEL-NHR-EDIT, a LoRA-tuned BAGEL variant that achieves higher scores than the base model on ImgEditBench and GEdit-Bench, demonstrating that fine-tuning on NHR-Edit improves the underlying editing model.

Results comparison between original Bagel-7B-MoT and BAGEL-NHR-EDIT on samples from ImgEdit and GEdit benches.

Publications

[1] Kuprashevich, M., Alekseenko, G., Tolstykh, I., Fedorov, G., Suleimanov, B., Dokholyan, V., & Gordeev, A. (2025). NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining. arXiv:2507.14119.

BibTeX

@article{Layer2025NoHumansRequired,
    arxivId = {2507.14119},
    author = {Maksim Kuprashevich and Grigorii Alekseenko and Irina Tolstykh and Georgii Fedorov and Bulat Suleimanov and Vladimir Dokholyan and Aleksandr Gordeev},
    title = {{NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining}},
    year = {2025},
    eprint = {2507.14119},
    journal = {arXiv preprint arXiv:2507.14119},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV},
    url = {https://arxiv.org/abs/2507.14119}
}