Paper Title

InstructPix2Pix: Learning to Follow Image Editing Instructions

Authors

Tim Brooks, Aleksander Holynski, Alexei A. Efros

Abstract

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this problem, we combine the knowledge of two large pretrained models -- a language model (GPT-3) and a text-to-image model (Stable Diffusion) -- to generate a large dataset of image editing examples. Our conditional diffusion model, InstructPix2Pix, is trained on our generated data, and generalizes to real images and user-written instructions at inference time. Since it performs edits in the forward pass and does not require per example fine-tuning or inversion, our model edits images quickly, in a matter of seconds. We show compelling editing results for a diverse collection of input images and written instructions.
