
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

  • Paper
  • #ArtificialIntelligence
Oran Gafni
@OranGafni
(Author)
Adam Polyak
@AdamPolyak
(Author)
arxiv.org

Recent text-to-image generation methods provide a simple yet exciting conversion capability between text and image domains. While these methods have incrementally improved the generated image fidelity and text relevancy, several pivotal gaps remain unanswered, limiting applicability and quality. We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene, (ii) introducing elements that substantially improve the tokenization process by employing domain-specific knowledge over key image regions (faces and salient objects), and (iii) adapting classifier-free guidance for the transformer use case. Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high-fidelity images in a resolution of 512 × 512 pixels, significantly improving visual quality. Through scene controllability, we introduce several new capabilities: (i) scene editing, (ii) text editing with anchor scenes, (iii) overcoming out-of-distribution text prompts, and (iv) story illustration generation, as demonstrated in the story we wrote.
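
Point (iii) of the abstract mentions classifier-free guidance for an autoregressive transformer. As a rough illustration only, the general idea can be sketched as blending the next-token logits computed with the prompt and without it. The function names, the toy logits, and the guidance scale below are all assumptions made for this sketch, not values or code from the paper.

```python
import math

def guided_logits(cond, uncond, scale):
    """Blend logits as uncond + scale * (cond - uncond).

    scale = 1 returns the conditional logits, scale = 0 the unconditional
    ones; scale > 1 pushes further toward what the prompt favors.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits over a 4-token vocabulary (illustrative only).
cond_logits = [2.0, 0.5, 0.1, -1.0]    # model run with the prompt
uncond_logits = [1.0, 0.8, 0.1, -0.5]  # model run without the prompt
guidance_scale = 3.0                   # assumed value, not taken from the paper

# Distribution from which the next image token would be sampled.
probs = softmax(guided_logits(cond_logits, uncond_logits, guidance_scale))
```

In a full decoder this blend would be applied at every sampling step, which means running the model twice per token (once with the prompt, once without).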

Mentions
Yann LeCun @ylecun · Jul 14, 2022
  • Post
  • From Twitter
Make-A-Scene! An *interactive* and *controllable* image generation system that produces a nice picture from a text description and a rough sketch. Paper here: