r/ChatGPT 1d ago

[Other] Put a hat on it

Post image

GPT-4o image rendering continues to improve. I've noticed improvement in background composition.

100 Upvotes

14 comments

u/AutoModerator 1d ago

Hey /u/joeyhito!

If your post is a screenshot of a ChatGPT conversation, please reply to this message with the conversation link or prompt.

If your post is a DALL-E 3 image post, please reply with the prompt used to make this image.

Consider joining our public discord server! We have free bots with GPT-4 (with vision), image generators, and more!

🤖

Note: For any ChatGPT-related concerns, email support@openai.com

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

14

u/SweatyBoi5565 1d ago

It's funny to see it change minor details in the background.

3

u/TimesHero 1d ago

and the face

5

u/Prestigious-Tie-9267 1d ago

Like the Amazon logo on the box

1

u/utkohoc 1d ago

The collar too. Pretty interesting tbh.

3

u/Laserdollarz 1d ago

visualizing my cat's favorite hair style

3

u/zeaor 1d ago

Great, now dog hat and dog monocle making artists WILL STARVE!

4

u/MX010 1d ago

Why does GPT always create an entirely new image instead of only editing the parts that need changing, so the original photo stays intact?

7

u/dCLCp 1d ago

You probably don't care about the actual answer, so this may be in vain, but it's fun for me since I'm trying to understand these things better, so maybe not entirely in vain.

TL;DR: the G in GPT, but also the PT.

It's generating stuff based on pre-trained transformer data. If you gave a person this task, they would hardly focus on the original picture at all: they'd find a hat and spectacles somewhere, make a transparent selection, and line it up. The person wouldn't be consuming or creating anything new, just taking two things and mashing them together.

But just as binary ones and zeros only become the programs we use through many layers of software development, the models we currently use don't think like we do at all. When we ask them to do something, the request has to pass through a whole stack of abstractions before they can even represent what we're asking. The pixels in the image get turned into that same probabilistic language too. Once everything is in a representation the model can actually work with, that's where all the training comes in, but by that point you can't put the smoke back into the cigarette.
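If you want to see that fidelity loss for yourself, here's a rough sketch using a public Stable Diffusion VAE via Hugging Face diffusers. To be clear, this is an open-source stand-in, not whatever OpenAI actually runs internally, and "dog.png" is a placeholder file name:

```python
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

# Public Stable Diffusion VAE; a stand-in for GPT-4o's (non-public) image encoder.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

# "dog.png" is a placeholder for the uploaded photo.
img = Image.open("dog.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0  # scale pixels to [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0)                        # (1, 3, 512, 512)

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()  # 512x512 RGB -> compact latent grid
    recon = vae.decode(latents).sample            # decode straight back to pixels

# No prompt, no edit, and the round trip is already lossy:
print("mean absolute pixel error:", (recon - x).abs().mean().item())
```

The photo never survives the trip unchanged, which is why even "untouched" regions drift.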

By the time it's making the image, it's like those chefs who make vegan food that's supposed to look like regular food. It has taken some third-party substance and pulverized it into something it can shape into what you asked for, but the original substance was only ever a rough framework for what the thing you want looks like. Because all of the input gets turned into probabilities, it can't do a 1:1 pixel-perfect reproduction; the fidelity of the original has already been lost. It's all probabilistic. At some point the probabilities will be tuned better, and these systems have already gotten better at intelligently narrowing the field of focus, which is why we can select regions where they should concentrate, as in the sketch below.
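That region-selection idea is basically mask-based inpainting, which is also one answer to the question above about keeping the original photo intact. A minimal sketch with the diffusers inpainting pipeline (the model ID and file names are my assumptions, and again this is an open-source analogue, not OpenAI's internals):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Open-source inpainting model, used here purely as an illustration.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

init = Image.open("dog.png").convert("RGB").resize((512, 512))
# White pixels in the mask get regenerated; black pixels are preserved
# (apart from the slight VAE round-trip loss shown in the sketch above).
mask = Image.open("hat_region_mask.png").convert("L").resize((512, 512))

result = pipe(
    prompt="a dog wearing a small top hat",
    image=init,
    mask_image=mask,
).images[0]
result.save("dog_with_hat.png")
```

Constraining the edit to a mask is what keeps the rest of the composition from drifting.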

Did any of that make sense? You probably don't care lol.

1

u/MX010 21h ago

Sure thanks

1

u/Chuck_Vanderhuge 1d ago

Love it. Was the prompt that easy?

2

u/joeyhito 1d ago

Yes, I'm finding that shorter, more concise prompts generally deliver better image results. You can then refine with follow-up prompts if desired. This one was done on an uploaded image, which also keeps the prompt short.
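If anyone wants to script this instead of going through the ChatGPT UI, the OpenAI API exposes an image edit endpoint that takes an uploaded image plus a short prompt. A rough sketch with the official Python SDK (the model name and file names are my guesses; check the current docs):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Edit an uploaded photo with a short, concise prompt, as described above.
with open("dog.png", "rb") as f:
    result = client.images.edit(
        model="gpt-image-1",  # assumed model name; verify against current docs
        image=f,
        prompt="Put a top hat and a monocle on the dog",
    )

# The edited image comes back as base64-encoded data.
with open("dog_with_hat.png", "wb") as out:
    out.write(base64.b64decode(result.data[0].b64_json))
```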

1

u/heatlesssun 1d ago

Columbo's dog.

1

u/utkohoc 1d ago

Crazy how these aren't simple additions. It changes the content of the photo a lot.