r/ChatGPT • u/joeyhito • 1d ago
Other Put a hat on it
GPT-4o image rendering continues to improve. I'm noticing improvement in background composition in particular.
14
3
4
u/MX010 1d ago
Why does GPT always create an entirely new image instead of changing only the parts that need it, so the original photo stays intact?
7
u/dCLCp 1d ago
You probably don't care about the actual answer, so this may be in vain, but it's fun for me since I'm trying to understand these things better, so maybe not entirely in vain.
TLDR: It's the G in GPT (generative), but also the PT (pre-trained transformer).
It is generating output based on what a pre-trained transformer learned from its training data. If you gave a person this task, they would hardly focus on the original picture at all. They would find a hat and spectacles somewhere, cut them out with a transparent background, and line them up. The person wouldn't actually be creating anything new, just taking two existing things and mashing them together.
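To make the "mash two things together" approach concrete, here is a minimal sketch of alpha compositing in plain Python. It is a toy under stated assumptions: images are nested lists of pixel tuples, and the `composite` function, the 4x4 `photo`, and the tiny `hat` are all made up for illustration. It is not how any particular image editor works internally.

```python
# Toy alpha compositing: paste an overlay (e.g. a hat cut-out) onto a photo.
# Images are lists of rows; each base pixel is an (r, g, b) tuple, and each
# overlay pixel adds an alpha value in [0.0, 1.0]. All names are hypothetical.

def composite(base, overlay, top, left):
    """Return a copy of `base` with `overlay` blended in at (top, left)."""
    out = [list(row) for row in base]  # copy so the original stays intact
    for dy, row in enumerate(overlay):
        for dx, (r, g, b, a) in enumerate(row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                br, bg, bb = out[y][x]
                out[y][x] = (
                    round(a * r + (1 - a) * br),
                    round(a * g + (1 - a) * bg),
                    round(a * b + (1 - a) * bb),
                )
    return out

photo = [[(200, 180, 160)] * 4 for _ in range(4)]  # 4x4 stand-in for the photo
hat = [[(20, 20, 20, 1.0), (20, 20, 20, 1.0)]]     # 1x2 fully opaque "hat"
result = composite(photo, hat, top=0, left=1)

# Collect the coordinates of every pixel that differs from the original.
changed = [(y, x) for y in range(4) for x in range(4) if result[y][x] != photo[y][x]]
print(changed)  # [(0, 1), (0, 2)]
```

Only the two pixels under the overlay change; every other pixel is copied through untouched, which is exactly the property the question above is asking for.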
But think of how binary ones and zeros get turned into the programs we use through many layers of software: the models we currently use don't think like we do at all. When we ask a model to do something, the request has to pass through a whole stack of abstraction layers, otherwise the model can't even understand what we are asking. It also has to turn the pixels of the image into that same sort of probabilistic language. Once everything is in a language it can actually work with, that's where all the training comes in, but by that point you can't put the smoke back into the cigarette.
By the time it is making the image, it is like those chefs who make vegan food that is supposed to look like regular food. It has taken some other substance and pulverized it into something it can shape into what you asked for. The original photo was only ever a very rough framework for what the thing you want looks like. Because it turns everything into probabilities, it can't really do a perfect 1:1 reproduction. It has already lost the fidelity of the original input. It's all probabilistic. At some point they will get better at fine-tuning the probabilities, and they have already gotten better at intelligently narrowing the field the systems look through, which is why we can select regions for it to focus on.
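A toy way to picture the fidelity loss and the region selection, as a rough sketch only: the 4-level `LEVELS` codebook below is a made-up stand-in for a model's internal representation, and `encode`, `decode`, and `masked_edit` are hypothetical helpers. None of this is how GPT-4o actually works; it only shows that a lossy round trip changes pixels everywhere, while a mask can limit where the regenerated pixels are used.

```python
# Toy illustration: a lossy encode/decode round trip, then a mask that limits
# where the regenerated pixels replace the originals.

LEVELS = [0, 85, 170, 255]  # hypothetical codebook of allowed brightness values

def encode(pixels):
    """Map each brightness value to the index of the nearest codebook level."""
    return [min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - p)) for p in pixels]

def decode(tokens):
    """Map codebook indices back to brightness values."""
    return [LEVELS[t] for t in tokens]

def masked_edit(original, regenerated, mask):
    """Keep original pixels where mask is 0, regenerated pixels where it is 1."""
    return [r if m else o for o, r, m in zip(original, regenerated, mask)]

original = [12, 90, 200, 251, 130]      # one row of grayscale pixels
regenerated = decode(encode(original))  # the round trip through the codebook
print(regenerated)                      # [0, 85, 170, 255, 170]

mask = [0, 0, 1, 1, 0]                  # only the middle region may change
edited = masked_edit(original, regenerated, mask)
print(edited)                           # [12, 90, 170, 255, 130]
```

Without the mask, every pixel drifts a little from the original. With the mask, the drift is confined to the selected region and the rest of the row is preserved exactly.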
Did any of that make sense? You probably don't care lol.
1
u/Chuck_Vanderhuge 1d ago
Love it. Was the prompt that easy?
2
u/joeyhito 1d ago
Yes, I'm finding that shorter, more concise prompts generally deliver better image results. You can then refine the result with additional prompts if desired. This one was done on an uploaded image, which also lets the prompt be shorter.
1