OK, it’s been just over a year since Dall-E launched. Dall-E is OpenAI’s image generation tool, which is now available within the paid version of ChatGPT.
Back at the beginning of 2023, it was a bit of a joke how badly image generators rendered fingers and hands.

But by the middle of 2023, the rendering of fingers and hands was OK.
Dall-E also somewhat sucked at horses.


There were a couple of issues here:
- Horses have “long and spindly” legs, which image generators struggled with. Hopefully this will get fixed in 2024. While it’s not shown in these examples, five-legged horses were not uncommon.
- Most horses generated are what a non-horse person would say is a horse: cartoonish, slab-faced and slab-bodied. Horses have a very subtle refinement to their conformation, and horse people immediately recognise when it’s even slightly wrong. The sample images look cartoonish and out of proportion.
This image shows a cartoonish horse. That’s fine for most applications, but a horsey person immediately picks the conformation faults of the animal. Also, I asked for a buckskin and got a palomino.

ANYWAY
Dall-E is also pretty average at generating text.
I received this meme:

I provided this prompt within ChatGPT 4:
Prompt: Can you recreate this image to say “horses” and not “books”? use “horse” for “book” of course
It could not. Here is the result:

There are a few issues here:
- The text is nonsense.
- The flowchart has not been kept – Dall-E created a new one.
- The cropping is not optimal.
The graphics are a nice touch.
Let’s try again:
Prompt: ok you need to try again please

Same issues as before – but I admit I like the graphics more.
One more go:
Prompt: Take the original image and only change the word “book” to “horse” and the word “books” to “horses”. Keep the font size and font face. This is the only change I require.
The result is still average:

Welp – looks like we are at the limits of Dall-E. Text is a real problem for it.
So what I’ll do is try this experiment again in 2024 – and suss another blog article for youse – so you can see Dall-E and ChatGPT improve over time.