GPT-4 with (Image, Text) -> Text is being released in the coming weeks. There's a really good write-up below:
https://twitter.com/DrJimFan/status/1706478482296021344
Multimodal input has massive potential to open up an incredible new set of use cases.
This was announced months ago and many of us have been waiting patiently for it.
I'm a bit unclear on whether this will be available in the API anytime soon or only in Chat.
I'll check with the OpenAI team and let you all know what I hear!
What OpenAI has communicated to me is that "they are eventually working on making it available in an API but cannot share a timeline."
It is not currently available outside of Chat.