Aslan Farboud

Lowering the Barrier: OpenAI Image Generation, Sesame’s Human-Like Voice, and the Rise of AI Tools

A reflection on how powerful generative tools like OpenAI’s image model, Google’s Gemini 2.5 Pro, and Sesame’s speech tech are redefining business content creation, marketing, and customer service.

As many of you might know, OpenAI’s new text-to-image model has clearly raised the bar for reliability and quality. It generates almost exactly what you describe, which is remarkable.

It’s really exciting, because it moves us closer to being able to generate any type of image, where the only real limitation is your imagination.

I think the implications of this are worth exploring. The first thing that came to mind is how much this will lower the barrier to content and marketing generation for small and medium-sized businesses (SMEs). Think about it: they no longer need to hire a team of specialized content and marketing people. We’re moving into a world where you only need a few AI-aware content/marketing strategists, or even just one, or simply a business leader with vision, and they can use AI to generate images and blog posts that align with that vision. Whether it’s every few days, weekly, or whenever, a batch of content can be produced in just a few hours.

For example, I’ve been using this model by taking my parents’ products from their business and asking it to place those products in various scenes — like a person surfing and holding the product, or a couple on a beach enjoying it. Before, you’d have to go to a specialist to craft a scenario: source the product, hire people, create a scene, get it photographed or filmed… but now, anyone who’s curious and willing to explore AI can do this. It’s wild.
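If you’d rather script this than do it by hand in a chat window, here’s a rough sketch of how the same idea might look in Python. To be clear about the assumptions: I did my experiments interactively, so the `gpt-image-1` model name, the `product.png` file, and the helper names `build_prompt` and `generate_scene` are illustrative choices for this sketch, not a record of what I actually ran. It uses the `images.edit` call from OpenAI’s Python SDK and expects an `OPENAI_API_KEY` in your environment.

```python
import base64
from pathlib import Path

# Hypothetical scene descriptions, borrowed from the examples above.
SCENES = [
    "a person surfing while holding the product",
    "a couple on a beach enjoying the product",
]


def build_prompt(scene: str) -> str:
    """Turn a short scene description into a full image-editing prompt."""
    return (
        f"Place the product from the reference photo in this scene: {scene}. "
        "Keep the product's shape, label, and colors unchanged."
    )


def generate_scene(product_photo: Path, scene: str, out_path: Path) -> None:
    """Send the product photo plus a prompt to the image API and save the result."""
    # Imported here so build_prompt can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with product_photo.open("rb") as photo:
        result = client.images.edit(
            model="gpt-image-1",  # assumption: the model name exposed by the API
            image=photo,
            prompt=build_prompt(scene),
        )
    # The image comes back base64-encoded; decode it and write it to disk.
    out_path.write_bytes(base64.b64decode(result.data[0].b64_json))


# Example usage (makes paid API calls, so it is left commented out):
# for i, scene in enumerate(SCENES):
#     generate_scene(Path("product.png"), scene, Path(f"scene_{i}.png"))
```

The prompt helper is kept separate from the API call so you can tweak the wording for your own products without touching anything else.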

It’s fascinating to see how quickly text-to-image models have improved, and I genuinely think the positive implications of this are massive. I believe AI, in general, will benefit small to medium-sized businesses the most, mainly because of their flexible processes, their ability to pivot, and, importantly, because they already operate with lean teams. AI allows businesses to scale with demand without being constrained by headcount. That might sound worrying at first, but I think this will be a net positive.

I suppose the jury’s still out, but in many ways, AI feels like the Gutenberg press in the 15th century, the mechanical reaper in the 1830s, the automobile in the early 1900s, or more recently, the internet. Imagine how people felt during those transitions.

With the mechanical reaper, tons of people worked in agriculture, and suddenly you only needed one trained person and a horse to do the job of dozens. People were worried — understandably so. Where would all the workers go? What would they do?

It might be my naivety — or maybe AI is the “final boss” — but I think we’ll see a similar trajectory here.



Anyway, moving on — how cool has Google’s new model Gemini 2.5 Pro been? I’ve mainly been using it for programming, and honestly, I can’t even explain how good it is.

But the most interesting part? It’s free. It’s literally free, with generous rate limits.

It’s yet another example of how the bar keeps rising for what high-quality LLMs can do. It feels like every few weeks a new model drops that pushes things even further. Super exciting.



Lastly, I wanted to talk about Sesame. For those who haven’t heard, Sesame is a company specializing in text-to-speech and conversational speech models (CSMs). Please, if you haven’t checked out their demo, go now:
👉 https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice

It’s unbelievable — it’s almost indistinguishable from a real human voice.

They even released a smaller, non–fine-tuned open-source model:
👉 https://github.com/SesameAILabs/csm

While it’s not as good as their fine-tuned demo, it’s still really solid — especially considering you can run it on most modern computers. The computational requirements have been significantly reduced, which once again lowers the barrier to entry.

I literally cloned my own voice using their open-source model on my laptop!

I’m curious how this will disrupt companies like ElevenLabs and similar players, whose voice cloning and high-quality TTS models are still pretty expensive, especially at scale.

But all this to say — I think the big implication here is that in 2–5 years, we probably won’t have humans answering phones in businesses anymore.

Think product support, helpdesks, booking appointments — maybe even lawyers or tax consultants? Who knows. But what is certain is that AI agents, indistinguishable from real humans and trained on your business, will be available 24/7, for pennies.

That feels inevitable.

Anyway, lots to talk about. Some very exciting stuff is happening right now.

Take care, all.