The “think of a number” fallacy

Some time ago a colleague, commenting on the idea of iterative prompting, suggested asking GPT to “think about something” and then decide what to write or not to write.

The problem with this approach is that a session with an LLM doesn’t really have any memory outside the actual text being created by the chat; consequently, it cannot “keep something in mind” while completing other tasks.

But it can pretend it does.

To test this, you can ask an LLM to “think of a number, but don’t tell me”. At the time of this writing most models will respond by confirming that they have thought of a number. Of course they haven’t... but because they are trained to mimic human interactions, they pretend they have.

This is something to always keep in mind while prompting.

For example, it is not effective to prompt a system to “make a list and only show me the items matching a criterion”; instead, you can ask it to print the full output and then generate a final list from it (“print the list, then update it with the criterion”), as in the sketch below.
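
For the curious, here is a minimal sketch of that pattern, assuming the OpenAI Python SDK; the model name and the vegetable criterion are placeholders I made up for illustration, and any chat API would work the same way, since the trick is in the prompt, not the API.

```python
# A minimal sketch of the "print it first, then filter it" pattern.
# Assumes the OpenAI Python SDK; model name and criterion are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Weak: asks the model to filter a list it never wrote down, so there is
# no visible intermediate result for it to work from.
weak_prompt = "Think of 30 common vegetables and show me only the green ones."

# Better: force the full list into the visible context first, then
# derive the filtered list from it in the same response.
strong_prompt = (
    "List 30 commonly eaten vegetables, numbered. "
    "Then, below that list, print a second list containing only the green ones."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": strong_prompt}],
)
print(response.choices[0].message.content)
```

The point is not the code: the model’s only “memory” is the text it has already generated, so any intermediate result you want it to use has to appear in the output.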

GroceriesGPT

A friend this morning shared a list of vegetables, noting how hard it is to eat 30 different ones in the same week.

I immediately turned to my AI chatbot and asked it to create a list of commonly eaten vegetables, and of course I got a very good one.

At that point I thought that it would be nice to add that list to my next grocery order on Ocado.

And this is where the magic ended.

My chatbot doesn’t talk to the Ocado app. And I actually use more than one bot: sometimes I go with ChatGPT, sometimes with Claude; they are both good and continuously improving, and I like to pit them against each other.

ChatGPT has a plug-in architecture which could potentially connect it with other applications through custom GPTs, but so far I haven’t seen any particularly good application. And what would be the idea there? That Ocado would have to build a custom GPT? And what about other chatbots? I don’t really want to be siloed again. I’m happy to pay for services, even Google, but leave me free to connect.

Meanwhile I’m sure that somebody at Ocado is already thinking about how to integrate an AI into their app (if you aren’t, call me), and while this will be a nice feature to have, it will be yet another AI agent unable to talk with my other agents.

Maybe the solution is similar to what Rabbit appears to be working on: teach AI to use UI. Avoid altogether the challenge of getting companies and engineers to agree on open standards and just teach AIs to use the shitty, incompatible interfaces of our apps.

AI interoperability might be one of the most interesting future problems that we will face.

I want the AIs I pay for to collaborate, not to compete.

(Not) too old for this *

By the end of this year it will be 30 years since I registered my first domain name (warning: your browser might throw a hissy fit; I didn’t bother to get a certificate to secure that site, it’s just there for nostalgic reasons, not worth the hassle).

Yesterday I was trying to get a colleague to deploy a simple service that would allow us to save a file to a server and download it back again. Apparently that’s much harder in today’s sophisticated cloud environments than it used to be.

Speaking of clouds, today I did some house cleaning on various accounts, domains, mailboxes, and cloud services. I got lost multiple times in the complexities of these services (in particular I feel a new and warm form of hate for Google Cloud). When corporate meets software, this is what we get.

I’m not really complaining: every day I’m talking with a chatbot who understands me better than most souls. It’s magic.

Yet there are moments when I miss a world in which early technologies were simpler to master. There was some stuff I knew almost everything about.

But at the same time I find it amazing to come to work every morning and invent new things. For the complicated stuff I now just ask ChatGPT ;)

With great responsibilities

Having just started a company that primarily deals with large language models, I occasionally find myself thinking about the responsibilities we have when we introduce a new AI agent into the digital space.

Besides preservation of the human species, a good rule that I think we should give ourselves is “avoid bullshit”, and while this rule certainly applies to any human activity, I think it’s extremely important when you are dealing with the equivalent of BS thermonuclear devices.

I’m still working on my list; this is as far as I’ve got.

Every time one of our AI agents produces an output we should ask ourselves:

  • does this text improve the life of the intended recipient?
  • will it be delivered only to the intended recipient (and not to a whole bunch of innocent bystanders)?
  • is it as efficient as it can be in the way it uses language and delivers its message?

If these minimum parameters are not met, the agent should be destroyed.

As with everything else, AI is not the cause of this: there was plenty of wasteful content well before a computer could start mimicking the output of a sapiens. And because these models have been trained on a lot of this useless noise, they are extremely good at generating more.

So even before you worry about whether AI can get the wrong president elected or robots can terminate humanity, just make sure you are not accidentally leaving the BS tap open on your way out.