Playing with Claude skills


For the last couple of weeks I have been playing with Claude skills.

Skills are a cool way to package knowledge about how to do stuff into a prompt that can be retrieved any time your agent needs it. They are built for Claude, but you can easily integrate them into any LLM-based process.
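Under the hood a skill is just a folder with a SKILL.md file: a short YAML frontmatter that tells the model when to load it, followed by plain markdown instructions. A minimal, made-up example (not one of our actual skills):

```markdown
---
name: company-context
description: Background on our company, products and tone of voice.
  Load when writing or reasoning about business matters.
---

# Company context

- Audience: assume technically literate readers who don't know our products.
- Terminology: use the product names exactly as listed in `glossary.md`.
- Tone: direct and concrete; avoid marketing superlatives.
```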

My first experiment was building a skill about our company. Read the website, read some internal documents, read technical documentation, distil a context that can be used any time I’m writing or thinking about business stuff. But the most interesting aspect is that everybody on our team can now use the same context for whatever work they are doing.

The next experiment was to build a skill for creating Spritz agents (there’s a very good skill-building skill in Claude, which helps a lot with building and packaging skills). I showed Claude the blueprint for the agents we use every day, then showed it some fully developed agents.

Then I tried to build a simple “hello world” agent from scratch with this prompt:

Build a Hello World Spritz agent. It should ask for the user name, then use the Anthropic API to generate a greeting. Deploy the agent on AWS using the CLI and test it. Ask me for an API key when you are ready.

I got a working agent in about 10 minutes, but it took a few nudges here and there where the skill didn’t cover the details.

At the end of this process I prompted:

Based on the experience of this job, update the skill file so next time we will be able to complete the task without obstacles. Do not include in the skill any specific information about this agent or my development context.

The second time it worked end to end.

I have since tried to build a bunch of different agents, always adding more details and nuances to the skills.

This is not (just) about production

Of course this is not about replacing developers; it’s about empowering them. The agents I build will not be used in a production environment; they are mostly proofs of concept.

Using skills (or some similar prompting technique) we can capture *why* various software components are built, not just the *how*, giving developers much better context when they have to work with code they didn’t create, or when they come back to a project after a while.

They are also an amazing teaching tool for explaining to others how things work.

For now we have simply started a GitHub repository with the skills we have built so far. It’s easy to ask Claude, ChatGPT or any other tool to find and retrieve skills from the repo and use them. Now we are figuring out new ways to capture skills from the flow of work we do, from conversations we have, from documents we create.

Yet another step towards an interesting future.

The magic of AI search

I just built yet another MCP experiment.

First I created a Python script to process .md files: chunk them, create embeddings, store everything in a PostgreSQL database.
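The script itself isn’t included here, but a minimal sketch of the idea might look like this (assuming OpenAI embeddings, the pgvector extension, and a `chunks` table and `notes/` folder I’ve invented for illustration):

```python
# index_md.py - chunk markdown files, embed them, store them in PostgreSQL.
# Assumes pgvector is installed (CREATE EXTENSION vector) and OPENAI_API_KEY is set.
from pathlib import Path

import psycopg2
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 1500) -> list[str]:
    """Naive fixed-size chunking on paragraph boundaries."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > size:
            chunks.append(current)
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current)
    return chunks

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

conn = psycopg2.connect("dbname=notes")
with conn, conn.cursor() as cur:
    cur.execute("""CREATE TABLE IF NOT EXISTS chunks (
        id serial PRIMARY KEY, source text, content text,
        embedding vector(1536))""")
    for path in Path("notes").rglob("*.md"):
        for piece in chunk(path.read_text()):
            # pgvector accepts the '[x,y,...]' text format, cast to vector
            vec = "[" + ",".join(str(x) for x in embed(piece)) + "]"
            cur.execute(
                "INSERT INTO chunks (source, content, embedding) VALUES (%s, %s, %s::vector)",
                (str(path), piece, vec),
            )
conn.close()
```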

Then I built an MCP server which can search the database using both semantic search (embeddings) and, as a fallback mechanism, more traditional full-text search.
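The server isn’t included either, but with the official MCP Python SDK its core could look roughly like this (same assumptions as above; the tool name, the distance threshold and the fallback logic are illustrative choices, not the exact implementation):

```python
# search_server.py - MCP server: semantic search with a full-text fallback.
import psycopg2
from mcp.server.fastmcp import FastMCP
from openai import OpenAI

client = OpenAI()
mcp = FastMCP("notes-search")

@mcp.tool()
def search_notes(query: str, limit: int = 5) -> str:
    """Search the markdown knowledge base. Tries semantic (embedding) search
    first and falls back to PostgreSQL full-text search if nothing matches."""
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=query).data[0].embedding
    vec = "[" + ",".join(str(x) for x in emb) + "]"
    conn = psycopg2.connect("dbname=notes")
    with conn, conn.cursor() as cur:
        # cosine distance (<=>); the 0.5 cut-off is an arbitrary example
        cur.execute(
            """SELECT source, content FROM chunks
               WHERE embedding <=> %s::vector < 0.5
               ORDER BY embedding <=> %s::vector LIMIT %s""",
            (vec, vec, limit),
        )
        rows = cur.fetchall()
        if not rows:  # fallback: traditional full-text search
            cur.execute(
                """SELECT source, content FROM chunks
                   WHERE to_tsvector('english', content)
                         @@ plainto_tsquery('english', %s)
                   LIMIT %s""",
                (query, limit),
            )
            rows = cur.fetchall()
    conn.close()
    return "\n---\n".join(f"{src}:\n{text}" for src, text in rows)

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, for Claude Desktop
```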

I find it absolutely fascinating to watch Claude interact with this tool: it’s not just converting my request into a query; it’s the reasoning process it goes through to find what it needs that is brilliant.

Let me show you an example:

Continue reading “The magic of AI search”

And here’s the recipe

I’m not confident enough in the tools I built this week to share them around just yet. As long as they run on my Mac, I’m happy, but I can’t really take responsibility for how they’d work for anyone else.

Still, while I’m not serving up the dish, I’m definitely happy to share the recipe!

If you plug this prompt into Claude or ChatGPT, you’ll get pretty close to what I’ve got running. Then ask how to build it and how to configure Claude, and you should be good to go. Good luck, and let me know how it goes.

(I think that sharing prompts is an act of love.)

Continue reading “And here’s the recipe”

More MCP fun: Claude talks with ChatGPT

I started with a new idea this morning: create an MCP server that allows Claude to talk to the various OpenAI models.

Now I can ask Claude to put a question to any of the OpenAI models.

What I find most fascinating is how Claude figures out how to use these new tools. The key is in the description of the tool, the “manifest” that Claude gets when the server is initialised (and which is probably injected at the beginning of every chat).
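To give an idea of how little code this needs, here’s a rough sketch built on the official MCP Python SDK (the tool name and docstring are illustrative; the docstring is exactly the kind of “manifest” text that Claude reasons about):

```python
# openai_bridge.py - MCP server that lets Claude query OpenAI models.
from mcp.server.fastmcp import FastMCP
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
mcp = FastMCP("openai-bridge")

@mcp.tool()
def ask_openai(model: str, prompt: str) -> str:
    """Send a prompt to an OpenAI model (e.g. 'gpt-4o') and return its reply.
    Use this when the user asks for another model's opinion or output."""
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```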

PS: if you want to try this at home, here’s the recipe.

As an example, here’s what the description of today’s MCP server looks like:

Continue reading “More MCP fun: Claude talks with ChatGPT”

Mem’ries… light the corners of my mind

For the last few days, I’ve had access to the “Reference Chat History” feature in ChatGPT (I think it had been available for a while in the US, but it just landed on my account in the UK).

Wow… what a change!

I was putting together a page to describe the various tools we’ve been working on, and I just tried randomly asking ChatGPT to insert a description of “Gimlet” or “Old Fashioned”: it just did it. No context necessary, no links, no pages. It was just there, part of the memory I share with the app.

I do continuously switch between AI tools based on which one I think can perform better on any given task – or sometimes just to compare how they perform – and this feature makes ChatGPT more attractive: it has more reusable context than any of the other tools.

It’s quite likely that all the other tools will develop similar features, but this will tend to silo users: I’ll go where most of my memories are, and I won’t be switching tools if it means leaving all my memories behind.

My memories.

Hopefully a shared standard for memories (maybe MCP?) will soon emerge, and we won’t end up siloed again.

The “think of a number” fallacy

Some time ago a colleague, commenting on the idea of iterative prompting, suggested asking GPT to “think about something” and then make a decision on what to write or not to write.

The problem with this approach is that a session with an LLM doesn’t really have any memory outside the actual text produced in the chat; consequently, it cannot “keep something in mind” while completing other tasks.

But it can pretend it does.

To test this, you can ask an LLM to “think of a number, but don’t tell me”. At the time of writing, most models will respond by confirming that they have thought of a number. Of course they haven’t... but because they are trained to mimic human interactions, they pretend they have.

This is something to always keep in mind while prompting.

For example, it is not effective to prompt a system to “make a list and only show me the items matching a criterion”; instead, you can ask it to print the full output and then generate a final list (“print the list, then update it with the criteria”).