The magic of AI search

I just built yet another MCP experiment.

First I created a Python script to process .md files: chunk them, create embeddings, store everything in a PostgreSQL database.
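In rough strokes, the ingestion side can be sketched like this (a minimal version assuming pgvector and OpenAI embeddings; the database name, table schema and chunk size are just illustrative, not necessarily what the actual script uses):

```python
# Minimal sketch: chunk markdown files, embed each chunk, store in Postgres.
# Assumes the pgvector extension and an OpenAI embedding model; all names
# (database, table, chunk size) are illustrative.
from pathlib import Path

import psycopg2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk_markdown(text: str, max_chars: int = 1500) -> list[str]:
    """Naive chunking: split on blank lines, then pack paragraphs into chunks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

conn = psycopg2.connect("dbname=notes")  # hypothetical database name
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE EXTENSION IF NOT EXISTS vector;
        CREATE TABLE IF NOT EXISTS chunks (
            id serial PRIMARY KEY,
            source text,
            content text,
            embedding vector(1536)
        );
    """)
    for md_file in Path("notes").glob("**/*.md"):
        for chunk in chunk_markdown(md_file.read_text()):
            emb = client.embeddings.create(
                model="text-embedding-3-small", input=chunk
            ).data[0].embedding
            cur.execute(
                "INSERT INTO chunks (source, content, embedding)"
                " VALUES (%s, %s, %s::vector)",
                (str(md_file), chunk, str(emb)),
            )
```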

Then I built an MCP server that can search the database using semantic search (embeddings), with more traditional full-text search as a fallback mechanism.
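The search side can be sketched along these lines (again assuming the pgvector setup above; the 0.5 distance cut-off is an arbitrary illustrative value):

```python
# Sketch of the search logic: semantic search first, Postgres full-text
# search as a fallback. Assumes the `chunks` table and embedding model
# from the ingestion sketch above.
import psycopg2
from openai import OpenAI

client = OpenAI()
conn = psycopg2.connect("dbname=notes")  # hypothetical database name

def search_notes(query: str, limit: int = 5) -> list[tuple[str, str]]:
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding

    with conn.cursor() as cur:
        # Semantic search: cosine distance via pgvector's <=> operator.
        cur.execute(
            """
            SELECT source, content, embedding <=> %s::vector AS distance
            FROM chunks ORDER BY distance LIMIT %s
            """,
            (str(emb), limit),
        )
        rows = cur.fetchall()
        if rows and rows[0][2] < 0.5:  # "good enough" semantic match
            return [(source, content) for source, content, _ in rows]

        # Fallback: traditional full-text search.
        cur.execute(
            """
            SELECT source, content FROM chunks
            WHERE to_tsvector('english', content)
                  @@ plainto_tsquery('english', %s)
            LIMIT %s
            """,
            (query, limit),
        )
        return cur.fetchall()
```

A function like this would then be exposed as an MCP tool, so Claude decides when to call it and with which query.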

I find it absolutely fascinating to watch Claude interact with this tool, because it’s not just about converting my request into a query; it’s the reasoning process it goes through to find what it needs that is brilliant.

Let me show you an example:

Continue reading “The magic of AI search”

More MCP fun: Claude talks with ChatGPT

I started with a new idea this morning: create an MCP server that allows Claude to talk to the various OpenAI models.

Now I can ask Claude to ask a question to any of the OpenAI models.

What I find most fascinating is how Claude figures out how to use these new tools. The key is in the description of the tool, the “manifest” that Claude gets when the server is initialised (and which is probably injected at the beginning of every chat).
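For illustration, here is a minimal sketch of such a tool using the official `mcp` Python SDK, where the function’s docstring becomes the description Claude reads (the tool name and wording are hypothetical, not the actual server’s):

```python
# Sketch of an MCP tool whose docstring becomes the description Claude sees.
# Assumes the official `mcp` Python SDK (FastMCP); names are illustrative.
from mcp.server.fastmcp import FastMCP
from openai import OpenAI

mcp = FastMCP("openai-bridge")
client = OpenAI()

@mcp.tool()
def ask_openai(model: str, prompt: str) -> str:
    """Send a prompt to an OpenAI model (e.g. gpt-4o) and return its answer.
    Use this when the user wants a second opinion from ChatGPT or wants to
    compare how different models respond."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    mcp.run()  # serves over stdio, for use from Claude Desktop's MCP config
```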

PS: if you want to try this at home, here’s the recipe.

As an example, here’s what the description of today’s MCP server looks like:

Continue reading “More MCP fun: Claude talks with ChatGPT”

Scraping Challenges and Open Standards

Following up on what I posted recently about Scrape wars, I wrote a longer post for my company site. I’m reposting it here for reference.

We’ve talked before about how everything you write should work as a prompt: your content should be explicitly structured and easy for AI agents to read, interpret, and reuse. Yet, despite the clear advantages, in practice we’re often stuck using workarounds and hacks to access valuable information.

Right now, many AI agents still rely on scraping websites. Scraping is messy, unreliable, and frankly a bit of a nightmare to maintain. It creates an adversarial relationship with companies who increasingly employ tools like robots.txt files, CAPTCHAs, or IP restrictions to block automated access. On top of that, major AI providers like OpenAI and Google are introducing built-in search capabilities within their ecosystems. While these are helpful, they ultimately risk creating a new layer of dependence. If content can only be efficiently accessed through these proprietary AI engines, we risk locking ourselves into another digital silo controlled by private platforms.

There is a simpler, proven, and immediately available solution: RSS. Providing your content via RSS feeds allows AI agents direct, structured access without complicated scraping. Our agents, for example, are already using structured XML reports from the Italian Parliament to effectively monitor parliamentary sessions. This is an ideal case of structured openness. Agents such as our Parliamentary Reporter Agent and the automated Assembly Report Agent thrive precisely because these datasets are publicly available, clearly structured, and easily machine-readable.
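To make the contrast with scraping concrete, here is a minimal sketch of an agent reading a feed directly, using only Python’s standard library (the feed URL is a placeholder, not one of the feeds mentioned above):

```python
# Minimal sketch: an agent reading an RSS feed directly instead of scraping.
# The feed URL is a placeholder; only the standard library is used.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.org/feed.xml"  # hypothetical feed

with urllib.request.urlopen(FEED_URL) as resp:
    tree = ET.parse(resp)

for item in tree.iterfind(".//item"):
    title = item.findtext("title", default="")
    link = item.findtext("link", default="")
    published = item.findtext("pubDate", default="")
    print(f"{published}  {title}\n  {link}")
```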

However, the reality isn’t always so positive. Other important legislative and governmental sites impose seemingly arbitrary restrictions. We regularly encounter ministries and other government websites that block automated tools or restrict access based on geographic location, even though their content is explicitly intended as public information. These decisions push us back into pointless workarounds or simply cut off access entirely, which is unacceptable when dealing with public information.

When considering concerns around giving AI models access to content, it’s essential to clearly distinguish two different use cases. One case is scraping or downloading massive amounts of data to train LLMs (this understandably raises concerns around copyright, control, and proper attribution). The other, entirely different and increasingly crucial case is allowing AI agents access to content purely to provide immediate, useful services to users. In these scenarios, the AI acts much like a traditional user, simply reading and delivering relevant, timely information rather than training on vast archives.

Building on RSS’s straightforwardness, we can take this concept further with more advanced open standards, such as MCP (Model Context Protocol). Imagine a self-discovery mechanism similar to RSS feeds, but designed to handle richer, more complex datasets. MCP could offer AI agents direct ways to discover, interpret, and process deeper levels of information effortlessly, without the current challenges of scraping or the risk of vendor lock-in.
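As a rough illustration of the idea, not a specification, publishing content through MCP could look something like this with the official Python SDK (the resource URI and file layout are hypothetical):

```python
# Sketch: exposing public content as MCP resources an agent can list and read,
# instead of leaving it to be scraped. Assumes the official `mcp` Python SDK;
# the URI scheme and file layout are illustrative.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("public-reports")

@mcp.resource("reports://{date}")
def report(date: str) -> str:
    """Full text of the public report published on the given date (YYYY-MM-DD)."""
    return Path(f"reports/{date}.md").read_text()

if __name__ == "__main__":
    mcp.run()  # a public deployment would use an HTTP-based transport rather than the default stdio
```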

Of course, valid concerns exist about data protection and theft at scale (curiously the same concerns appeared back in the early RSS days, and even when the printing press first emerged… yet we survived). But if our primary goal is genuinely to share ideas and foster transparency, deliberately restricting access to information contradicts our intentions. Public information should remain public, open, and machine-readable.

Let’s avoid creating unnecessary barriers or new digital silos. Instead, let’s embrace standards like RSS and MCP, making sure AI agents are our partners, not adversaries, in building a more transparent and connected digital landscape.