State of the Art of AI for Development
Artificial intelligence is revolutionizing the way we develop digital products. In this post, we share the insights from the Runroom LAB session "State of the Art of AI for Development," with Rafa Gómez Casas and Javier Ferrer González, founders of Codely, in which we explore the state of the art of AI applied to development.
Conclusions and Insights from Runroom LAB
1. AI is advancing rapidly. Order is needed!
A fundamental conclusion is that the speed of AI advances is overwhelming: new releases that “blow your mind” appear every day, making it difficult to stay up to date.
Javier and Rafa help us bring order and clarity to that feeling of being overwhelmed and not knowing how to assimilate so many new developments.
- Role Change: The boom of agents (around March) caused a “mini-crisis” about what the developer's role in the industry is, forcing an effort to stay “one step ahead”.
- Forced Adoption: A radical change is observed in the business environment: previously, the use of AI was prohibited; now adopting it is mandatory.
- Accelerated Pace: Time in AI does not pass linearly; what happened in months is equivalent to several years in “AI age”.
2. AI for Programming: Agents and Assistance
AI for programming is divided into three main areas: the ask mode (like ChatGPT), functionalities integrated into IDEs (such as contextual suggestions or generation of unit tests) and, the most advanced, AI Agents.

An agent is an autonomous system that determines by itself when to stop executing and concludes that the assigned task has been completed, demonstrating agency.

Integration Examples:
- They are being integrated directly into task management systems (like Linear with GitHub Copilot), where the agent takes the specification, tries to implement it, creates a Pull Request (PR), and provides feedback in the task manager itself.

Low-Value Tasks:
- Agents are ideal for low-value or “zero romance” tasks (like adjusting copies or texts on a website). By speeding up these types of processes, they free up the developer's time.

Interaction Context:
- An agent (like Cursor) is capable of taking context from conversation threads (like Slack), understanding the task (changing a text), identifying the correct file (a .json), applying the change, and creating a PR, all autonomously.

The choice of agent type depends on the need:
Local Agents:
- Ideal for medium/large functionalities where it is necessary to iterate and shape the solution quickly, keeping the AI on a short leash.

Remote Agents (Background Agents):
- Allow the developer to not worry about execution once the plan has been specified (uploading it to Linear or Jira), allowing parallelization of other tasks.

Parallel Models:
- An advanced technique being implemented is executing the same task in parallel with different models. This maximizes response time by allowing the selection of the result that is fastest or closest to the expected outcome. However, this also consumes more resources and tokens.

3. The Evolution of Prompting and Rules
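A minimal sketch of the parallel-models technique described above, using Python's standard thread pool; the stub functions are hypothetical stand-ins for real model APIs (the real calls would go through each provider's SDK):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for calls to different models: each takes a
# task description and returns a candidate implementation.
def model_a(task: str) -> str:
    return f"[model-a] patch for: {task}"

def model_b(task: str) -> str:
    return f"[model-b] patch for: {task}"

def first_result(task, models):
    """Run the same task against several models in parallel and keep
    whichever answer arrives first (trading extra tokens for latency)."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(model, task) for model in models]
        for future in as_completed(futures):
            return future.result()  # first finished model wins

print(first_result("adjust the signup button copy", [model_a, model_b]))
```

A variant of the same loop could collect all results and let the developer pick the one closest to the expected outcome, at the cost of waiting for every model.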
Improvement of Model Intelligence:
- Although guides like Anthropic's recommend very specific and verbose prompting, models are becoming smarter over time (e.g., GPT-5 Codex).

Model Trends:
- Models are becoming smaller and, at the same time, smarter.

Rules:
- A solution to the problem of constantly repeating coding conventions (like “don't use verbose comments” or “don't use mocks”) is rules. Cursor pioneered this, allowing rules to be defined in a Markdown file that the model applies dynamically if deemed relevant, based on the rule's description.

Standardization of Agents:
- The use of the agents.md standard is advocated to specify these rules and conventions.

Context:
- It remains essential to optimize the use of context; although models support a large volume of tokens, the attention window is limited, and overloading it can lead to agents acting undesirably.

4. AI Applied to Product Features (MCPs and RAG)
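The rules-plus-context ideas above can be sketched as a toy selector that attaches only the rules whose description matches the task, keeping the attention window free of irrelevant conventions (all rule names and texts here are made up):

```python
# Made-up rule files: description -> rule text, mimicking Markdown
# rules whose description tells the agent when each one applies.
RULES = {
    "testing conventions": "Don't mock value objects; use builders.",
    "comment style": "Don't use verbose comments.",
    "frontend copy": "Copy changes go through the i18n JSON files.",
}

def relevant_rules(task: str) -> list[str]:
    """Attach only the rules whose description overlaps the task,
    instead of stuffing every convention into the context."""
    task_words = set(task.lower().split())
    return [text for description, text in RULES.items()
            if task_words & set(description.lower().split())]

print(relevant_rules("refactor the testing conventions"))
```

Real tools use the model itself (or embeddings) rather than word overlap to judge relevance, but the goal is the same: a small, targeted context.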
The second pillar of the talk focuses on how to use AI to develop features within the application itself.
Model Context Protocol (MCP)
MCP is a protocol proposed by Anthropic that allows adding context (such as access to a calendar) to LLM models.

Primitives:
- The MCP standard defines not only Tools (tools to perform actions, the most used) but also Resources (data lists for the model to read) and Prompts (shared libraries of prompts).

Advanced Tools (Playwright):
- An MCP server from Playwright launches a browser and allows the LLM to inspect the DOM tree of a web page. This gives the model “eyes,” which is useful for generating acceptance tests for the frontend or automating web interactions.

Intelligence in Invocation:
- The model demonstrates intelligence by inferring and enriching the query before invoking the MCP server's tool. This turns the LLM into a new entry point or distribution channel for business logic.

RAG (Retrieval-Augmented Generation) and Vector Search
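Stripped to its essence, the Tools primitive described above pairs a schema the model can see with a function the server executes when the model decides to call it. A toy sketch (names and data are invented, not actual MCP protocol messages):

```python
import json

# Illustrative only: a tool advertised to the model (name, description,
# JSON schema for its input) plus the function the server runs for it.
CALENDAR_TOOL = {
    "name": "get_calendar_events",
    "description": "List the calendar events of a given day",
    "inputSchema": {
        "type": "object",
        "properties": {"date": {"type": "string"}},
        "required": ["date"],
    },
}

def get_calendar_events(date: str) -> list[dict]:
    # Stub data; a real MCP server would query the calendar backend.
    return [{"date": date, "title": "Runroom LAB"}]

def handle_tool_call(name: str, arguments: dict):
    """Dispatch a tool call issued by the model to the matching function."""
    if name == CALENDAR_TOOL["name"]:
        return get_calendar_events(**arguments)
    raise ValueError(f"unknown tool: {name}")

# The model enriches the user's request into structured arguments:
print(json.dumps(handle_tool_call("get_calendar_events", {"date": "2025-06-12"})))
```

The official MCP SDKs wrap this pattern (plus transport and discovery) so you only declare the schema and the handler.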
RAG is the process of providing context to the system. The official guide recommends limiting the context to the most relevant information, because saturating the LLM with irrelevant information (like a large list of courses) makes it hallucinate and invent things.

Context Separation:
- To optimize costs (reductions of 5% to 10%) and caching, static content should go in the instructions part (which can be cached) and variable content in the user-input part.

Semantic (Vector) Search:
- To find the most relevant information (for example, related courses), vector search is used: a query to the database based on the cosine distance between vectors (embeddings).

The Concept of Vectors:
- Vectors store the semantic meaning of a text as multidimensional coordinates, where the proximity between vectors (calculated by distance) indicates the similarity of the concepts.

Tools for Vectors:
- Extensions like pgvector for PostgreSQL are becoming popular, allowing experimentation with vector searches without an immediate need for specialized infrastructure.

5. Challenges and Final Perspective
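The cosine-distance search just described can be sketched in pure Python with toy 3-dimensional embeddings (real embeddings come from a model and have hundreds or thousands of dimensions; in production, pgvector's cosine-distance operator `<=>` does this ranking inside PostgreSQL):

```python
import math

def cosine_similarity(a, b):
    """1.0 = same direction (same meaning); near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy embeddings: nearby vectors stand for related concepts.
courses = {
    "DDD course": [0.9, 0.1, 0.0],
    "Hexagonal architecture course": [0.8, 0.2, 0.1],
    "CSS animations course": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "software design"

# Most similar course first; a vector database performs the same
# ranking with an index instead of a full scan.
ranked = sorted(courses, key=lambda name: cosine_similarity(query, courses[name]), reverse=True)
print(ranked)
```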
The non-deterministic nature of LLM models (the same input does not guarantee the same output) forces a change in the testing strategies used until now.

Green Ecosystem:
The ecosystem of tools for evaluation and observability of models in production (such as evaluation frameworks, local model servers like Ollama, or Behavior-Driven Development solutions) is in its early stages and “quite green.”

Loss of Control:
It is normal for developers to feel uncertainty, insecurity, or a “sense of loss of control” in the face of these advances.

In conclusion, it is useless to ignore the situation; the best course of action is to seek order and clarity and keep your knowledge up to date. And what better way to do it than by following these two titans on their YouTube channel!