Blog

Building an autonomous ML researcher with Claude Code dynamic workflows

As an experiment, I re-implemented the autonomous ML research-and-engineering workflow encoded in Hugging Face's ml-intern as a Claude Code dynamic workflow that delegates execution to the Hugging Face skills (hf-skills) instead of ml-intern's custom tools. I did it in three steps: extract a technology-neutral specification of the workflow, compile that specification into a single generic workflow script, then run the script against a concrete task. The result is one workflow that accepts any ML research task as an argument, rather than having Claude Code write a new workflow script for each task.

Agentic editing of terminal screencasts

asciinema is naturally suited to agentic screencast editing. A .cast recording is plain text (JSON Lines), one event per line of the form [interval, code, data], where interval is seconds since the previous event. Editing reduces to arithmetic on those intervals (and optionally to substitution on the payloads, e.g. for redaction), so a small tool can expose trimming, speeding, and cutting as cheap operations that a language model can reason about and combine.
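To make the interval arithmetic concrete, here is a minimal sketch of three such operations on in-memory events. It is not the cast-edit tool from the post: the function names (parse_cast, cap_idle, speed_up, cut) are hypothetical, and it assumes the first non-comment line of the file is a JSON header and that every later line is an [interval, code, data] event as described above.

```python
import json


def parse_cast(text):
    """Split a .cast file into (header, events).

    Assumption: the first line is a JSON header object; blank lines and
    lines starting with '#' are skipped as comments.
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    header = json.loads(lines[0])
    events = [json.loads(ln) for ln in lines[1:]]
    return header, events


def cap_idle(events, max_interval):
    """Shorten long pauses: no interval may exceed max_interval seconds."""
    return [[min(iv, max_interval), code, data] for iv, code, data in events]


def speed_up(events, factor):
    """Play everything `factor` times faster by dividing each interval."""
    return [[iv / factor, code, data] for iv, code, data in events]


def cut(events, start, end):
    """Drop events whose original absolute time lies in [start, end)
    and close the gap so later events move earlier by (end - start)."""
    kept, clock = [], 0.0
    for iv, code, data in events:
        clock += iv  # absolute time of this event in the original recording
        if start <= clock < end:
            continue
        new_t = clock if clock < start else clock - (end - start)
        kept.append((new_t, code, data))

    # Convert the shifted absolute times back into intervals.
    out, prev = [], 0.0
    for t, code, data in kept:
        out.append([t - prev, code, data])
        prev = t
    return out


def dump_events(events):
    """Serialize events back to JSON Lines (header not included)."""
    return "\n".join(json.dumps(ev) for ev in events)
```

Because each function takes and returns a plain list of events, the operations compose in any order, for example `speed_up(cap_idle(cut(events, 60, 3600), 2.0), 4)`, which is the kind of combination a model can plan from a short instruction.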

As a demonstration, I recorded an ~85-minute Claude Code session running an ML fine-tuning task with the ml-research plugin and turned it into a 40-second GIF of the highlights without leaving Claude Code. The edit was driven by short natural-language instructions and one custom skill (cast-edit) that wraps the format with a small Python tool.

Code Actions as Tools: Evolving Tool Libraries for Agents

Programmatic tool calling is gaining traction in agent development. Instead of emitting one JSON tool call at a time, agents generate executable "code actions" that call tools in a sandboxed environment. This pattern is inspired by Apple's CodeAct and appears in many agentic systems. More recent implementations increasingly focus on programmatic calling of MCP tools.

These solutions typically generate Python or TypeScript APIs for MCP tools, let an agent write code actions that call these APIs, execute the code in a sandboxed environment, and feed results back to the agent. This improves performance compared to JSON-based approaches, but it often misses an important point: a generated code action can itself become a tool, available for reuse in later code actions.
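The idea that a code action can become a tool can be illustrated with a toy sketch. Everything here is hypothetical (the ToolLibrary class, its run and promote methods, and the get_temperature stand-in tool), and plain exec is used only to keep the example short; it provides no sandboxing, which a real system would need.

```python
class ToolLibrary:
    """Toy tool library: code actions run against a namespace of tools,
    and functions they define can be promoted into that namespace."""

    def __init__(self, base_tools):
        self.namespace = dict(base_tools)

    def run(self, code):
        """Execute a code action and return the scope it ran in.

        WARNING: exec is not a sandbox; this is for illustration only.
        """
        scope = dict(self.namespace)  # copy so actions cannot clobber tools
        exec(code, scope)
        return scope

    def promote(self, name, scope):
        """Make a function defined by an earlier code action a reusable tool."""
        self.namespace[name] = scope[name]


# Hypothetical stand-in for a generated MCP tool API.
def get_temperature(city):
    return {"Paris": 18.0, "Oslo": 7.0}[city]


lib = ToolLibrary({"get_temperature": get_temperature})

# First code action: defines a helper on top of the base tool and uses it.
first_action = '''
def warmest(cities):
    return max(cities, key=get_temperature)

result = warmest(["Paris", "Oslo"])
'''
first_scope = lib.run(first_action)

# Promote the helper so that later code actions can call it directly.
lib.promote("warmest", first_scope)
second_scope = lib.run('result = warmest(["Oslo"])')
```

The second action never redefines warmest; it only calls it, which is the reuse that a library built solely from generated tool APIs would miss.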

From Single-User to Multi-Party Conversational AI

Single-user AI agents excel at responding to direct queries in one-on-one interactions. A user sends the agent a self-contained query with sufficient context, and the agent processes it directly. Even in group chats, the typical pattern remains the same: users mention the agent with a direct query. This interaction model treats multi-user environments as collections of individual exchanges rather than true multi-party conversations.

Multi-party conversational AI systems, on the other hand, must derive agent queries from more complex exchanges between multiple participants. This requires detecting meaningful patterns while knowing when to stay silent. For example, when a conversation stalls on a decision, the system detects that and suggests resolutions based on available agent capabilities. Single-user agents respond to every input, but multi-party AI must engage only when specific patterns emerge.

Agent Authorization Without the Pain

Your agent needs to read from your Google Calendar and send emails through Gmail. This seemingly simple requirement quickly becomes complex when you realize you need OAuth flows, token refresh logic, and secure credential storage. Multiply that by every API your agent needs.

You shouldn't have to build this infrastructure yourself. Connect your agents to 250+ APIs and 3000+ tools with Model Context Protocol (MCP) and Composio. Composio handles authorization, remote MCP servers and tool execution, while your application focuses on agentic reasoning and tool usage.