AI agents

AI Agents steps allow you to integrate AI agents directly into your Windmill flows. They provide an interface to connect with various AI providers and models, allowing you to process data, generate content, execute actions (Windmill scripts), and make decisions as part of your automated workflows.

Configuration

Provider selection

Choose from supported AI providers including OpenAI, Azure OpenAI, Anthropic, Mistral, DeepSeek, Google AI (Gemini), Groq, OpenRouter, Together AI, or Custom AI endpoints.

Resource configuration

Select or create an AI resource that contains your API credentials and endpoint configuration. Resources allow you to securely store and reuse AI provider credentials across multiple flows.

Model selection

Choose the specific model you want to use from your selected provider. Available models depend on your chosen provider and resource configuration.

Script tools

AI Agents can be equipped with script tools that extend their capabilities beyond text and image generation. Tools are Windmill scripts that the AI can call to perform specific actions or retrieve information. You can add tools from three sources:

Inline scripts - Write custom tools directly within the flow
Workspace scripts - Use existing scripts from your Windmill workspace
Hub scripts - Leverage pre-built tools from the Windmill Hub

Each script tool must have a unique name within the AI agent step and contain only letters, numbers, and underscores. It should be descriptive of the tool's function to help the AI understand when to use them.

When script tools are configured, the AI agent can decide when and how to use them based on the user's request. It selects the most appropriate tool by name, and issues a tool call with JSON arguments that conform to the tool's input schema. Windmill executes the underlying script and returns a JSON result, which is surfaced back to the model as a tool response message and is included in messages.

Websearch tool

AI Agents can be equipped with a built-in websearch tool to retrieve up-to-date information from the web during execution. Websearch is supported on OpenAI, Anthropic, and Google AI (Gemini) providers, and uses each provider's native web search capability (OpenAI's web search tool via the Responses API, Anthropic's web search tool, and Gemini's Google Search grounding). Add it from the tool picker of the AI Agent step — no additional configuration is required.

Nested AI agents (agent as tool)

An AI Agent can be used as a tool inside another AI Agent step, enabling hierarchical or specialized sub-agents (for example, a coordinator agent that delegates to a research agent). When adding a tool to an AI Agent step, pick "AI Agent" as the tool type and configure the nested agent with its own provider, model, system prompt, tools, memory, and other input transforms — the full input transforms view is available, so the nested agent can be configured just like a regular AI Agent step.

Only two levels of nesting are supported: flow → AI agent → nested AI agent tool. A nested AI agent cannot itself contain further AI agent tools; attempting deeper nesting is rejected at runtime.

MCP tools

AI Agents can connect to MCP (Model Context Protocol) servers as tools, enabling access to any tools exposed by MCP-compatible servers.

If the MCP server handles OAuth, you can simply connect using the OAuth URL. Otherwise, you can use MCP tools by following these steps:

Create an MCP resource in Windmill with:
- Name: Identifier for the MCP resource
- URL: The MCP server endpoint URL
- Auth token (optional): Authentication token for the MCP server
- Headers (optional): Additional HTTP headers for the connection
Add the MCP resource to your AI Agent step as a tool
The AI agent will automatically discover and use tools exposed by the MCP server

Note: Only HTTP streamable MCP servers are supported.

Input parameters

Required parameters

user_message (string)

The main input message or prompt that will be sent to the AI model. This can include static text, dynamic content from previous flow steps, or templated strings with variable substitution.

system_prompt (string)

The system prompt that defines the AI's role, behavior, and context. This helps guide the model's responses and ensures consistent behavior across interactions.

Optional parameters

output_type (text | image)

Specifies the type of output the AI should generate:

text - Generate text responses (default).
image - Generate image outputs (supported by OpenAI, Google AI (Gemini), and OpenRouter). Requires an S3 object storage to be configured at the workspace level.

output_schema (json-schema)

Define a JSON schema that the AI agent will follow for its response format. This ensures structured, predictable outputs that can be easily processed by subsequent flow steps.

memory (auto | manual)

Manages the conversation memory for the AI agent:

auto: Windmill automatically handles the memory, maintaining up to the specified number of last messages
manual: User provides the message history directly in the required format

Message format

When using manual memory mode, each message must follow the OpenAI message format:

{
  "role": "string",
  "content": "string | null",
  "tool_calls": [/* array of tool calls */], // optional
  "tool_call_id": "string | null" // optional
}

Using memory with webhooks

When using memory via webhook with auto mode, you must include a memory_id query parameter in your request. The memory_id must be a 32-character UUID that uniquely identifies the conversation context. This allows the AI agent to maintain message history across webhook calls.

Example webhook URL:

https://your-windmill-instance/api/w/your-workspace/jobs/run_wait_result/f/your-flow?memory_id=550e8400e29b41d4a716446655440000

streaming (optional)

Whether to stream the progress of the AI agent. The stream will contain json payloads separated by newlines. The payloads can be of the following types:

{
	"type": "token_delta", // sent everytime the AI generates a new token
	"content": "string",
}

{
	"type": "tool_call", // sent when the tool call is started
	"call_id": "string",
    "function_name": "string",
    "function_arguments": "string",
}

{
	"type": "tool_call_arguments", // sent all arguments have been received
	"call_id": "string",
    "function_name": "string",
    "function_arguments": "string",
}

{
	"type": "tool_execution", // sent when the tool job is started
	"call_id": "string",
    "function_name": "string",
}

{
	"type": "tool_result", // sent when the tool job is completed
	"call_id": "string",
    "function_name": "string",
    "result": "string",
    "success": true, // whether the tool job completed successfully
}

The final result of the step will contain an additional field wm_stream with the complete stream.

You can use the SSE stream webhooks or an HTTP route in Sync SSE mode to get the stream. If the AI agent step is not the last step of your flow, make sure to configure early return with the step's id to have the stream returned. The new_result_stream field of an SSE event can contain multiple payloads at a time, make sure to split on line breaks. Only complete payloads are streamed, you do not need to handle partial JSON.

user_attachments (optional)

Allows you to pass images or PDF documents as input to the AI model for analysis, processing, or context. The AI can analyze the attachment content and respond accordingly. Requires an S3 object storage to be configured at the workspace level.

Supported attachment types:

Images: analyzed as visual content across all providers that support vision input.
PDF documents: supported on Anthropic (sent as document content blocks), OpenAI (sent as input_file blocks), and Google AI (Gemini, natively supported via inline data). The MIME type is detected automatically from the file extension.

The field was previously named user_images; flows using the old name continue to work through backward-compatible deserialization.

max_iterations (number, optional)

Maximum number of tool-calling iterations the agent is allowed to perform in a single run. Each iteration corresponds to one round-trip where the model can choose to call tools; once the limit is reached, the agent stops and returns its current output. Use this to bound cost and runtime, or to prevent runaway agents that get stuck in tool-calling loops.

max_completion_tokens (number)

Controls the maximum number of tokens the AI can generate in its response. This helps manage costs and ensures responses stay within desired length limits.

temperature (number)

Controls randomness in text generation:

0.0 - Deterministic, focused responses
2.0 - Maximum creativity and randomness
Default values typically range from 0.1 to 1.0

Chat mode

Flows with AI agents can be run in chat mode, which transforms the standard flow interface into a conversational chat UI. This mode is particularly useful for creating interactive AI applications where users can have ongoing conversations with the AI agent.

Enabling chat mode

To enable chat mode for a flow:

Navigate to the flow inputs configuration
Toggle the "Chat mode" option on the top right
When the flow is run, it will display as a chat interface instead of the standard form interface

Features

Conversational interface: The flow runs in a chat-like UI where users can send messages and receive responses in a familiar messaging format
Multiple conversations: Users can maintain multiple different conversation threads within the same flow
Conversation history: Each conversation maintains its own history, allowing users to scroll back through previous messages
Persistent context: When using the memory parameter, the AI agent can remember and reference previous messages in the conversation

Recommended configuration

For optimal chat mode experience, we recommend placing an AI agent step at the end of your flow with both streaming and memory (set to auto mode) enabled. This configuration:

Enables real-time response streaming for a more interactive chat experience
Maintains conversation context across multiple messages

Use cases

Chat mode is ideal for:

Developing conversational workflows
Implementing AI assistants with memory
Building chatbots that can execute actions through tools

Output

The AI Agent step returns an object with two keys:

output

Contains the content of the final response from the AI agent:

Text output:
- When no output schema is specified: Returns the last message content, which can be a string or an array containing strings.
- When an output schema is specified: Returns the structured output conforming to the defined JSON schema.
Image output:
- Returns the S3 object of the image

This is typically what you'll use in subsequent flow steps when you need the AI's final answer or result.

messages

Only in text output mode, contains the complete conversation history, including:

User input messages
Assistant intermediate outputs
Tool calls made by the AI
Tool execution results

The messages array provides full visibility into the AI's reasoning process and tool usage, which can be useful for debugging, logging, or understanding how the AI reached its conclusion.

usage (optional)

Contains token usage information for the AI agent run, accumulated across all iterations (including tool calls). Fields include input_tokens, output_tokens, total_tokens, and — for providers that support prompt caching — cache_read_input_tokens and cache_write_input_tokens.

{
  "usage": {
    "input_tokens": 1234,
    "output_tokens": 567,
    "total_tokens": 1801
  }
}

Usage tracking is supported on Anthropic, AWS Bedrock, Google AI (Gemini), OpenAI (Responses API), Azure OpenAI / Chat Completions, and OpenRouter. If a provider does not support the stream_options.include_usage parameter, Windmill automatically retries the request without it.

wm_stream (optional)

Only included in streaming mode, contains the complete stream.

Anthropic prompt caching

For Anthropic models (including Vertex AI Anthropic), Windmill automatically applies cache_control: { "type": "ephemeral" } to the system prompt, the last custom tool definition, and the last content block of the last message on every request. This reduces token costs and latency on multi-turn conversations and iterative tool use, since Anthropic can reuse the cached prefix across calls. Cache hits are reflected in the cache_read_input_tokens and cache_write_input_tokens fields of the usage output. No configuration is required — caching is applied transparently whenever an Anthropic-based provider is used.

Debugging

AI Agent Debugging

Flow-level visualization

Tool calls are displayed directly on the flow graph as separate nodes connected to the AI Agent step. You can click on these tool call nodes to view detailed execution information, including inputs, outputs, and execution logs.

Detailed logging

In the logging panel of the AI Agent step, you can see the comprehensive logging:

All input parameters passed to the AI agent
Tool calls made by the AI, including which tools were selected and their inputs
Individual tool execution results with full job details
The final AI response and complete message history

This detailed view allows you to trace through the AI's decision-making process and verify that tools are being called correctly with the expected inputs and outputs.

Configuration​

Provider selection​

Resource configuration​

Model selection​

Script tools​

Websearch tool​

Nested AI agents (agent as tool)​

MCP tools​

Input parameters​

Required parameters​

user_message (string)​

system_prompt (string)​

Optional parameters​

output_type (text | image)​

output_schema (json-schema)​

memory (auto | manual)​

Message format​

Using memory with webhooks​

streaming (optional)​

user_attachments (optional)​

max_iterations (number, optional)​

max_completion_tokens (number)​

temperature (number)​

Chat mode​

Enabling chat mode​

Features​

Recommended configuration​

Use cases​

Output​

output​

messages​

usage (optional)​

wm_stream (optional)​

Anthropic prompt caching​

Debugging​

Flow-level visualization​

Detailed logging​