Agents Reference

This page provides detailed documentation for every agent in YAAAF. Each agent is described with its role in the artifact flow, input/output types, configuration, and usage examples.

EXTRACTORS

Extractors are source agents that pull data from external systems into the artifact flow.

SqlAgent

Role: EXTRACTOR

Purpose: Executes SQL queries against configured databases and returns tabular data.

Accepts: None (source agent)

Produces: table

Description: SqlAgent converts natural language questions into SQL queries, executes them against configured SQLite databases, and returns the results as table artifacts. It understands database schemas and can handle complex queries involving joins, aggregations, and filters.

Configuration:

from yaaaf.components.sources.sqlite_source import SqliteSource
from yaaaf.components.agents.sql_agent import SqlAgent

# `client` is assumed to be an LLM client instance created elsewhere in your setup
source = SqliteSource(name="sales_db", db_path="data/sales.db")
sql_agent = SqlAgent(client=client, sources=[source])

Example workflow usage:

assets:
  sales_data:
    agent: SqlAgent
    description: "Get monthly sales totals"
    type: table

Capabilities:

Schema introspection (automatically discovers tables and columns)
Natural language to SQL conversion
Query validation and error handling
Support for multiple database sources
Parameterized queries for safety
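
To illustrate the last capability, the sketch below shows a parameterized query using Python's standard sqlite3 module. It is not SqlAgent's internal code: the sales table, its columns, and the monthly_total helper are assumptions made up for this example.

```python
import sqlite3


def monthly_total(conn: sqlite3.Connection, month: str) -> float:
    """Sum sales for one month, passing the value as a bound parameter."""
    # The "?" placeholder keeps user-supplied text out of the SQL string itself.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE month = ?", (month,)
    ).fetchone()
    return float(row[0])


# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01", 100.0), ("2024-01", 50.0), ("2024-02", 70.0)],
)
print(monthly_total(conn, "2024-01"))
```

Because the month is bound rather than concatenated, a value such as "x' OR '1'='1" is compared as a literal string and matches no rows.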

DocumentRetrieverAgent

Role: EXTRACTOR

Purpose: Retrieves relevant text chunks from document collections using BM25 search.

Accepts: None (source agent)

Produces: text

Description: DocumentRetrieverAgent searches through configured document collections to find passages relevant to the query. It uses BM25 (Best Match 25) ranking to identify the most relevant chunks from text files, PDFs, and other document formats.
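
For readers unfamiliar with BM25, the following is a minimal sketch of the standard Okapi BM25 scoring formula. It assumes whitespace tokenization and the common defaults k1=1.5 and b=0.75. The tokenizer and parameters YAAAF actually uses may differ, and bm25_scores is a hypothetical helper name.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 relevance score per document for the given query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalized through b.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


chunks = [
    "install the package with pip",
    "the weather is sunny today",
    "pip install is the recommended installation method",
]
print(bm25_scores("pip install", chunks))
```

Chunks that share no terms with the query score zero, and of two chunks with the same matches the shorter one ranks higher.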

Configuration:

from yaaaf.components.sources.rag_source import RAGSource
from yaaaf.components.agents.document_retriever_agent import DocumentRetrieverAgent

source = RAGSource(description="Technical manuals", source_path="docs/")
# Add text files (the context manager closes the file after reading)
with open("manual.txt") as f:
    source.add_text(f.read())
# Add PDFs (with optional page chunking)
with open("guide.pdf", "rb") as f:
    source.add_pdf(f.read(), "guide.pdf", pages_per_chunk=1)

agent = DocumentRetrieverAgent(client=client, sources=[source])

Example workflow usage:

assets:
  relevant_docs:
    agent: DocumentRetrieverAgent
    description: "Find documentation about installation"
    type: text

Supported formats:

Plain text (.txt)
Markdown (.md)
HTML (.html, .htm)
PDF (.pdf) with configurable page chunking
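
The pages_per_chunk option in the configuration above presumably groups consecutive PDF pages into a single retrievable chunk. A minimal sketch of that grouping, assuming the page texts have already been extracted as strings. The chunk_pages helper is hypothetical and not part of RAGSource.

```python
def chunk_pages(pages: list[str], pages_per_chunk: int = 1) -> list[str]:
    """Join consecutive pages into chunks of at most pages_per_chunk pages."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        "\n".join(pages[start:start + pages_per_chunk])
        for start in range(0, len(pages), pages_per_chunk)
    ]


# Three pages grouped two at a time: the last chunk holds the leftover page.
print(chunk_pages(["page one", "page two", "page three"], pages_per_chunk=2))
```

Smaller chunks give more precise matches, while larger chunks keep more surrounding context in each retrieved passage.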

BraveSearchAgent

Role: EXTRACTOR

Purpose: Searches the web using the Brave Search API.

Accepts: None (source agent)

Produces: table

Description: BraveSearchAgent queries the Brave Search API to find relevant web pages. It returns structured results including titles, URLs, and snippets. Brave Search uses its own independent search index, providing results that may differ from Google or Bing.

Configuration:

Requires the BRAVE_SEARCH_API_KEY environment variable or the equivalent entry in the configuration file:

{
  "api_keys": {
    "brave_search_api_key": "YOUR_API_KEY"
  }
}

Example workflow usage:

assets:
  search_results:
    agent: BraveSearchAgent
    description: "Search for recent AI developments"
    type: table

DuckDuckGoSearchAgent

Role: EXTRACTOR

Purpose: Searches the web using DuckDuckGo.

Accepts: None (source agent)

Produces: table

Description: DuckDuckGoSearchAgent performs web searches using DuckDuckGo’s search API. It does not require API keys and provides privacy-focused search results. Results are returned as a table with titles, URLs, and snippets.

Example workflow usage:

assets:
  web_results:
    agent: DuckDuckGoSearchAgent
    description: "Find information about climate change"
    type: table

Note: No API key required. Rate limits may apply.

UrlAgent

Role: EXTRACTOR

Purpose: Fetches and extracts content from specific URLs.

Accepts: None (source agent)

Produces: text

Description: UrlAgent retrieves content from web pages given their URLs. It extracts the main text content, handling HTML parsing and content extraction. Useful when you need content from specific known URLs rather than search results.

Example workflow usage:

assets:
  page_content:
    agent: UrlAgent
    description: "Fetch content from the documentation page"
    type: text
    params:
      url: "https://example.com/docs"

UserInputAgent

Role: EXTRACTOR

Purpose: Collects information from users interactively.

Accepts: None (source agent)

Produces: text

Description: UserInputAgent pauses workflow execution to request input from the user. This enables interactive workflows where user decisions or additional information are needed mid-execution. The workflow resumes once the user provides input.

Interaction Mode: INTERACTIVE (pauses for user input)

Example workflow usage:

assets:
  user_preference:
    agent: UserInputAgent
    description: "Ask user which format they prefer"
    type: text

TRANSFORMERS

Transformers convert artifacts from one form to another.

MleAgent

Role: TRANSFORMER

Purpose: Trains machine learning models on tabular data.

Accepts: table

Produces: model

Description: MleAgent analyzes tabular data and trains scikit-learn models. It can perform classification, regression, and clustering tasks. The agent automatically selects appropriate algorithms based on the data characteristics and creates model artifacts that can be used for predictions.

Output Permanence: PERSISTENT (models are saved)

Example workflow usage:

assets:
  training_data:
    agent: SqlAgent
    description: "Get customer data with churn labels"
    type: table

  churn_model:
    agent: MleAgent
    description: "Train model to predict customer churn"
    type: model
    inputs: [training_data]

Capabilities:

Automatic feature selection
Model selection based on task type
Cross-validation for model evaluation
Feature importance analysis

ReviewerAgent

Role: TRANSFORMER

Purpose: Analyzes and validates artifacts.

Accepts: table, text

Produces: table

Description: ReviewerAgent examines artifacts and provides analysis, validation, or summary. It can identify patterns, check data quality, extract key information, and generate structured reports about the input artifacts.

Example workflow usage:

assets:
  raw_data:
    agent: SqlAgent
    description: "Get raw sales data"
    type: table

  data_analysis:
    agent: ReviewerAgent
    description: "Analyze data quality and identify issues"
    type: table
    inputs: [raw_data]

ToolAgent

Role: TRANSFORMER

Purpose: Executes external tools via Model Context Protocol (MCP).

Accepts: table, text

Produces: table

Description: ToolAgent interfaces with external tools and services through MCP. It can call functions provided by MCP servers, enabling integration with calculators, APIs, file systems, and other external capabilities.

Configuration:

{
  "tools": [
    {
      "name": "calculator",
      "type": "sse",
      "url": "http://localhost:8080/sse"
    },
    {
      "name": "file_tools",
      "type": "stdio",
      "command": "python",
      "args": ["-m", "mcp_server"]
    }
  ]
}

Example workflow usage:

assets:
  calculation_result:
    agent: ToolAgent
    description: "Calculate compound interest"
    type: table

NumericalSequencesAgent

Role: TRANSFORMER

Purpose: Structures unformatted numerical data into tables.

Accepts: text

Produces: table

Description: NumericalSequencesAgent parses unstructured text containing numerical data and converts it into structured tabular format. It identifies patterns, extracts numbers, and organizes them into meaningful columns.
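
The sketch below illustrates the kind of text-to-table conversion described here. It is a simple rule-based stand-in that shows the input and output shapes, not the agent's actual method. The text_to_rows helper and the sample report are hypothetical.

```python
import re

# Matches integers and decimals, with an optional leading minus sign.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def text_to_rows(text: str) -> list[list[float]]:
    """Turn each line that contains numbers into one row of floats."""
    rows = []
    for line in text.splitlines():
        values = [float(match) for match in NUMBER.findall(line)]
        if values:  # lines without any numbers are skipped
            rows.append(values)
    return rows


report = "Revenue 10.5 and cost 4\nNo figures on this line\nRevenue 12 and cost 5.25"
print(text_to_rows(report))
```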

Example workflow usage:

assets:
  raw_text:
    agent: DocumentRetrieverAgent
    description: "Get financial report text"
    type: text

  structured_data:
    agent: NumericalSequencesAgent
    description: "Extract numerical data into table"
    type: table
    inputs: [raw_text]

ValidationAgent

Role: TRANSFORMER

Purpose: Validates artifacts against user goals and triggers replanning when needed.

Accepts: table, text, image, model

Produces: (validation result - used internally)

Description: ValidationAgent inspects each artifact produced during workflow execution and validates it against both the user’s original goal and the step description. It returns a confidence score from 0.0 to 1.0. Based on the confidence level, the system will:

0.5 - 1.0: Continue execution (artifact is acceptable)
0.3 - 0.5: Trigger automatic replanning (artifact has issues)
0.0 - 0.3: Ask user for guidance (too uncertain to auto-fix)
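
The three bands above can be sketched as a small decision function. The listed ranges share their boundary values, so this sketch assumes that 0.5 and 0.3 belong to the upper band in each case. That choice, along with the ValidationAction and action_for names, is an assumption for illustration only.

```python
from enum import Enum


class ValidationAction(Enum):
    CONTINUE = "continue"   # artifact is acceptable
    REPLAN = "replan"       # artifact has issues, replan automatically
    ASK_USER = "ask_user"   # too uncertain to auto-fix


def action_for(confidence: float) -> ValidationAction:
    """Map a validation confidence score to the next workflow action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.5:  # assumed: 0.5 itself counts as acceptable
        return ValidationAction.CONTINUE
    if confidence >= 0.3:  # assumed: 0.3 itself triggers replanning
        return ValidationAction.REPLAN
    return ValidationAction.ASK_USER


for score in (0.9, 0.4, 0.1):
    print(score, action_for(score).value)
```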

Note: ValidationAgent is used internally by the workflow engine. It is not called directly in workflows.

Validation criteria:

Does the artifact help achieve the user’s original goal?
Does it match what the step description promised to produce?
Is the data reasonable, complete, and useful?
Are there any obvious errors or problems?

Artifact inspection limits:

Tables: Schema + first 20 rows are inspected
Text: First ~1000 tokens are inspected
Images: Metadata and generation code are inspected
Models: Model type and parameters are inspected

Replanning behavior:

When validation fails with confidence 0.3-0.5, the system:

Keeps all successfully validated artifacts
Generates a new plan that works around the failed step
Uses the suggested fix from the validation agent
Retries up to 3 times before giving up
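
A minimal sketch of the retry behavior for a single step, assuming "retries up to 3 times" means one initial attempt plus at most three further attempts. The function and exception names are hypothetical, and the bookkeeping of previously validated artifacts is omitted for brevity.

```python
from typing import Callable, Optional, Tuple

MAX_RETRIES = 3          # "Retries up to 3 times" in the list above
ACCEPT_THRESHOLD = 0.5   # at or above this value the artifact is accepted
REPLAN_THRESHOLD = 0.3   # below this value the user is asked instead


class NeedsUserGuidance(Exception):
    """Raised when confidence is too low for automatic replanning."""


def run_with_replanning(
    run_step: Callable[[Optional[str]], object],
    validate: Callable[[object], Tuple[float, str]],
) -> object:
    """Run a step, feeding the validator's suggested fix into each retry."""
    suggested_fix: Optional[str] = None
    for _attempt in range(1 + MAX_RETRIES):  # first try plus the retries
        artifact = run_step(suggested_fix)
        confidence, suggested_fix = validate(artifact)
        if confidence >= ACCEPT_THRESHOLD:
            return artifact
        if confidence < REPLAN_THRESHOLD:
            raise NeedsUserGuidance(suggested_fix)
    raise RuntimeError("step still failing after %d retries" % MAX_RETRIES)
```

Here run_step stands in for regenerating and executing the plan for the failed step, and validate stands in for ValidationAgent returning a confidence score together with its suggested fix.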

SYNTHESIZERS

Synthesizers combine multiple artifacts into unified outputs.

AnswererAgent

Role: SYNTHESIZER

Purpose: Combines multiple artifacts into comprehensive answers.

Accepts: table, text

Produces: table

Description: AnswererAgent is the primary synthesis agent. It takes artifacts from multiple sources (documents, databases, web searches) and generates comprehensive, well-cited answers. Output is a structured table with paragraphs and their sources.

Example workflow usage:

assets:
  doc_results:
    agent: DocumentRetrieverAgent
    description: "Get relevant documentation"
    type: text

  db_results:
    agent: SqlAgent
    description: "Get supporting data"
    type: table

  comprehensive_answer:
    agent: AnswererAgent
    description: "Synthesize findings into complete answer"
    type: table
    inputs: [doc_results, db_results]

Output format:

| paragraph | source |
|-----------|--------|
| Finding from analysis... | Database: sales_2023 |
| Additional context... | Document: manual.pdf |

UrlReviewerAgent

Role: SYNTHESIZER

Purpose: Aggregates and summarizes content from multiple URLs.

Accepts: table (with URLs)

Produces: table

Description: UrlReviewerAgent takes search results containing URLs, fetches the content from each URL, and synthesizes the information into a unified summary. It is typically used after a search agent to process the found pages.

Example workflow usage:

assets:
  search_results:
    agent: BraveSearchAgent
    description: "Search for product reviews"
    type: table

  review_summary:
    agent: UrlReviewerAgent
    description: "Summarize content from search results"
    type: table
    inputs: [search_results]

PlannerAgent

Role: SYNTHESIZER

Purpose: Creates execution workflows from natural language goals.

Accepts: text (goals)

Produces: text (YAML workflow)

Description: PlannerAgent analyzes user goals and generates YAML workflow definitions. It uses RAG-based example retrieval from 50,000+ planning scenarios to produce high-quality workflows. The planner understands agent capabilities and artifact type compatibility.

Note: PlannerAgent is used internally by the orchestrator. It is not typically called directly in workflows.

Capabilities:

Goal extraction and analysis
Agent capability matching
Artifact type compatibility checking
DAG construction with proper dependencies
RAG-based example retrieval for better plans
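
The dependency handling behind DAG construction can be sketched with Python's standard graphlib module: each asset's inputs list defines its predecessors, and a topological sort yields a valid execution order. This illustrates the idea and is not the planner's or orchestrator's actual code. The execution_order helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Assets as they would look after parsing a workflow YAML into a dict.
ASSETS = {
    "doc_results": {"agent": "DocumentRetrieverAgent", "type": "text"},
    "db_results": {"agent": "SqlAgent", "type": "table"},
    "comprehensive_answer": {
        "agent": "AnswererAgent",
        "type": "table",
        "inputs": ["doc_results", "db_results"],
    },
}


def execution_order(assets: dict[str, dict]) -> list[str]:
    """Return asset names ordered so that every input runs before its consumer."""
    graph = {name: set(spec.get("inputs", [])) for name, spec in assets.items()}
    for name, inputs in graph.items():
        unknown = inputs - assets.keys()
        if unknown:
            raise ValueError(f"{name} references undefined inputs: {sorted(unknown)}")
    # static_order raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(ASSETS))
```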

GENERATORS

Generators create final outputs or side effects.

VisualizationAgent

Role: GENERATOR

Purpose: Creates charts and visualizations from data.

Accepts: table

Produces: image

Description: VisualizationAgent generates matplotlib-based visualizations from tabular data. It can create bar charts, line graphs, scatter plots, pie charts, and other visualization types. Output is saved as PNG images.

Output Permanence: PERSISTENT (images are saved)

Example workflow usage:

assets:
  sales_data:
    agent: SqlAgent
    description: "Get quarterly sales"
    type: table

  sales_chart:
    agent: VisualizationAgent
    description: "Create bar chart of sales by quarter"
    type: image
    inputs: [sales_data]

Supported chart types:

Bar charts (vertical and horizontal)
Line graphs
Scatter plots
Pie charts
Histograms
Box plots

BashAgent

Role: GENERATOR

Purpose: Performs filesystem operations.

Accepts: text

Produces: text

Description: BashAgent executes filesystem operations like reading files, listing directories, and writing output. It operates in a sandboxed environment and may request user confirmation for sensitive operations.

Interaction Mode: INTERACTIVE (may request confirmation)

Output Permanence: PERSISTENT (files are created/modified)

Example workflow usage:

assets:
  report_data:
    agent: AnswererAgent
    description: "Generate report content"
    type: table

  saved_report:
    agent: BashAgent
    description: "Save report to file"
    type: text
    inputs: [report_data]

Security: BashAgent operates with restricted permissions and may prompt for user confirmation before executing operations.

Agent Configuration

Per-Agent Model Settings

Each agent can use different model settings:

{
  "agents": [
    "sql",
    {
      "name": "visualization",
      "model": "qwen2.5-coder:32b",
      "temperature": 0.1
    },
    {
      "name": "answerer",
      "model": "qwen2.5:32b",
      "temperature": 0.7,
      "max_tokens": 4096
    }
  ]
}
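
Note that the agents list mixes bare names with objects that carry overrides. A minimal sketch of how such a list could be normalized, assuming a bare string means "use the default model settings". That reading and the normalize_agents helper are assumptions, not YAAAF's actual config loader.

```python
import json

CONFIG = """
{
  "agents": [
    "sql",
    {"name": "visualization", "model": "qwen2.5-coder:32b", "temperature": 0.1},
    {"name": "answerer", "model": "qwen2.5:32b", "temperature": 0.7, "max_tokens": 4096}
  ]
}
"""


def normalize_agents(config: dict) -> dict[str, dict]:
    """Map each agent name to its override settings (empty dict if none)."""
    settings: dict[str, dict] = {}
    for entry in config.get("agents", []):
        if isinstance(entry, str):
            settings[entry] = {}  # assumed: bare name means default settings
        else:
            overrides = dict(entry)  # copy so the input config is not mutated
            settings[overrides.pop("name")] = overrides
    return settings


print(normalize_agents(json.loads(CONFIG)))
```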

Agent Budgets

Agents have execution budgets limiting their LLM calls:

Default budget: 2 calls per query
PlannerAgent: 1 call (planning should be decisive)
Complex agents: May have higher budgets for multi-step reasoning
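
The budget idea can be sketched as a simple counter that is decremented on every LLM call. The CallBudget class below is hypothetical and only illustrates the concept. It does not reproduce how YAAAF tracks budgets internally.

```python
class CallBudget:
    """Counts LLM calls and refuses further calls once the limit is reached."""

    def __init__(self, max_calls: int = 2) -> None:  # default budget from the list above
        self.max_calls = max_calls
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.max_calls - self.used

    def consume(self) -> None:
        """Record one LLM call, or raise if the budget is already spent."""
        if self.used >= self.max_calls:
            raise RuntimeError("LLM call budget exhausted")
        self.used += 1


budget = CallBudget()
budget.consume()
print(budget.remaining)
```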

Creating Custom Agents

To create a new agent:

Choose a base class:
- ToolBasedAgent: For agents using the executor pattern
- CustomAgent: For agents with custom logic

Define taxonomy:

# In agent_taxonomies.py
"MyAgent": AgentTaxonomy(
    data_flow=DataFlow.TRANSFORMER,
    interaction_mode=InteractionMode.AUTONOMOUS,
    output_permanence=OutputPermanence.EPHEMERAL,
    description="Transforms X into Y"
)

Implement the agent:

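# MyExecutor and my_prompt_template are placeholders that you define yourself
class MyAgent(ToolBasedAgent):
    def __init__(self, client):
        # Pass the LLM client and the executor that performs the work
        super().__init__(client, MyExecutor())
        self._system_prompt = my_prompt_template

    @staticmethod
    def get_info() -> str:
        # Short description of what the agent does
        return "Transforms X into Y"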

Register in orchestrator builder:

# In orchestrator_builder.py
self._agents_map["my_agent"] = MyAgent

Add examples to planner dataset:

The planner uses RAG-based example retrieval to generate workflows. For the planner to know how to use your custom agent, you must add examples to the planner dataset.

Edit yaaaf/data/planner_dataset.csv and add rows with:
- scenario: A natural language description of when to use your agent
- workflow_yaml: A YAML workflow showing your agent in action
- agents_used: List including your agent name
- num_agents: Number of agents in the workflow
- num_steps: Number of steps
- complexity: Workflow complexity (simple_chain, parallel, etc.)
- is_valid: Set to True
- error_message: Leave empty

Example entry:

```
scenario,workflow_yaml,agents_used,num_agents,num_steps,complexity,is_valid,error_message
"Transform the raw sensor data into a normalized format for analysis","assets:
  raw_data:
    agent: SqlAgent
    description: ""Get raw sensor readings""
    type: table
  normalized_data:
    agent: MyAgent
    description: ""Normalize sensor data""
    type: table
    inputs: [raw_data]","['SqlAgent', 'MyAgent']",2,2,simple_chain,True,
```

Add 5-10 diverse examples showing your agent in different workflow contexts. This ensures the planner can correctly incorporate your agent into generated workflows.

Important: Without examples in the dataset, the planner will not know when or how to use your custom agent in workflows.
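
Writing dataset rows by hand is error-prone because the workflow YAML spans several lines and contains quotes that must be doubled. A minimal sketch that lets Python's standard csv module handle the quoting. The dataset_row and write_rows helpers are hypothetical, and the sketch writes to an in-memory buffer rather than the real file.

```python
import csv
import io

FIELDS = [
    "scenario", "workflow_yaml", "agents_used", "num_agents",
    "num_steps", "complexity", "is_valid", "error_message",
]

WORKFLOW = """assets:
  raw_data:
    agent: SqlAgent
    description: "Get raw sensor readings"
    type: table
  normalized_data:
    agent: MyAgent
    description: "Normalize sensor data"
    type: table
    inputs: [raw_data]"""


def dataset_row(scenario, workflow_yaml, agents, num_steps, complexity):
    """Build one dataset row using the column layout described above."""
    return {
        "scenario": scenario,
        "workflow_yaml": workflow_yaml,
        "agents_used": str(agents),   # e.g. "['SqlAgent', 'MyAgent']"
        "num_agents": len(agents),
        "num_steps": num_steps,
        "complexity": complexity,
        "is_valid": True,
        "error_message": "",
    }


def write_rows(handle, rows, write_header=False):
    """Write rows as CSV; the csv module doubles quotes and keeps newlines."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
row = dataset_row(
    "Transform the raw sensor data into a normalized format for analysis",
    WORKFLOW, ["SqlAgent", "MyAgent"], 2, "simple_chain",
)
write_rows(buffer, [row], write_header=True)
print(buffer.getvalue())
```

To append to the real dataset instead, open yaaaf/data/planner_dataset.csv with mode "a" and newline="" and call write_rows without the header.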