The creative community has been buzzing about Google's new Nano Banana model since its release. The combination of lightning-fast generation speeds and surprisingly high-quality outputs has sparked a wave of innovation, with creators sharing everything from photorealistic portraits to abstract artistic compositions across social platforms. Personally, I was struck by how the model managed to deliver both speed and quality — a trade-off that historically required choosing one or the other.
This got me thinking: while direct API calls to Nano Banana produce impressive results, what if we could amplify its capabilities through intelligent orchestration? What if we could build AI agents that don't just generate images on demand, but actually understand context, iterate on feedback, and produce polished marketing content autonomously?
This brings us to an exciting intersection: combining Google's Agent Development Kit (ADK) with Nano Banana's powerful image generation capabilities to create truly intelligent design agents. In this post, I'll walk you through building a design agent that demonstrates how thoughtful agent architecture can transform raw AI capabilities into professional-grade creative workflows.
Why build an Agent instead of direct API calls?
Before diving into the architecture, it's worth asking: why not just call Nano Banana directly? The answer lies in the gap between raw AI capabilities and real-world creative workflows.
Direct API calls are powerful but limited:
- Users need to craft effective prompts (most people aren't prompt engineers)
- No memory or context between requests
- No quality control or iteration logic
- No integration with existing creative workflows
- No customization for brand guidelines or style consistency
An agent-based approach unlocks professional workflows:
Intelligent Prompt Engineering: The design agent has in-built prompt rewriting logic that transforms user requests into optimized prompts for Nano Banana. Users can simply say "create a holiday promotion" and the agent handles the technical prompt crafting.
Adaptive Requirements Gathering: The design agent can dynamically ask relevant questions to ensure it has all the information needed before it starts generating images, instead of relying on the user to remember every detail up front.
Custom Functionality: Features like a "Deep Think Mode", where the system works through several iterations, each improving on the previous (much as a design agency would), wouldn't be possible with direct API calls. The agent can implement complex, multi-step workflows that automatically iterate and improve content quality.
Brand Compliance: Agents can incorporate specific controls to check for conformance to branding guidelines, ensuring consistent visual identity across all generated content.
Contextual & Long-Term Memory: Unlike stateless API calls, agents maintain session state, allowing for iterative refinement and building upon previous work. The agent can also be configured with long-term memory, which is useful for extended interactions between the user and the agent.
Quality Assurance: Built-in review mechanisms ensure outputs meet professional standards before presenting them to users.
Workflow Integration: Agents can seamlessly integrate with existing creative tools, asset management systems, and approval processes.
The result? A system that feels like working with a skilled creative professional rather than a powerful but raw AI tool.
What Makes a Great Design Agent?
Building an effective design agent isn't just about connecting an AI model to image generation APIs. It's about creating a system that understands context, maintains state, and can iterate based on feedback — much like working with a skilled human designer.
The key characteristics of an effective design agent include:
Context Awareness: Understanding user preferences, brand guidelines, and project history across sessions.
Iterative Refinement: The ability to receive feedback and improve upon previous work, not just generate from scratch.
Tool Integration: Seamless access to various creative tools and asset management systems.
Quality Control: Built-in review mechanisms to ensure outputs meet professional standards.
Stateful Memory: Maintaining knowledge of past creations, user preferences, and ongoing projects.
Introducing the ADK-Powered Design Agent
This design agent, built with Google's ADK framework, embodies these principles through a multi-agent architecture. At its core, it's designed to handle two distinct modes of operation:
Regular Mode: Interactive Design Assistant
In regular mode, the agent functions as an intelligent design assistant. Users can request the type of content they want, specify parameters like aspect ratio and text overlays, and receive professionally crafted content. The agent maintains conversation context and allows for iterative improvements based on user feedback.

Deep Think Mode: Autonomous Quality Assurance
We can take the agent's capabilities a step further with a "Deep Think Mode" — an autonomous workflow that mirrors how professional creative teams operate. When a user requests the agent to "deep think," the agent enters a sophisticated loop of content generation, review, and refinement.

The Architecture: A Symphony of Specialized Agents
The beauty of ADK lies in its support for composable multi-agent systems. Our design agent is actually an orchestration of several specialized sub-agents, each with distinct responsibilities:
1. The Main Social Media Agent
root_agent = LlmAgent(
    name="social_media_agent",
    model="gemini-2.5-flash",
    instruction="""You are a design agent. Your goal is to help users create and iterate on creative designs, including but not limited to posters, social media posts, ads, infographics, etc.
**Deep Think Mode**: If the user says they want you to "deep think" or uses any instructions along those lines, call the deep_think_loop to perform a deeper generation process.
**Regular Mode**: For normal requests, use the `generate_image` tool to create the first version of the image.""",
    tools=[generate_image, edit_image, list_asset_versions, list_reference_images, load_artifacts_tool],
    sub_agents=[deep_think_loop],
    before_model_callback=process_reference_images_callback,
)

This primary agent serves as the interface layer, handling user interactions and determining whether to route requests to the deep think loop or handle them directly through sub-agent delegation.
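In the real agent, the LLM itself interprets the routing instruction above; for illustration, the deterministic sketch below (an assumption, not the project's code) captures the same decision as a plain function:

```python
import re

def route_request(user_message: str) -> str:
    """Simplified stand-in for the routing decision the root agent's
    instruction delegates to the LLM: requests mentioning "deep think"
    (or similar phrasing) go to the deep_think_loop sub-agent; everything
    else is handled directly with the generate_image tool."""
    if re.search(r"\bdeep[\s_-]?think\b", user_message, re.IGNORECASE):
        return "deep_think_loop"
    return "generate_image"
```

A keyword check like this is far more brittle than LLM-based intent detection, which is exactly why the instruction says "or uses any instructions along those lines" and leaves interpretation to the model.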
2. The Content Generation Agent
content_generation_agent = LlmAgent(
    name="ContentGenAgent",
    instruction="""
You are a helpful and creative design assistant who helps to create images based on the user's requirements.
If the current deep_think_iteration ({deep_think_iteration}) is 1, call the generate_image tool; otherwise call the edit_image tool.
Use the feedback below (if any) given by the review agent when you draft your inputs for the edit_image tool, to ensure that the content is corrected to meet the user's requirements.
**Important**:
1. When calling the generate_image or edit_image tools, be very clear and succinct with your instructions.
2. Avoid vague instructions. For example, "improve contrast" and "reduce font size" are vague. Instead be explicit: "add a black gradient background to the top of the image behind the text to increase contrast".
3. Use your creativity to figure out how the user requirements and improvement suggestions can be implemented.
Feedback from previous iterations: {content_review}
""",
    tools=[generate_image, edit_image, load_artifacts_tool],
)

This agent handles the actual content creation, intelligently choosing between generating new content and editing existing assets based on the iteration context, and providing detailed, actionable instructions.
3. The Content Review Agent
Perhaps the most innovative component is the review agent, which functions as an AI quality assurance specialist:
content_review_agent = LlmAgent(
    name="ContentReviewAgent",
    model="gemini-2.5-flash",
    instruction="""You are a marketing content reviewer. Your job is to evaluate generated marketing content and provide constructive feedback.
Load the generated image named {last_generated_image} using load_artifacts_tool, evaluate it against the original user request, and provide feedback on:
1. **Adherence to Request**: Does the content match what the user originally asked for?
2. **Visual Appeal**: Are the composition, colors, and overall design appealing and professional?
3. **Obvious Issues**: Are there any clear problems like poor text readability, distorted elements, or technical issues?
4. **Previous Feedback**: If this is a revision, has the previous feedback been properly addressed?
5. **Typos**: Are there any misspelt words on the image?
Original user request: {original_prompt}
Current iteration: {iteration_count}""",
    output_schema=ContentReview,
    output_key="content_review",
    tools=[load_artifacts_tool],
)

This agent evaluates generated content across multiple dimensions:
- Adherence to Request: Does the content match the user's requirements?
- Visual Appeal: Is the composition and design professional?
- Technical Quality: Are there obvious issues or defects?
- Typo Detection: Identifies spelling errors in overlaid text
- Feedback Integration: Has previous feedback been properly addressed?
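The `output_schema=ContentReview` parameter forces the review agent to emit structured output rather than free text. The schema itself isn't shown in this post; the sketch below is a hypothetical shape (the real project most likely defines it as a Pydantic model, and the field names here are my assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class ContentReview:
    """Hypothetical shape of the ContentReview output schema.
    In ADK, output_schema is typically a Pydantic BaseModel; a
    dataclass is used here only to keep the sketch dependency-free."""
    adherence_to_request: str          # does it match the original ask?
    visual_appeal: str                 # composition, color, typography notes
    obvious_issues: str                # distortions, readability problems
    typos_found: bool                  # misspelled words in overlaid text
    previous_feedback_addressed: bool  # relevant only on revisions
    improvement_suggestions: list = field(default_factory=list)
```

Because the review is structured, downstream agents (the loop controller, the generation agent) can read specific fields from `{content_review}` instead of parsing prose.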
4. The Loop Control Agent
loop_control_agent = LlmAgent(
    name="LoopControlAgent",
    model="gemini-2.5-flash",
    instruction="""You are responsible for determining whether the deep think content creation process should continue or conclude.
**Continue Loop If:**
- The content doesn't match the user's original request
- There are significant visual appeal issues
- Obvious problems or technical issues exist
- Previous feedback hasn't been properly addressed
- The content could be significantly improved
**End Loop If:**
- The content matches the user's request well
- Visual appeal is good and professional
- No obvious issues or problems
- Previous feedback has been addressed
- Only minor improvements could be made
- Maximum iterations have been reached
Current iteration: {iteration_count}
Max iterations: 4
Review feedback: {content_review}""",
    output_schema=LoopDecision,
    output_key="loop_decision",
)

This agent makes intelligent decisions about when content is "good enough" to present to the user, preventing infinite loops while ensuring quality standards.
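As with `ContentReview`, the `LoopDecision` schema isn't shown in the post. The sketch below pairs a hypothetical schema with a deterministic version of the continue/end policy the instruction describes; both the field names and the helper are assumptions for illustration, since in the real agent the LLM makes this call:

```python
from dataclasses import dataclass

@dataclass
class LoopDecision:
    """Hypothetical LoopDecision output schema (the real one is
    likely a Pydantic model); field names are assumptions."""
    should_continue: bool
    reason: str

def decide(review_ok: bool, iteration_count: int, max_iterations: int = 4) -> LoopDecision:
    """Deterministic sketch of the loop-control policy: stop once the
    review passes or the iteration budget is spent, otherwise continue."""
    if iteration_count >= max_iterations:
        return LoopDecision(False, "maximum iterations reached")
    if review_ok:
        return LoopDecision(False, "content meets the user's request")
    return LoopDecision(True, "review found issues to address")
```

The hard cap matters: without it, a reviewer that keeps finding minor flaws could keep the loop (and the image-generation bill) running indefinitely.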
Technical Notes on ADK
Stateful Context Management
Unlike traditional AI interactions that treat each request in isolation, our agent maintains rich context through ADK's session state management:
# Store reference images with versioning
callback_context.state["reference_images"][filename] = {
    "version": ref_count,
    "uploaded_version": version,
}

# Track asset versions and iteration history
tool_context.state["asset_versions"][asset_name] = version
tool_context.state["deep_think_iteration"] = iteration_count

This allows the agent to build upon previous work, remember user preferences, and maintain project continuity across sessions.
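The versioning pattern above is easy to isolate. This sketch mirrors it with a plain dict standing in for ADK's session state (the helper name and filename format are assumptions, not the project's actual code):

```python
def record_asset_version(state: dict, asset_name: str) -> str:
    """Bump an asset's version counter in session-style state and
    return the versioned filename for the new save."""
    versions = state.setdefault("asset_versions", {})
    versions[asset_name] = versions.get(asset_name, 0) + 1
    return f"{asset_name}_v{versions[asset_name]}.png"
```

Using `setdefault` means the first write for a brand-new session works without any prior initialization, which is the safe pattern whenever state keys may not exist yet.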
Advanced Callback System
ADK's callback architecture enables sophisticated pre and post-processing:
async def process_reference_images_callback(
    callback_context: CallbackContext, llm_request: LlmRequest
) -> Optional[Content]:
    """A before_model_callback to process uploaded reference images.

    This function intercepts the request before it goes to the LLM.
    If it finds an image upload, it saves it as a reference artifact.
    """
    if not llm_request.contents:
        return None

    latest_user_message = llm_request.contents[-1]
    image_part = None

    # Look for uploaded images in the latest user message
    for part in latest_user_message.parts:
        if part.inline_data and part.inline_data.mime_type.startswith("image/"):
            logger.info(f"Found reference image to process: {part.inline_data.mime_type}")
            image_part = part
            break

    # Process reference image if found
    if image_part:
        # Generate versioned filename for reference image
        reference_images = callback_context.state.get("reference_images", {})
        ref_count = len(reference_images) + 1
        filename = f"reference_image_v{ref_count}.png"

        # Save the image as an artifact
        version = await callback_context.save_artifact(filename=filename, artifact=image_part)
        # setdefault guards the first upload, when the key doesn't exist yet
        callback_context.state.setdefault("reference_images", {})[filename] = {
            "version": ref_count,
            "uploaded_version": version,
        }
        callback_context.state["latest_reference_image"] = filename

    return None

These callbacks handle complex workflows like automatic reference image processing and mode detection transparently.
Dual-Mode Operation: Speed vs Quality Trade-offs
One of the key capabilities ADK unlocks is the ability for developers to implement custom behavior, such as this design agent's two distinct operational modes, which give users control over the speed-quality trade-off:
Simple Mode: Direct and fast interaction where users can call the generate and edit tools directly. Perfect for quick iterations, brainstorming sessions, or when time is critical. Users get immediate results with minimal processing overhead.
Deep Think Mode: An autonomous quality assurance workflow that takes more time but delivers significantly higher quality results. When activated, the agent enters a sophisticated loop of generation, review, and refinement that can take several minutes but produces professional-grade content.
# Deep think mode is detected through the agent's instruction processing
instruction="""**Deep Think Mode**: If the user says they want you to "deep think"
or uses any instructions along those lines, then call the deep_think_loop to perform
a deeper generation process.
**Regular Mode**: For normal requests, use the `generate_image` tool to create
the first version of the image."""
This design philosophy recognizes that different creative scenarios have different requirements. For simpler requests, a user might prefer the speed of Simple Mode, while more complex requests might warrant the thoroughness of Deep Think Mode.
Intelligent Tool Integration
The agent seamlessly integrates multiple tools through ADK's tool framework:
- Image Generation: Creating new content from prompts using Nano Banana
- Image Editing: Iterative refinement of existing assets
- Asset Management: Versioned storage and retrieval
- Reference Handling: Style and composition guidance
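In ADK, tools like these are plain Python functions that receive a tool context. The sketch below shows what a `generate_image` tool could look like with the model call stubbed out; the real tool would invoke Nano Banana through the Google GenAI client and persist the image bytes as an artifact. All names, parameters, and return shapes here are my assumptions, not the repository's actual code:

```python
def generate_image(prompt: str, aspect_ratio: str, tool_context) -> dict:
    """Sketch of a generate_image ADK tool: track a versioned filename
    in session state, (in the real tool) call the image model, and
    record the result for the review agent to load later."""
    versions = tool_context.state.setdefault("asset_versions", {})
    versions["generated"] = versions.get("generated", 0) + 1
    filename = f"generated_image_v{versions['generated']}.png"
    # image_bytes = call_image_model(prompt, aspect_ratio)       # stubbed: real
    # await tool_context.save_artifact(filename, image_bytes)    # model call here
    tool_context.state["last_generated_image"] = filename
    return {"status": "ok", "filename": filename}

class FakeToolContext:
    """Minimal stand-in for ADK's ToolContext, for illustration only."""
    def __init__(self):
        self.state = {}
```

Note how the tool writes `last_generated_image` into state: that is the key the review agent's instruction template (`{last_generated_image}`) reads when it loads the artifact for evaluation.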
The Deep Think Loop: Autonomous Creative Iteration
The crown jewel of our system is the Deep Think Loop — a LoopAgent that orchestrates autonomous creative iteration:
deep_think_loop = LoopAgent(
    name="DeepThinkLoop",
    sub_agents=[
        DeepThinkPreparationAgent(name="DeepThinkPreparationAgent"),
        prompt_capture_agent,
        content_generation_agent,
        content_review_agent,
        loop_control_agent,
        LoopTerminationAgent(name="LoopTerminationAgent"),
    ],
    max_iterations=5,
)

This loop creates a feedback system that mirrors professional creative workflows:
- Preparation: Context gathering and requirement analysis
- Generation: Creating or refining content
- Review: Quality assessment and feedback generation
- Control: Decision making about continuation
- Termination: Cleanup and final presentation
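Conceptually, a `LoopAgent` runs its sub-agents in order on every iteration until an agent escalates (signals the loop is done) or `max_iterations` is hit. The pure-Python driver below is only a mental model of that orchestration, not ADK's implementation, with callables standing in for sub-agents and a dict for shared session state:

```python
def run_deep_think_loop(sub_agents, max_iterations=5):
    """Sketch of LoopAgent semantics: each iteration runs every
    sub-agent in order against shared state; the loop ends when an
    agent sets the escalate flag or the iteration cap is reached."""
    state = {"iteration_count": 0, "escalate": False}
    while state["iteration_count"] < max_iterations and not state["escalate"]:
        state["iteration_count"] += 1
        for agent in sub_agents:
            agent(state)  # each sub-agent reads and updates shared state
            if state["escalate"]:
                break     # stop mid-iteration once completion is signaled
    return state
```

This is why the termination agent sits last in the sub-agent list: after the controller decides to stop, it can clean up and raise the escalation signal that ends the loop.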
Why This Architecture Matters
This multi-agent approach demonstrates several key advantages over monolithic AI systems:
Specialized Expertise: Each agent excels at its specific domain, leading to better overall results.
Maintainable Complexity: Complex workflows are broken down into manageable, testable components.
Flexible Orchestration: Different workflows can reuse and recombine agents as needed.
Quality Assurance: Built-in review processes ensure professional-grade outputs.
Scalable Development: Teams can work on different agents independently.
Building the Future with ADK on Google Cloud
As AI agents become more sophisticated, having a robust development framework becomes crucial. Google's ADK provides the foundation for building these complex, stateful, multi-agent systems:
Model Agnostic: Work with Gemini, Claude, or any other model through standardized interfaces.
Cloud Native: Built for Google Cloud's infrastructure with seamless scaling and deployment.
Rich Tool Ecosystem: Extensive library of pre-built tools and integrations.
Enterprise Ready: Security, monitoring, and governance features built-in.
The Road Ahead
The design agent represents just the beginning of what's possible with sophisticated AI orchestration. By combining ADK's multi-agent capabilities with advanced models like Gemini, we're seeing the emergence of AI systems that don't just assist with tasks — they think, iterate, and produce work at professional quality levels.
The shift from simple AI tools to complex, stateful AI agents represents a fundamental change in how we approach automation. Just as the web moved from static pages to dynamic applications, AI is evolving from one-shot responses to persistent, context-aware assistants.
For developers and organizations looking to build the next generation of AI-powered applications, understanding and embracing these multi-agent patterns isn't just an advantage — it's becoming essential. The future belongs to AI systems that can think deeply, collaborate effectively, and produce results that meet the highest standards of human creativity.
The question isn't whether AI agents will transform creative workflows — it's whether you'll be ready to build and deploy them when the transformation arrives.
Want to explore the code behind this design agent? The complete implementation is available in this GitHub repository, showcasing practical patterns for building sophisticated AI agents with Google ADK.