
Step 06 - Multimodal Agents

New Requirement: Visual Car Inspection

In Step 5, you implemented the Human-in-the-Loop pattern for safe, controlled disposition decisions. The system relies entirely on textual feedback from employees returning cars. But what if the person returning the car could also upload a photo?

The Miles of Smiles management team wants to enhance the rental return process:

Allow employees to optionally upload an image of the car when returning it, so the system can automatically enrich the rental feedback with visual observations.

This is a common real-world scenario where:

  1. Text alone is insufficient: An employee might write “car looks fine” but a photo reveals scratches or dents they missed
  2. Multimodal AI is powerful: Modern LLMs can analyze images alongside text to provide richer assessments

You’ll learn how to integrate multimodal capabilities (text + image) into your existing agentic workflow using LangChain4j’s ImageContent.


What You’ll Learn

In this step, you will:

  • Add image upload to the rental return form using multipart form data
  • Convert uploaded images to LangChain4j’s ImageContent for multimodal processing
  • Create a CarImageAnalysisAgent that analyzes car images and enriches rental feedback
  • Integrate the new agent at the beginning of the existing CarProcessingWorkflow sequence
  • Understand how ImageContent flows through agent parameters using @UserMessage
  • Understand how marking an agent as optional lets the workflow skip it entirely when one of its inputs is missing
  • See how the agent gracefully handles the absence of an image, returning the feedback unchanged

Understanding Multimodal Agents

What is Multimodal Processing?

Multimodal processing allows an AI agent to work with multiple types of content simultaneously — in this case, text and images. Instead of just reading feedback like “the car has some damage”, the agent can also see the car and identify specific issues.

How LangChain4j Handles Images

LangChain4j provides the ImageContent class to represent image data in messages sent to the LLM:

  • ImageContent: Wraps an image (as base64-encoded data with a MIME type) as a content part
  • When passed as a method parameter annotated with @UserMessage, it is automatically included alongside text in the message sent to the LLM
  • The LLM receives both the text prompt and the image, enabling visual reasoning

The Enrichment Pattern

Rather than creating a separate “image analysis” output, the CarImageAnalysisAgent uses an enrichment pattern:

  1. Receives the original rental feedback text and an optional car image
  2. If an image is present, analyzes it and appends visual observations to the feedback
  3. If no image is present, returns the feedback unchanged
  4. The enriched feedback then flows into the existing FeedbackAnalysisWorkflow — no downstream changes needed

This is elegant because it preserves the existing workflow structure while adding new capabilities.
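
As a minimal sketch (hypothetical method names, not the actual agent code), the enrichment pattern boils down to a function that either augments its input or passes it through:

```java
// Hypothetical sketch of the enrichment pattern, not the real agent:
// augment the feedback when observations exist, otherwise pass it through
// unchanged so downstream consumers need no special handling.
public class EnrichmentSketch {

    static String enrich(String feedback, String imageObservations) {
        if (imageObservations == null || imageObservations.isBlank()) {
            return feedback; // no image: feedback flows through untouched
        }
        return feedback + " " + imageObservations; // image: append observations
    }

    public static void main(String[] args) {
        System.out.println(enrich("car looks fine", null));
        System.out.println(enrich("car looks fine", "Visible scratch on the left door."));
    }
}
```

Because the return type is the same as the input type, callers never need to know whether enrichment happened.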

Why ImageContent Stays Separate:

ImageContent is passed as a separate parameter alongside the String feedback:

  • ImageContent is a special LangChain4j type for multimodal AI, not simple text data
  • It’s only used by the image analysis agent, not by other agents in the workflow
  • Keeping it separate maintains the clean separation between feedback text and multimodal content

What Are We Going to Build?

We’re enhancing the car management system with multimodal image analysis:

  1. Update the UI: Add an image upload field for rented cars in the Fleet Status grid
  2. Update the REST endpoint: Accept multipart form data with an optional image
  3. Convert to ImageContent: Transform the uploaded file into a LangChain4j ImageContent
  4. Create CarImageAnalysisAgent: A new agent that analyzes car images
  5. Update the workflow: Insert the new agent at the beginning of the sequence

The Updated Architecture:

graph TB
    Start([Car Return with optional image]) --> A[CarProcessingWorkflow<br/>Sequential]

    A --> IMG[Step 1: CarImageAnalysisAgent<br/>Image Analysis]
    IMG -->|enriched feedback| B[Step 2: FeedbackAnalysisWorkflow<br/>Parallel Mapper]
    B --> B1[FeedbackTask.cleaning()]
    B --> B2[FeedbackTask.maintenance()]
    B --> B3[FeedbackTask.disposition()]
    B1 --> BA[FeedbackAnalysisAgent]
    B2 --> BA
    B3 --> BA
    BA --> BEnd[FeedbackAnalysisResults]

    BEnd --> C[Step 3: FleetSupervisorAgent<br/>Autonomous Orchestration]
    C --> CEnd[Supervisor Decision]

    CEnd --> D[Step 4: CarConditionFeedbackAgent<br/>Final Summary]
    D --> End([Updated Car])

    style A fill:#90EE90
    style IMG fill:#E8B4F8
    style B fill:#87CEEB
    style C fill:#FFB6C1
    style D fill:#90EE90
    style Start fill:#E8E8E8
    style End fill:#E8E8E8

The Key Innovation:

The CarImageAnalysisAgent sits at the beginning of the sequence, before the FeedbackAnalysisWorkflow. Its output key is feedback, which means it replaces the original rental feedback in the agentic scope with the enriched version. All downstream agents automatically receive the enriched feedback without any code changes.
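
A minimal sketch of why writing to an existing scope key upgrades every downstream reader (a plain map standing in for the agentic scope, not the LangChain4j internals):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the agentic-scope overwrite, assuming the scope behaves like a
// key/value map (an illustration, not the framework's real storage).
public class ScopeSketch {
    public static void main(String[] args) {
        Map<String, Object> scope = new HashMap<>();
        scope.put("feedback", "small dent on the rear bumper"); // original input

        // The image agent's outputKey matches the existing key, so its result
        // overwrites the original value ...
        scope.put("feedback", scope.get("feedback") + " Image also shows a scratched fender.");

        // ... and every later agent that reads "feedback" sees the enriched text.
        System.out.println(scope.get("feedback"));
    }
}
```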


Prerequisites

Before starting:

  • Completed Step 05 — This step builds on Step 5’s architecture
  • Application from Step 05 is stopped (Ctrl+C)
  • Understanding of the existing CarProcessingWorkflow sequence

Part 1: Update the UI for Image Upload

Update the JavaScript

The action cell for all actionable cars in populateFleetStatusTable now includes a file input for optional image upload:

app.js (action cell in populateFleetStatusTable)
if (car.status === 'RENTED' || car.status === 'AT_CLEANING' || car.status === 'IN_MAINTENANCE') {
    actionCell = `
        <td>
            <form onsubmit="processFeedback(event, ${car.id}, '${car.status}')">
                <input type="file" id="car-image-${car.id}" accept="image/*">
                <input type="text" class="feedback-input" id="feedback-${car.id}" placeholder="Enter feedback">
                <button type="submit" class="return-button">Return</button>
            </form>
        </td>`;
}

The processFeedback function is updated to send a FormData object (multipart) instead of a simple query parameter, and now uses a single consolidated endpoint for all car returns:

app.js (processFeedback with FormData)
const imageInput = document.getElementById(`car-image-${carId}`);
const formData = new FormData();
formData.append('feedback', feedback);
if (imageInput && imageInput.files.length > 0) {
    formData.append('carImage', imageInput.files[0]);
}

fetch(`/car-management/return/${carId}`, {
    method: 'POST',
    body: formData
})

Key Points:

  • Uses FormData for multipart encoding — all statuses use the same endpoint and format
  • The image is only appended if the user selected a file
  • No Content-Type header is set — the browser automatically adds multipart/form-data with the correct boundary
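
To make the boundary mechanics concrete, here is roughly what the browser-generated request body looks like; the boundary token and file name below are invented for illustration:

```java
// Illustration of a multipart/form-data body as produced from FormData.
// The browser chooses a random boundary and advertises it in the
// Content-Type header; the values here are made up for readability.
public class MultipartBodySketch {
    public static void main(String[] args) {
        String boundary = "----demoBoundary1234";
        String body =
            "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"feedback\"\r\n"
            + "\r\n"
            + "Customer mentioned a minor scratch\r\n"
            + "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"carImage\"; filename=\"car.jpg\"\r\n"
            + "Content-Type: image/jpeg\r\n"
            + "\r\n"
            + "<binary image bytes>\r\n"
            + "--" + boundary + "--\r\n"; // closing boundary ends the body
        System.out.println(body);
    }
}
```

On the server side, @RestForm String feedback and @RestForm FileUpload carImage each map to one of these parts by name.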

Part 2: Update the REST Endpoint

Accept Multipart Form Data

Update src/main/java/com/carmanagement/resource/CarManagementResource.java to accept the image as a FileUpload and convert it to ImageContent:

CarManagementResource.java
package com.carmanagement.resource;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;

import java.io.IOException;
import java.nio.file.Files;
import java.util.Base64;

import org.jboss.resteasy.reactive.RestForm;
import org.jboss.resteasy.reactive.multipart.FileUpload;

import dev.langchain4j.data.message.ImageContent;
import io.quarkus.logging.Log;
import io.smallrye.common.annotation.Blocking;
import io.smallrye.mutiny.Uni;

import com.carmanagement.service.CarManagementService;

/**
 * REST resource for car management operations.
 * Uses blocking processing for AI agent workflows.
 */
@Path("/car-management")
public class CarManagementResource {

    @Inject
    CarManagementService carManagementService;

    /**
     * Process a car return from any status (rental, cleaning, or maintenance).
     * This is a blocking operation due to AI agent processing.
     *
     * @param carNumber The car number
     * @param feedback Optional feedback about the return
     * @param carImage Optional image of the car being returned (multipart form data)
     * @return Uni that completes with the result
     */
    @POST
    @Path("/return/{carNumber}")
    @Consumes(MediaType.MULTIPART_FORM_DATA)
    @Blocking
    public Uni<Response> processReturn(Integer carNumber, @RestForm String feedback, @RestForm FileUpload carImage) {
        ImageContent imageContent = toImageContent(carImage);

        return carManagementService.processCarReturn(carNumber, feedback != null ? feedback : "", imageContent)
            .onItem().transform(result -> Response.ok(result).build())
            .onFailure().recoverWithItem(e -> {
                Log.error(e.getMessage(), e);
                return Response.status(Response.Status.INTERNAL_SERVER_ERROR)
                        .entity("Error processing car return: " + e.getMessage())
                        .build();
            });
    }

    @GET
    @Path("/report")
    @Produces(MediaType.TEXT_HTML)
    public Response report() {
        return Response.ok(carManagementService.report()).build();
    }

    private ImageContent toImageContent(FileUpload fileUpload) {
        if (fileUpload == null || fileUpload.filePath() == null) {
            return null;
        }
        try {
            byte[] bytes = Files.readAllBytes(fileUpload.filePath());
            String base64 = Base64.getEncoder().encodeToString(bytes);
            String mimeType = fileUpload.contentType();
            return new ImageContent(base64, mimeType);
        } catch (IOException e) {
            Log.error("Failed to read uploaded car image", e);
            return null;
        }
    }
}

Let’s break it down:

@Consumes(MediaType.MULTIPART_FORM_DATA)

The consolidated return endpoint now consumes multipart form data instead of query parameters and handles returns from any car status:

@POST
@Path("/return/{carNumber}")
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Blocking
public Uni<Response> processReturn(Integer carNumber,
        @RestForm String feedback, @RestForm FileUpload carImage) {
  • @RestForm: Extracts form fields from the multipart request
  • FileUpload: RESTEasy Reactive’s type for handling uploaded files
  • The endpoint delegates to CarManagementService, which processes the return based on the car’s current status

The toImageContent Helper

private ImageContent toImageContent(FileUpload fileUpload) {
    if (fileUpload == null || fileUpload.filePath() == null) {
        return null;
    }
    try {
        byte[] bytes = Files.readAllBytes(fileUpload.filePath());
        String base64 = Base64.getEncoder().encodeToString(bytes);
        String mimeType = fileUpload.contentType();
        return new ImageContent(base64, mimeType);
    } catch (IOException e) {
        Log.error("Failed to read uploaded car image", e);
        return null;
    }
}
  • Reads the uploaded file and converts it to base64-encoded data
  • Creates an ImageContent with the base64 data and the file’s MIME type (e.g., image/jpeg, image/png)
  • Falls back to null when no image is provided
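
The base64 step can be checked in isolation with JDK classes only (no LangChain4j on the classpath); the model provider decodes exactly the bytes that were uploaded:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Standalone round-trip of the encoding used in toImageContent: encoding then
// decoding yields the original bytes, so no image data is lost in transit.
public class Base64RoundTrip {
    public static void main(String[] args) {
        byte[] original = "fake-image-bytes".getBytes(StandardCharsets.UTF_8);
        String base64 = Base64.getEncoder().encodeToString(original);
        byte[] decoded = Base64.getDecoder().decode(base64);
        System.out.println(base64);
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```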

Part 3: Pass the Image Through the Service Layer

Update src/main/java/com/carmanagement/service/CarManagementService.java

Add ImageContent as a parameter and forward it to the workflow:

CarManagementService.java
package com.carmanagement.service;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.transaction.Transactional;

import com.carmanagement.agentic.workflow.CarProcessingWorkflow;
import com.carmanagement.model.CarConditions;
import com.carmanagement.model.CarInfo;
import com.carmanagement.model.CarStatus;
import com.carmanagement.model.FeedbackTask;
import dev.langchain4j.data.message.ImageContent;
import io.quarkus.logging.Log;
import io.smallrye.mutiny.Uni;

import java.util.List;

import static dev.langchain4j.agentic.observability.HtmlReportGenerator.generateReport;

/**
 * Service for managing car returns from various operations.
 * Uses async processing to handle Human-in-the-Loop workflow pauses.
 */
@ApplicationScoped
public class CarManagementService {

    @Inject
    CarProcessingWorkflow carProcessingWorkflow;

    /**
     * Process a car return from any operation.
     * This method runs asynchronously to handle workflow pauses for human approval.
     * 
     * @param carNumber The car number
     * @param feedback Optional feedback
     * @param carImage Optional image of the car
     * @return Uni that completes with the result of the processing
     */
    public Uni<String> processCarReturn(Integer carNumber, String feedback, ImageContent carImage) {

        return Uni.createFrom().item(() -> {
            CarInfo carInfo = findCarInfo(carNumber);
            if (carInfo == null) {
                return "Car not found with number: " + carNumber;
            }

            // Create the list of feedback tasks for parallel analysis
            List<FeedbackTask> tasks = List.of(
                    FeedbackTask.cleaning(),
                    FeedbackTask.maintenance(),
                    FeedbackTask.disposition()
            );

            // Process the car return using the workflow with supervisor
            // This may PAUSE if human approval is needed
            CarConditions carConditions = carProcessingWorkflow.processCarReturn(
                    tasks,
                    carInfo,
                    carNumber,
                    feedback,
                    carImage);

            Log.info("CarConditionFeedbackAgent updating...");

            // Update the car's condition with the result from CarConditionFeedbackAgent
            carInfo.condition = carConditions.generalCondition();

            // Update the car status based on the required action
            switch (carConditions.carAssignment()) {
                case DISPOSITION:
                    carInfo.status = CarStatus.PENDING_DISPOSITION;
                    Log.info("Car marked for disposition - awaiting final decision");
                    break;
                case MAINTENANCE:
                    carInfo.status = CarStatus.IN_MAINTENANCE;
                    break;
                case CLEANING:
                    carInfo.status = CarStatus.AT_CLEANING;
                    break;
                case NONE:
                    carInfo.status = CarStatus.AVAILABLE;
                    break;
            }

            // Persist the changes to the database in a separate transaction
            updateCarInfo(carInfo);

            return carConditions.generalCondition();
        }).runSubscriptionOn(io.smallrye.mutiny.infrastructure.Infrastructure.getDefaultWorkerPool());
    }

    /**
     * Find car info in a read-only transaction
     */
    @Transactional(Transactional.TxType.REQUIRES_NEW)
    CarInfo findCarInfo(Integer carNumber) {
        return CarInfo.findById(carNumber);
    }

    /**
     * Update car info in a separate transaction after workflow completes.
     * Uses merge to handle detached entity from the workflow.
     */
    @Transactional(Transactional.TxType.REQUIRES_NEW)
    void updateCarInfo(CarInfo carInfo) {
        // Merge the detached entity back into the persistence context
        CarInfo.getEntityManager().merge(carInfo);
    }

    public String report() {
        return generateReport(carProcessingWorkflow.agentMonitor());
    }
}

The image is passed straight through to the workflow alongside the feedback string:

CarConditions carConditions = carProcessingWorkflow.processCarReturn(
        tasks,
        carInfo,
        carNumber,
        feedback,
        carImage);

Part 4: Create the CarImageAnalysisAgent

This is the core of this step — a new agent that processes car images.

Create src/main/java/com/carmanagement/agentic/agents/CarImageAnalysisAgent.java:

CarImageAnalysisAgent.java
package com.carmanagement.agentic.agents;

import dev.langchain4j.agentic.Agent;
import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;

/**
 * Agent that analyzes a car image and enriches the rental feedback with visual observations.
 * If no image is provided, the rental feedback is returned unchanged.
 */
public interface CarImageAnalysisAgent {

    @SystemMessage("""
        You are a car image analyst for a car rental company.
        You will receive the current rental feedback for a car being returned.
        If an image of the car is provided, analyze it and rewrite the rental feedback taking account of
        your visual observations about the car's condition (e.g., visible damage, scratches, dents,
        cleanliness issues, tire condition, etc.).
        Do not append your visual observations as a separate section of the response; instead combine
        the existing rental feedback, if present, with what you can see in the image into a single response.
        If no image is provided, or the image is empty or does not seem related to a car,
        simply return the rental feedback exactly as it is, without any modification.
        Your response must always include the original rental feedback text followed by your observations if any.
        In any case, the returned response MUST be a single sentence.
        """)
    @UserMessage("""
        Feedback: {feedback}
        """)
    @Agent(description = "Car image analyzer. Enriches rental feedback with visual observations from a car image.",
            outputKey = "feedback", optional = true)
    String analyzeCarImage(String feedback, @UserMessage @V("carImage") ImageContent carImage);
}

Let’s break it down:

The @SystemMessage

@SystemMessage("""
    You are a car image analyst for a car rental company.
    You will receive the current rental feedback for a car being returned.
    If an image of the car is provided, analyze it and rewrite the rental feedback taking account of
    your visual observations about the car's condition (e.g., visible damage, scratches, dents,
    cleanliness issues, tire condition, etc.).
    If no image is provided, return the rental feedback exactly as it is, without any modification.
    Your response must always include the original rental feedback text followed by your observations if any.
    """)

The system message instructs the LLM to:

  • Analyze the image if one is provided, looking for visible damage, cleanliness issues, etc.
  • Preserve the original feedback — always include it in the response
  • Be a no-op when there’s no image — return the feedback unchanged

The @UserMessage and ImageContent Parameter

@UserMessage("""
    Feedback: {feedback}
    """)
String analyzeCarImage(String feedback, @UserMessage @V("carImage") ImageContent carImage);

Note that the @UserMessage annotation on the ImageContent parameter tells LangChain4j to include the image as an additional content part in the user message sent to the LLM. This is a special use of @UserMessage specific to multimodal content: the LLM receives both the text template and the image in a single message, enabling multimodal reasoning. The @V annotation is also needed here to specify the variable name the parameter binds to in the @UserMessage template.

The outputKey and the optional flag

@Agent(description = "Car image analyzer. Enriches rental feedback with visual observations from a car image.",
        outputKey = "feedback", optional = true)

The agent’s output key is feedback, which means its result replaces the feedback value in the agentic scope. All subsequent agents in the workflow (FeedbackAnalysisWorkflow, FleetSupervisorAgent, etc.) will automatically receive the enriched feedback. Setting optional to true lets the workflow skip the agent’s invocation entirely when not all of its required parameters are provided; in this case, the agent is skipped when no image is supplied.


Part 5: Update the Workflow

Add the Agent to the Sequence

Update CarProcessingWorkflow.java to include CarImageAnalysisAgent as the first sub-agent and add the ImageContent parameter:

CarProcessingWorkflow.java
package com.carmanagement.agentic.workflow;

import com.carmanagement.agentic.agents.CarConditionFeedbackAgent;
import com.carmanagement.agentic.agents.CarImageAnalysisAgent;
import com.carmanagement.agentic.agents.FleetSupervisorAgent;
import com.carmanagement.model.CarConditions;
import com.carmanagement.model.CarInfo;
import com.carmanagement.model.FeedbackTask;
import dev.langchain4j.agentic.declarative.Output;
import dev.langchain4j.agentic.declarative.SequenceAgent;
import dev.langchain4j.agentic.observability.MonitoredAgent;
import dev.langchain4j.data.message.ImageContent;
import io.quarkus.logging.Log;

import java.util.List;

/**
 * Workflow for processing car returns using a supervisor agent for complete orchestration.
 * The supervisor coordinates both feedback analysis and action agents.
 */
public interface CarProcessingWorkflow extends MonitoredAgent {

    /**
     * Processes a car return by first analyzing feedback, then using supervisor to coordinate actions.
     * CarImageAnalysisAgent analyzes the car image first.
     * FeedbackAnalysisWorkflow analyzes feedback in parallel and returns FeedbackAnalysisResults via its @Output method.
     * FleetSupervisorAgent uses these results to coordinate action agents.
     * CarConditionFeedbackAgent determines the final car assignment and condition.
     */
    @SequenceAgent(outputKey = "carProcessingAgentResult",
            subAgents = { CarImageAnalysisAgent.class, FeedbackAnalysisWorkflow.class, FleetSupervisorAgent.class, CarConditionFeedbackAgent.class })
    CarConditions processCarReturn(
            List<FeedbackTask> tasks,
            CarInfo carInfo,
            Integer carNumber,
            String feedback,
            ImageContent carImage);

    @Output
    static CarConditions output(CarConditions carConditions) {
        // CarConditionFeedbackAgent now handles all the logic for determining
        // the final car assignment, disposition status, and condition description.
        // We simply pass through its result.

        Log.debug("DEBUG CarConditions output method:");
        Log.debug("  generalCondition: " + carConditions.generalCondition());
        Log.debug("  carAssignment: " + carConditions.carAssignment());
        Log.debug("  dispositionStatus: " + carConditions.dispositionStatus());
        Log.debug("  dispositionReason: " + carConditions.dispositionReason());

        return carConditions;
    }
}

Key Changes:

  • CarImageAnalysisAgent.class is added as the first sub-agent in the @SequenceAgent
  • The sequence is now: CarImageAnalysisAgent → FeedbackAnalysisWorkflow → FleetSupervisorAgent → CarConditionFeedbackAgent
  • ImageContent carImage is added as a new parameter to processCarReturn

The flow is:

  1. CarImageAnalysisAgent analyzes the image and enriches the feedback value in the scope
  2. FeedbackAnalysisWorkflow receives the enriched feedback and runs parallel analysis
  3. The rest of the workflow proceeds as before

Try It Out

Start the Application

  1. Navigate to the step-06 directory:
cd section-2/step-06
  2. Start the application:
./mvnw quarkus:dev    # Linux/macOS (use mvnw quarkus:dev on Windows)
  3. Open http://localhost:8080

Test Without an Image

Find the Honda Civic (status: Rented) in the Fleet Status grid and enter feedback without uploading an image:

The car has a small dent on the rear bumper

Click Return.

Expected Result:

  • The CarImageAnalysisAgent receives the feedback with an empty image
  • Since there’s no meaningful image, it returns the feedback unchanged
  • The rest of the workflow processes the original feedback as before

Test With an Image

  1. Find or take a photo of a car (there is a sample image named q4-tree.png in the resources folder, but any car photo will work)
  2. In the Fleet Status grid, find the car and click “Choose File” in its Action column
  3. Select the image
  4. Enter some feedback:
Customer mentioned a minor scratch
  5. Click Return

Expected Result:

  • The CarImageAnalysisAgent analyzes the image alongside the feedback
  • It enriches the feedback with visual observations, e.g.: “Customer mentioned a minor scratch. Visual analysis: The image shows a visible scratch on the front left fender, approximately 15cm long. The paint is chipped in the affected area. Additionally, the front bumper shows minor scuff marks on the lower right corner.”
  • The enriched feedback flows into FeedbackAnalysisWorkflow, which may now detect cleaning, maintenance, or disposition needs that the original text alone wouldn’t have triggered

Check the Agent Report

Click Generate Report to see the execution trace. You’ll see the CarImageAnalysisAgent as the first step in the sequence, with its input (original feedback) and output (enriched feedback).


How It All Works Together

sequenceDiagram
    participant User
    participant UI as Web UI
    participant REST as CarManagementResource
    participant Service as CarManagementService
    participant Workflow as CarProcessingWorkflow
    participant ImageAgent as CarImageAnalysisAgent
    participant FeedbackWF as FeedbackAnalysisWorkflow

    User->>UI: Enter feedback + upload image
    UI->>REST: POST multipart (feedback + image)
    REST->>REST: toImageContent(fileUpload)
    REST->>Service: processCarReturn(..., imageContent)
    Service->>Workflow: processCarReturn(..., carImage)

    rect rgb(232, 180, 248)
    Note over Workflow,ImageAgent: Image Analysis (Step 1)
    Workflow->>ImageAgent: analyzeCarImage(feedback, carImage)
    ImageAgent->>ImageAgent: LLM analyzes text + image
    ImageAgent->>Workflow: enriched feedback
    end

    rect rgb(255, 243, 205)
    Note over Workflow,FeedbackWF: Parallel Analysis (Step 2)
    Workflow->>FeedbackWF: Uses enriched feedback
    par Concurrent Execution
        FeedbackWF->>FeedbackWF: FeedbackAnalysisAgent<br/>with FeedbackTask.cleaning()
    and
        FeedbackWF->>FeedbackWF: FeedbackAnalysisAgent<br/>with FeedbackTask.maintenance()
    and
        FeedbackWF->>FeedbackWF: FeedbackAnalysisAgent<br/>with FeedbackTask.disposition()
    end
    end

    Note over Workflow: Steps 3-4: Supervisor + Condition (unchanged)

Key Takeaways

  • Multimodal agents can process both text and images in a single interaction
  • ImageContent is LangChain4j’s way to represent images for LLM consumption
  • @UserMessage on ImageContent parameters automatically includes the image in the message to the LLM
  • The enrichment pattern (outputKey matching an existing scope variable) allows new agents to augment data without changing downstream code
  • Optional agent: The agent can be skipped if no image is provided
  • Multipart form data with @RestForm FileUpload makes image upload straightforward in Quarkus
  • Base64 encoding is used to convert uploaded files into ImageContent

Experiment Further

1. Try Different Image Types

Upload various car images to see how the agent describes different conditions:

  • A clean, well-maintained car
  • A car with visible damage (dents, scratches)
  • A dirty car (mud, stains)
  • An interior shot showing wear

2. Compare With and Without Images

Return the same car with identical text feedback but with and without an image. Compare how the downstream agents (cleaning, maintenance, disposition) react differently based on the enriched feedback.

3. Adjust the System Message

Modify the CarImageAnalysisAgent’s system message to focus on specific aspects:

  • Only report safety-critical damage
  • Include estimated repair costs
  • Rate the car’s cleanliness on a scale of 1-10

Troubleshooting

Image not being processed

Verify that:

  • The file input has accept="image/*" to filter non-image files
  • The JavaScript correctly appends the file to FormData
  • The toImageContent method is reading the file and encoding it as base64
  • The server logs show no IOException messages from toImageContent

Agent returns feedback unchanged even with an image

This can happen if:

  • The image is too small or blank (the LLM sees nothing to analyze)
  • The MIME type is incorrect — verify fileUpload.contentType() returns a valid image type
  • The LLM model doesn’t support vision — ensure your configured model supports multimodal input

Request too large

Large images (>10MB) may exceed request size limits. Consider:

  • Adding accept="image/*" to the file input (already done)
  • Configuring quarkus.http.limits.max-body-size in application.properties if needed
  • Compressing images client-side before upload
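
For example, assuming the default 10M request limit is what you are hitting, the limit can be raised in application.properties (property name per the Quarkus HTTP configuration reference):

```properties
# application.properties
# Raise the maximum HTTP request body size (default is 10240K, i.e. 10M)
quarkus.http.limits.max-body-size=20M
```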

What’s Next?

You’ve successfully added multimodal image analysis to the car management system!

The system now:

  • Accepts optional car images during rental returns
  • Analyzes images using a multimodal LLM agent
  • Enriches rental feedback with visual observations
  • Seamlessly integrates with the existing workflow — no downstream changes needed

Key Progression:

  • Step 4: Sophisticated local orchestration with Supervisor Pattern
  • Step 5: Human-in-the-Loop for safe, controlled autonomous decisions
  • Step 6: Multimodal image analysis for enriched feedback

In Step 07, you’ll learn about Agent-to-Agent (A2A) communication — converting the local PricingAgent into a remote service that runs in a separate system, demonstrating how to distribute agent workloads across multiple applications!

Continue to Step 07 - Using Remote Agents (A2A)