Building AI-Powered Java Applications with Spring AI: The Complete Guide
A complete guide to building AI-powered Java applications with Spring AI — covering ChatClient, prompt templates, structured output, RAG, function calling, advisors, chat memory, embeddings, multi-modality, and provider configuration.
Why Spring AI Changes Everything for Java Developers
Spring AI brings the same productivity and portability that Spring Boot developers love to the world of artificial intelligence. Write your AI code once against a clean abstraction, then swap providers — OpenAI, Anthropic Claude, Google Gemini, Ollama, AWS Bedrock — with a configuration change, not a code rewrite.
Released as 1.0 GA in May 2025, Spring AI provides: a unified chat API, structured output mapping to Java records, built-in RAG support, function/tool calling, chat memory, advisors, embeddings, image generation, multi-modality, and evaluation — all with Spring Boot auto-configuration.
Getting Started
Add the BOM and a provider starter to your pom.xml:
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>1.1.4</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<!-- Pick ONE provider starter -->
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
Configure in application.yml:
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
Core Abstractions: ChatModel, ChatClient, Prompt
Spring AI's power lies in provider-agnostic interfaces. Your code programs against abstractions; Spring Boot wires in the concrete provider.
ChatClient — The Fluent API (Recommended)
@RestController
class ChatController {

    private final ChatClient chatClient;

    ChatController(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem("You are a helpful coding assistant.")
                .build();
    }

    @GetMapping("/chat")
    String chat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
Message Types
SystemMessage sets instructions. UserMessage carries user input (text + media for multimodal). AssistantMessage holds model replies. ToolResponseMessage returns tool/function results.
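Under the hood, a prompt is essentially an ordered list of role-tagged messages. The sketch below uses hypothetical `Role` and `Msg` types to show that shape — these are stand-ins for illustration, not Spring AI's actual message classes:

```java
import java.util.List;

// Hypothetical stand-ins for Spring AI's message types, to show the shape of a conversation.
class ConversationSketch {

    enum Role { SYSTEM, USER, ASSISTANT, TOOL }

    record Msg(Role role, String text) {}

    // A prompt boils down to an ordered list of role-tagged messages.
    static List<Msg> conversation(String systemText, String userText) {
        return List.of(new Msg(Role.SYSTEM, systemText), new Msg(Role.USER, userText));
    }

    public static void main(String[] args) {
        List<Msg> msgs = conversation("You are a helpful coding assistant.", "Explain records.");
        msgs.forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```

The system message comes first so its instructions frame everything the model sees afterwards.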
Streaming Responses
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
Flux<String> streamChat(@RequestParam String message) {
    return chatClient.prompt()
            .user(message)
            .stream()
            .content();
}
Prompt Templates
Keep prompts reusable with variable substitution using Spring AI's PromptTemplate:
// Inline template
String answer = chatClient.prompt()
        .user(u -> u
                .text("List {count} best practices for {topic}")
                .param("count", "5")
                .param("topic", "REST API design"))
        .call()
        .content();

// System prompt template
String systemText = """
        You are an expert in {domain}.
        Reply in the style of a {style}.
        """;
SystemPromptTemplate systemTemplate = new SystemPromptTemplate(systemText);
Message systemMessage = systemTemplate.createMessage(Map.of(
        "domain", "distributed systems",
        "style", "senior engineer"
));

// Load from a classpath resource
@Value("classpath:/prompts/analysis.st")
private Resource analysisTemplate;

PromptTemplate template = new PromptTemplate(analysisTemplate);
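Conceptually, a prompt template is just placeholder substitution. The toy version below only handles simple `{name}` replacement — Spring AI's `PromptTemplate` delegates to the StringTemplate engine and does considerably more — but it shows the mechanism:

```java
import java.util.Map;

// A minimal sketch of what a prompt template does: replace {placeholders} with values.
class TemplateSketch {

    static String render(String template, Map<String, String> vars) {
        String out = template;
        for (var e : vars.entrySet()) {
            out = out.replace("{" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String rendered = render("List {count} best practices for {topic}",
                Map.of("count", "5", "topic", "REST API design"));
        System.out.println(rendered); // List 5 best practices for REST API design
    }
}
```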
Prompt Engineering Best Practices
Be specific: tell the model exactly what format, tone, and constraints you need.
Use system messages: set the persona and rules in the system prompt; keep user content in the user prompt.
One-shot/few-shot: include an example of the desired output.
Chain-of-thought: ask the model to "think step by step" for complex reasoning.
Structured output: request JSON and map it to records (see next section).
Structured Output — AI Responses as Java Records
Map AI-generated text directly into typed Java objects. No manual JSON parsing needed.
// Define your record
record BookRecommendation(String title, String author,
        String genre, String summary) {}

// Single entity
BookRecommendation book = chatClient.prompt()
        .user("Recommend a classic science fiction novel.")
        .call()
        .entity(BookRecommendation.class);

// List of entities
List<BookRecommendation> books = chatClient.prompt()
        .user("Recommend 5 classic sci-fi novels.")
        .call()
        .entity(new ParameterizedTypeReference<List<BookRecommendation>>() {});

// Map output
Map<String, Object> data = chatClient.prompt()
        .user("List the population of Tokyo, London, and New York")
        .call()
        .entity(new ParameterizedTypeReference<Map<String, Object>>() {});
RAG: Retrieval Augmented Generation
RAG lets your AI answer questions using your own data by retrieving relevant documents from a vector store and injecting them as context in the prompt.
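The "augmentation" half of RAG is simple prompt stuffing: retrieved snippets are placed into the prompt as context ahead of the user's question. Spring AI's QuestionAnswerAdvisor does this for you; the template below is illustrative, not the advisor's actual wording:

```java
import java.util.List;

// A sketch of prompt augmentation: retrieved chunks become context for the question.
class RagPromptSketch {

    static String augment(String question, List<String> retrievedChunks) {
        return """
                Answer the question using only the context below.

                Context:
                %s

                Question: %s
                """.formatted(String.join("\n---\n", retrievedChunks), question);
    }

    public static void main(String[] args) {
        String prompt = augment("What is the refund window?",
                List.of("Refunds are accepted within 30 days.", "Digital goods are final sale."));
        System.out.println(prompt);
    }
}
```

Because the model only sees the chunks you pass in, retrieval quality (chunking, similarity threshold, topK) directly bounds answer quality.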
Step 1: Document Ingestion (ETL)
@Component
class DocumentIngestionService {

    private final VectorStore vectorStore;

    DocumentIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void ingest(String pdfPath) {
        // 1. Read documents (PDF, JSON, HTML, Markdown, DOCX supported)
        PagePdfDocumentReader reader = new PagePdfDocumentReader(pdfPath,
                PdfDocumentReaderConfig.builder()
                        .withPagesPerDocument(1).build());

        // 2. Split into chunks
        TokenTextSplitter splitter = TokenTextSplitter.builder()
                .withChunkSize(800)
                .withMinChunkSizeChars(350).build();

        // 3. Store — embeddings generated automatically
        vectorStore.write(splitter.apply(reader.read()));
    }
}
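Why split at all? Embedding models have input limits, and smaller chunks retrieve more precisely. Spring AI's `TokenTextSplitter` splits on token counts; the toy version below splits on raw character counts, just to show the idea:

```java
import java.util.ArrayList;
import java.util.List;

// A naive fixed-size splitter: real splitters use token counts and sentence boundaries.
class ChunkSketch {

    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < text.length(); i += maxChars) {
            // Take at most maxChars characters per chunk.
            chunks.add(text.substring(i, Math.min(text.length(), i + maxChars)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4)); // [abcd, efgh, ij]
    }
}
```

In practice you also want overlap between chunks so sentences straddling a boundary are still retrievable.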
Step 2: Query with QuestionAnswerAdvisor
ChatResponse response = chatClient.prompt()
        .advisors(QuestionAnswerAdvisor.builder(vectorStore)
                .searchRequest(SearchRequest.builder()
                        .similarityThreshold(0.75)
                        .topK(5).build())
                .build())
        .user("What does our refund policy say about digital products?")
        .call()
        .chatResponse();
Advanced RAG with Query Rewriting
Advisor ragAdvisor = RetrievalAugmentationAdvisor.builder()
        .queryTransformers(RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder.build().mutate()).build())
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .vectorStore(vectorStore)
                .similarityThreshold(0.50)
                .topK(5).build())
        .build();

String answer = chatClient.prompt()
        .advisors(ragAdvisor)
        .user("How do I configure SSL?")
        .call()
        .content();
Vector Store Configuration (PGVector)
# application.yml
spring:
  ai:
    vectorstore:
      pgvector:
        initialize-schema: true
        dimensions: 1536
        distance-type: cosine_distance
Supported vector stores: PGVector, Chroma, Pinecone, Redis, Milvus, Weaviate, Qdrant, Elasticsearch, MongoDB Atlas, Neo4j, and more.
Function Calling / Tool Use
Let AI models invoke your Java methods to fetch real-time data or perform actions.
Declarative with @Tool
class WeatherTools {

    @Tool(description = "Get current weather for a given city")
    String getWeather(
            @ToolParam(description = "City name") String city,
            @ToolParam(description = "Temperature unit", required = false) String unit) {
        // Call a real weather API here
        return "Weather in %s: 22 degrees %s, sunny."
                .formatted(city, unit != null ? unit : "Celsius");
    }
}

// Use it — the model decides when to call the tool
String response = chatClient.prompt()
        .user("What's the weather like in London?")
        .tools(new WeatherTools())
        .call()
        .content();
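On the application side, tool calling boils down to a dispatch step: the model names a tool and supplies arguments, and the framework routes the call to your code, then feeds the result back to the model. Spring AI handles this loop for you; the sketch below shows the dispatch mechanism with a hypothetical registry:

```java
import java.util.Map;
import java.util.function.Function;

// A hypothetical tool registry: tool name -> function, as a sketch of the dispatch step.
class ToolDispatchSketch {

    static final Map<String, Function<String, String>> TOOLS = Map.of(
            "getWeather", city -> "Weather in " + city + ": 22 degrees Celsius, sunny.");

    // Simulates the framework step: look up the tool the model chose and invoke it.
    static String dispatch(String toolName, String argument) {
        Function<String, String> tool = TOOLS.get(toolName);
        if (tool == null) throw new IllegalArgumentException("Unknown tool: " + toolName);
        return tool.apply(argument);
    }

    public static void main(String[] args) {
        // Pretend the model replied: call "getWeather" with argument "London".
        System.out.println(dispatch("getWeather", "London"));
    }
}
```

The `@Tool` descriptions matter because they are all the model sees when deciding which tool to pick and how to fill its parameters.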
Functions as Spring Beans
public record CurrencyRequest(String from, String to, double amount) {}
public record CurrencyResponse(double convertedAmount, double rate) {}

@Bean
@Description("Convert an amount from one currency to another")
Function<CurrencyRequest, CurrencyResponse> convertCurrency() {
    return request -> {
        double rate = fetchExchangeRate(request.from(), request.to());
        return new CurrencyResponse(request.amount() * rate, rate);
    };
}

// Reference the function by bean name
String answer = chatClient.prompt()
        .user("Convert 100 USD to EUR")
        .toolNames("convertCurrency")
        .call()
        .content();
Tool Context — Pass Extra Data
class CustomerTools {

    @Tool(description = "Get customer by ID")
    Customer getCustomer(Long id, ToolContext ctx) {
        String tenantId = (String) ctx.getContext().get("tenantId");
        return customerRepo.findByIdAndTenant(id, tenantId);
    }
}

String answer = chatClient.prompt()
        .user("Tell me about customer #42")
        .tools(new CustomerTools())
        .toolContext(Map.of("tenantId", "acme-corp"))
        .call()
        .content();
Advisors — Interceptors for AI Calls
Advisors modify prompts before they reach the model and process responses on the way back — like Spring MVC interceptors for AI.
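The underlying pattern is a chain of responsibility: each advisor wraps the call, may rewrite the request on the way in, and may inspect the response on the way out. This is a generic sketch of that pattern, not Spring AI's actual `CallAdvisorChain` API:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// A minimal interceptor chain: each advisor wraps the next, innermost is the model call.
class AdvisorChainSketch {

    interface Advisor {
        String advise(String request, UnaryOperator<String> next);
    }

    static String run(String request, List<Advisor> advisors, UnaryOperator<String> model) {
        UnaryOperator<String> chain = model;
        // Wrap from last to first so advisors.get(0) runs outermost.
        for (int i = advisors.size() - 1; i >= 0; i--) {
            Advisor a = advisors.get(i);
            UnaryOperator<String> next = chain;
            chain = req -> a.advise(req, next);
        }
        return chain.apply(request);
    }

    public static void main(String[] args) {
        Advisor upperCase = (req, next) -> next.apply(req.toUpperCase()); // pre-processing
        Advisor exclaim = (req, next) -> next.apply(req) + "!";           // post-processing
        System.out.println(run("hi", List.of(upperCase, exclaim), req -> "echo:" + req));
        // prints: echo:HI!
    }
}
```

This ordering is why `getOrder()` exists on real advisors: it controls who wraps whom.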
Chat Memory — Conversation History
ChatMemory memory = MessageWindowChatMemory.builder()
        .chatMemoryRepository(new InMemoryChatMemoryRepository())
        .maxMessages(20).build();

ChatClient client = ChatClient.builder(chatModel)
        .defaultAdvisors(MessageChatMemoryAdvisor.builder(memory).build())
        .build();

// First call
client.prompt().user("My name is Alice")
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, "session-1"))
        .call().content();

// Second call — remembers the name
client.prompt().user("What is my name?")
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, "session-1"))
        .call().content(); // "Your name is Alice"
Custom Advisor — Latency Tracking
public class LatencyAdvisor implements CallAdvisor {

    private static final Logger log = LoggerFactory.getLogger(LatencyAdvisor.class);

    @Override
    public String getName() { return "LatencyAdvisor"; }

    @Override
    public int getOrder() { return 0; }

    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request,
            CallAdvisorChain chain) {
        long start = System.currentTimeMillis();
        ChatClientResponse response = chain.nextCall(request);
        log.info("AI call took {}ms", System.currentTimeMillis() - start);
        return response;
    }
}
Combining Multiple Advisors
ChatClient client = ChatClient.builder(chatModel)
        .defaultAdvisors(
                MessageChatMemoryAdvisor.builder(memory).build(),
                QuestionAnswerAdvisor.builder(vectorStore).build(),
                new SimpleLoggerAdvisor(),
                new LatencyAdvisor()
        ).build();
Embeddings
Convert text into numerical vectors for similarity search, clustering, and RAG.
@Service
class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    EmbeddingService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    // Single text
    float[] embed(String text) {
        return embeddingModel.embed(text);
    }

    // Batch
    List<float[]> embedBatch(List<String> texts) {
        return embeddingModel.embed(texts);
    }

    // Vector dimensions
    int dimensions() { return embeddingModel.dimensions(); }
}
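What makes these vectors useful is that semantically similar texts end up pointing in similar directions, which cosine similarity measures: values near 1.0 mean "similar", near 0 mean "unrelated". The standard formula in plain Java, with made-up toy vectors standing in for real embeddings:

```java
// Cosine similarity: dot(a, b) / (|a| * |b|) — the metric most vector stores use.
class CosineSketch {

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings" (real ones have hundreds or thousands of dimensions).
        float[] cat = {1f, 0f, 1f};
        float[] kitten = {1f, 0.1f, 0.9f};
        float[] invoice = {0f, 1f, 0f};
        System.out.printf("cat~kitten: %.3f, cat~invoice: %.3f%n",
                cosine(cat, kitten), cosine(cat, invoice));
    }
}
```

This is the `cosine_distance` the PGVector configuration above refers to, and the quantity behind every `similarityThreshold` setting in this guide.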
Multi-Modality — Images + Text
Send images alongside text to models like GPT-4o, Claude 3, or Gemini:
String description = chatClient.prompt()
        .user(u -> u
                .text("Describe what you see in this image.")
                .media(MimeTypeUtils.IMAGE_PNG,
                        new ClassPathResource("/images/diagram.png")))
        .call()
        .content();

// From a URL
String analysis = chatClient.prompt()
        .user(u -> u
                .text("What's in this image?")
                .media(MimeTypeUtils.IMAGE_JPEG,
                        URI.create("https://example.com/photo.jpg")))
        .call()
        .content();
Image Generation
@RestController
class ImageController {

    private final ImageModel imageModel;

    ImageController(ImageModel imageModel) {
        this.imageModel = imageModel;
    }

    @GetMapping("/generate-image")
    String generateImage(@RequestParam String description) {
        ImageResponse response = imageModel.call(
                new ImagePrompt(description,
                        OpenAiImageOptions.builder()
                                .quality("hd").N(1)
                                .height(1024).width(1024).build()));
        return response.getResult().getOutput().getUrl();
    }
}
Evaluation — Test Your AI
Spring AI provides evaluators to check relevance and catch hallucinations:
// Is the response relevant to the question and context?
RelevancyEvaluator evaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));
EvaluationResponse eval = evaluator.evaluate(
new EvaluationRequest(question, context, aiResponse));
assertThat(eval.isPass()).isTrue();
// Fact-checking — detect hallucinations
FactCheckingEvaluator factChecker = new FactCheckingEvaluator(
ChatClient.builder(chatModel));
EvaluationResponse result = factChecker.evaluate(
new EvaluationRequest(knownFacts, Collections.emptyList(), claim));
assertFalse(result.isPass()); // claim contradicts known facts
Provider Configuration Cheat Sheet
| Provider | Starter Artifact | Key Config |
|---|---|---|
| OpenAI | spring-ai-starter-model-openai | spring.ai.openai.api-key |
| Anthropic Claude | spring-ai-starter-model-anthropic | spring.ai.anthropic.api-key |
| Ollama (Local) | spring-ai-starter-model-ollama | spring.ai.ollama.base-url |
| AWS Bedrock | spring-ai-starter-model-bedrock | spring.ai.bedrock.aws.region |
| Azure OpenAI | spring-ai-starter-model-azure-openai | spring.ai.azure.openai.api-key |
| Google Gemini | spring-ai-starter-model-vertex-ai | spring.ai.vertex.ai.gemini.project-id |
| PGVector | spring-ai-starter-vector-store-pgvector | spring.ai.vectorstore.pgvector.* |
Key Takeaways
Provider-agnostic: Write once, swap AI providers via config. No vendor lock-in.
Spring-native: Auto-configuration, dependency injection, profiles — everything Spring developers expect.
Production-ready patterns: RAG, tool calling, chat memory, advisors, evaluation, and structured output are all built in.
Start simple: ChatClient.prompt().user("...").call().content() — that's your first AI call. Add RAG, tools, and memory as you need them.
Spring AI makes AI integration feel like any other Spring dependency — import the starter, configure, inject, and use.