Building AI-Powered Java Applications with Spring AI: The Complete Guide
A complete guide to building AI-powered Java applications with Spring AI — covering ChatClient, prompt templates, structured output, RAG, function calling, advisors, chat memory, embeddings, multi-modality, and provider configuration.
Why Spring AI Changes Everything for Java Developers
Spring AI brings the same productivity and portability that Spring Boot developers love to the world of artificial intelligence. Write your AI code once against a clean abstraction, then swap providers — OpenAI, Anthropic Claude, Google Gemini, Ollama, AWS Bedrock — with a configuration change, not a code rewrite.
Released as 1.0 GA in May 2025, Spring AI provides: a unified chat API, structured output mapping to Java records, built-in RAG support, function/tool calling, chat memory, advisors, embeddings, image generation, multi-modality, and evaluation — all with Spring Boot auto-configuration.
Getting Started
Add the BOM and a provider starter to your pom.xml:
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>1.1.4</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<!-- Pick ONE provider starter -->
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
Configure in application.yml:
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
Core Abstractions: ChatModel, ChatClient, Prompt
Spring AI's power lies in provider-agnostic interfaces. Your code programs against abstractions; Spring Boot wires in the concrete provider.
ChatClient — The Fluent API (Recommended)
@RestController
class ChatController {

    private final ChatClient chatClient;

    ChatController(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem("You are a helpful coding assistant.")
                .build();
    }

    @GetMapping("/chat")
    String chat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
Message Types
SystemMessage sets instructions. UserMessage carries user input (text + media for multimodal). AssistantMessage holds model replies. ToolResponseMessage returns tool/function results.
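Under the hood, a prompt is essentially an ordered list of role-tagged messages. The sketch below uses hypothetical `Role` and `Msg` types to show that shape — these are stand-ins for illustration, not Spring AI's actual message classes:

```java
import java.util.List;

// Hypothetical stand-ins for Spring AI's message types, to show the shape of a conversation.
class ConversationSketch {

    enum Role { SYSTEM, USER, ASSISTANT, TOOL }

    record Msg(Role role, String text) {}

    // A prompt boils down to an ordered list of role-tagged messages.
    static List<Msg> conversation(String systemText, String userText) {
        return List.of(new Msg(Role.SYSTEM, systemText), new Msg(Role.USER, userText));
    }

    public static void main(String[] args) {
        List<Msg> msgs = conversation("You are a helpful coding assistant.", "Explain records.");
        msgs.forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```

The system message comes first so its instructions frame everything the model sees afterwards.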
Streaming Responses
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
Flux<String> streamChat(@RequestParam String message) {
    return chatClient.prompt()
            .user(message)
            .stream()
            .content();
}
Prompt Templates
Keep prompts reusable with variable substitution using Spring AI's PromptTemplate:
// Inline template
String answer = chatClient.prompt()
        .user(u -> u
                .text("List {count} best practices for {topic}")
                .param("count", "5")
                .param("topic", "REST API design"))
        .call()
        .content();

// System prompt template
String systemText = """
        You are an expert in {domain}.
        Reply in the style of a {style}.
        """;
SystemPromptTemplate systemTemplate = new SystemPromptTemplate(systemText);
Message systemMessage = systemTemplate.createMessage(Map.of(
        "domain", "distributed systems",
        "style", "senior engineer"
));

// Load from a classpath resource
@Value("classpath:/prompts/analysis.st")
private Resource analysisTemplate;

PromptTemplate template = new PromptTemplate(analysisTemplate);
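Conceptually, a prompt template is just placeholder substitution. The toy version below only handles simple `{name}` replacement — Spring AI's `PromptTemplate` delegates to the StringTemplate engine and does considerably more — but it shows the mechanism:

```java
import java.util.Map;

// A minimal sketch of what a prompt template does: replace {placeholders} with values.
class TemplateSketch {

    static String render(String template, Map<String, String> vars) {
        String out = template;
        for (var e : vars.entrySet()) {
            out = out.replace("{" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String rendered = render("List {count} best practices for {topic}",
                Map.of("count", "5", "topic", "REST API design"));
        System.out.println(rendered); // List 5 best practices for REST API design
    }
}
```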
Prompt Engineering Best Practices
Be specific: tell the model exactly what format, tone, and constraints you need.
Use system messages: set the persona and rules in the system prompt; keep user content in the user prompt.
One-shot/few-shot: include an example of the desired output.
Chain-of-thought: ask the model to "think step by step" for complex reasoning.
Structured output: request JSON and map it to records (see next section).
Structured Output — AI Responses as Java Records
Map AI-generated text directly into typed Java objects. No manual JSON parsing needed.
// Define your record
record BookRecommendation(String title, String author,
        String genre, String summary) {}

// Single entity
BookRecommendation book = chatClient.prompt()
        .user("Recommend a classic science fiction novel.")
        .call()
        .entity(BookRecommendation.class);

// List of entities
List<BookRecommendation> books = chatClient.prompt()
        .user("Recommend 5 classic sci-fi novels.")
        .call()
        .entity(new ParameterizedTypeReference<List<BookRecommendation>>() {});

// Map output
Map<String, Object> data = chatClient.prompt()
        .user("List the population of Tokyo, London, and New York")
        .call()
        .entity(new ParameterizedTypeReference<Map<String, Object>>() {});
RAG: Retrieval Augmented Generation
RAG lets your AI answer questions using your own data by retrieving relevant documents from a vector store and injecting them as context in the prompt.
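The "augmentation" half of RAG is simple prompt stuffing: retrieved snippets are placed into the prompt as context ahead of the user's question. Spring AI's QuestionAnswerAdvisor does this for you; the template below is illustrative, not the advisor's actual wording:

```java
import java.util.List;

// A sketch of prompt augmentation: retrieved chunks become context for the question.
class RagPromptSketch {

    static String augment(String question, List<String> retrievedChunks) {
        return """
                Answer the question using only the context below.

                Context:
                %s

                Question: %s
                """.formatted(String.join("\n---\n", retrievedChunks), question);
    }

    public static void main(String[] args) {
        String prompt = augment("What is the refund window?",
                List.of("Refunds are accepted within 30 days.", "Digital goods are final sale."));
        System.out.println(prompt);
    }
}
```

Because the model only sees the chunks you pass in, retrieval quality (chunking, similarity threshold, topK) directly bounds answer quality.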
Step 1: Document Ingestion (ETL)
@Component
class DocumentIngestionService {

    private final VectorStore vectorStore;

    DocumentIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void ingest(String pdfPath) {
        // 1. Read documents (PDF, JSON, HTML, Markdown, DOCX supported)
        PagePdfDocumentReader reader = new PagePdfDocumentReader(pdfPath,
                PdfDocumentReaderConfig.builder()
                        .withPagesPerDocument(1).build());

        // 2. Split into chunks
        TokenTextSplitter splitter = TokenTextSplitter.builder()
                .withChunkSize(800)
                .withMinChunkSizeChars(350).build();

        // 3. Store — embeddings generated automatically
        vectorStore.write(splitter.apply(reader.read()));
    }
}
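Why split at all? Embedding models have input limits, and smaller chunks retrieve more precisely. Spring AI's `TokenTextSplitter` splits on token counts; the toy version below splits on raw character counts, just to show the idea:

```java
import java.util.ArrayList;
import java.util.List;

// A naive fixed-size splitter: real splitters use token counts and sentence boundaries.
class ChunkSketch {

    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < text.length(); i += maxChars) {
            // Take at most maxChars characters per chunk.
            chunks.add(text.substring(i, Math.min(text.length(), i + maxChars)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4)); // [abcd, efgh, ij]
    }
}
```

In practice you also want overlap between chunks so sentences straddling a boundary are still retrievable.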
Step 2: Query with QuestionAnswerAdvisor
ChatResponse response = chatClient.prompt()
        .advisors(QuestionAnswerAdvisor.builder(vectorStore)
                .searchRequest(SearchRequest.builder()
                        .similarityThreshold(0.75)
                        .topK(5).build())
                .build())
        .user("What does our refund policy say about digital products?")
        .call()
        .chatResponse();
Advanced RAG with Query Rewriting
Advisor ragAdvisor = RetrievalAugmentationAdvisor.builder()
        .queryTransformers(RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder.build().mutate()).build())
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .vectorStore(vectorStore)
                .similarityThreshold(0.50)
                .topK(5).build())
        .build();

String answer = chatClient.prompt()
        .advisors(ragAdvisor)
        .user("How do I configure SSL?")
        .call()
        .content();
Vector Store Configuration (PGVector)
# application.yml
spring:
  ai:
    vectorstore:
      pgvector:
        initialize-schema: true
        dimensions: 1536
        distance-type: cosine_distance
Supported vector stores: PGVector, Chroma, Pinecone, Redis, Milvus, Weaviate, Qdrant, Elasticsearch, MongoDB Atlas, Neo4j, and more.
Function Calling / Tool Use
Let AI models invoke your Java methods to fetch real-time data or perform actions.
Declarative with @Tool
class WeatherTools {

    @Tool(description = "Get current weather for a given city")
    String getWeather(
            @ToolParam(description = "City name") String city,
            @ToolParam(description = "Temperature unit", required = false) String unit) {
        // Call a real weather API here
        return "Weather in %s: 22 degrees %s, sunny."
                .formatted(city, unit != null ? unit : "Celsius");
    }
}

// Use it — the model decides when to call the tool
String response = chatClient.prompt()
        .user("What's the weather like in London?")
        .tools(new WeatherTools())
        .call()
        .content();
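On the application side, tool calling boils down to a dispatch step: the model names a tool and supplies arguments, and the framework routes the call to your code, then feeds the result back to the model. Spring AI handles this loop for you; the sketch below shows the dispatch mechanism with a hypothetical registry:

```java
import java.util.Map;
import java.util.function.Function;

// A hypothetical tool registry: tool name -> function, as a sketch of the dispatch step.
class ToolDispatchSketch {

    static final Map<String, Function<String, String>> TOOLS = Map.of(
            "getWeather", city -> "Weather in " + city + ": 22 degrees Celsius, sunny.");

    // Simulates the framework step: look up the tool the model chose and invoke it.
    static String dispatch(String toolName, String argument) {
        Function<String, String> tool = TOOLS.get(toolName);
        if (tool == null) throw new IllegalArgumentException("Unknown tool: " + toolName);
        return tool.apply(argument);
    }

    public static void main(String[] args) {
        // Pretend the model replied: call "getWeather" with argument "London".
        System.out.println(dispatch("getWeather", "London"));
    }
}
```

The `@Tool` descriptions matter because they are all the model sees when deciding which tool to pick and how to fill its parameters.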
Functions as Spring Beans
public record CurrencyRequest(String from, String to, double amount) {}
public record CurrencyResponse(double convertedAmount, double rate) {}

@Bean
@Description("Convert an amount from one currency to another")
Function<CurrencyRequest, CurrencyResponse> convertCurrency() {
    return request -> {
        double rate = fetchExchangeRate(request.from(), request.to());
        return new CurrencyResponse(request.amount() * rate, rate);
    };
}

// Reference the function by bean name
String answer = chatClient.prompt()
        .user("Convert 100 USD to EUR")
        .toolNames("convertCurrency")
        .call()
        .content();
Tool Context — Pass Extra Data
class CustomerTools {

    @Tool(description = "Get customer by ID")
    Customer getCustomer(Long id, ToolContext ctx) {
        String tenantId = (String) ctx.getContext().get("tenantId");
        return customerRepo.findByIdAndTenant(id, tenantId);
    }
}

String answer = chatClient.prompt()
        .user("Tell me about customer #42")
        .tools(new CustomerTools())
        .toolContext(Map.of("tenantId", "acme-corp"))
        .call()
        .content();
Advisors — Interceptors for AI Calls
Advisors modify prompts before they reach the model and process responses on the way back — like Spring MVC interceptors for AI.
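The underlying pattern is a chain of responsibility: each advisor wraps the call, may rewrite the request on the way in, and may inspect the response on the way out. This is a generic sketch of that pattern, not Spring AI's actual `CallAdvisorChain` API:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// A minimal interceptor chain: each advisor wraps the next, innermost is the model call.
class AdvisorChainSketch {

    interface Advisor {
        String advise(String request, UnaryOperator<String> next);
    }

    static String run(String request, List<Advisor> advisors, UnaryOperator<String> model) {
        UnaryOperator<String> chain = model;
        // Wrap from last to first so advisors.get(0) runs outermost.
        for (int i = advisors.size() - 1; i >= 0; i--) {
            Advisor a = advisors.get(i);
            UnaryOperator<String> next = chain;
            chain = req -> a.advise(req, next);
        }
        return chain.apply(request);
    }

    public static void main(String[] args) {
        Advisor upperCase = (req, next) -> next.apply(req.toUpperCase()); // pre-processing
        Advisor exclaim = (req, next) -> next.apply(req) + "!";           // post-processing
        System.out.println(run("hi", List.of(upperCase, exclaim), req -> "echo:" + req));
        // prints: echo:HI!
    }
}
```

This ordering is why `getOrder()` exists on real advisors: it controls who wraps whom.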
Chat Memory — Conversation History
ChatMemory memory = MessageWindowChatMemory.builder()
        .chatMemoryRepository(new InMemoryChatMemoryRepository())
        .maxMessages(20).build();

ChatClient client = ChatClient.builder(chatModel)
        .defaultAdvisors(MessageChatMemoryAdvisor.builder(memory).build())
        .build();

// First call
client.prompt().user("My name is Alice")
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, "session-1"))
        .call().content();

// Second call — remembers the name
client.prompt().user("What is my name?")
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, "session-1"))
        .call().content(); // "Your name is Alice"
Custom Advisor — Latency Tracking
public class LatencyAdvisor implements CallAdvisor {

    private static final Logger log = LoggerFactory.getLogger(LatencyAdvisor.class);

    @Override
    public String getName() { return "LatencyAdvisor"; }

    @Override
    public int getOrder() { return 0; }

    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request,
            CallAdvisorChain chain) {
        long start = System.currentTimeMillis();
        ChatClientResponse response = chain.nextCall(request);
        log.info("AI call took {}ms", System.currentTimeMillis() - start);
        return response;
    }
}
Combining Multiple Advisors
ChatClient client = ChatClient.builder(chatModel)
        .defaultAdvisors(
                MessageChatMemoryAdvisor.builder(memory).build(),
                QuestionAnswerAdvisor.builder(vectorStore).build(),
                new SimpleLoggerAdvisor(),
                new LatencyAdvisor()
        ).build();
Embeddings
Convert text into numerical vectors for similarity search, clustering, and RAG.
@Service
class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    EmbeddingService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    // Single text
    float[] embed(String text) {
        return embeddingModel.embed(text);
    }

    // Batch
    List<float[]> embedBatch(List<String> texts) {
        return embeddingModel.embed(texts);
    }

    // Vector dimensions
    int dimensions() { return embeddingModel.dimensions(); }
}
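What makes these vectors useful is that semantically similar texts end up pointing in similar directions, which cosine similarity measures: values near 1.0 mean "similar", near 0 mean "unrelated". The standard formula in plain Java, with made-up toy vectors standing in for real embeddings:

```java
// Cosine similarity: dot(a, b) / (|a| * |b|) — the metric most vector stores use.
class CosineSketch {

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings" (real ones have hundreds or thousands of dimensions).
        float[] cat = {1f, 0f, 1f};
        float[] kitten = {1f, 0.1f, 0.9f};
        float[] invoice = {0f, 1f, 0f};
        System.out.printf("cat~kitten: %.3f, cat~invoice: %.3f%n",
                cosine(cat, kitten), cosine(cat, invoice));
    }
}
```

This is the `cosine_distance` the PGVector configuration above refers to, and the quantity behind every `similarityThreshold` setting in this guide.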
Multi-Modality — Images + Text
Send images alongside text to models like GPT-4o, Claude 3, or Gemini:
String description = chatClient.prompt()
        .user(u -> u
                .text("Describe what you see in this image.")
                .media(MimeTypeUtils.IMAGE_PNG,
                        new ClassPathResource("/images/diagram.png")))
        .call()
        .content();

// From a URL
String analysis = chatClient.prompt()
        .user(u -> u
                .text("What's in this image?")
                .media(MimeTypeUtils.IMAGE_JPEG,
                        URI.create("https://example.com/photo.jpg")))
        .call()
        .content();
Image Generation
@RestController
class ImageController {

    private final ImageModel imageModel;

    ImageController(ImageModel imageModel) {
        this.imageModel = imageModel;
    }

    @GetMapping("/generate-image")
    String generateImage(@RequestParam String description) {
        ImageResponse response = imageModel.call(
                new ImagePrompt(description,
                        OpenAiImageOptions.builder()
                                .quality("hd").N(1)
                                .height(1024).width(1024).build()));
        return response.getResult().getOutput().getUrl();
    }
}
Evaluation — Test Your AI
Spring AI provides evaluators to check relevance and catch hallucinations:
// Is the response relevant to the question and context?
RelevancyEvaluator evaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));
EvaluationResponse eval = evaluator.evaluate(
new EvaluationRequest(question, context, aiResponse));
assertThat(eval.isPass()).isTrue();
// Fact-checking — detect hallucinations
FactCheckingEvaluator factChecker = new FactCheckingEvaluator(
ChatClient.builder(chatModel));
EvaluationResponse result = factChecker.evaluate(
new EvaluationRequest(knownFacts, Collections.emptyList(), claim));
assertFalse(result.isPass()); // claim contradicts known facts
Provider Configuration Cheat Sheet
| Provider | Starter Artifact | Key Config |
|---|---|---|
| OpenAI | spring-ai-starter-model-openai | spring.ai.openai.api-key |
| Anthropic Claude | spring-ai-starter-model-anthropic | spring.ai.anthropic.api-key |
| Ollama (Local) | spring-ai-starter-model-ollama | spring.ai.ollama.base-url |
| AWS Bedrock | spring-ai-starter-model-bedrock | spring.ai.bedrock.aws.region |
| Azure OpenAI | spring-ai-starter-model-azure-openai | spring.ai.azure.openai.api-key |
| Google Gemini | spring-ai-starter-model-vertex-ai | spring.ai.vertex.ai.gemini.project-id |
| PGVector | spring-ai-starter-vector-store-pgvector | spring.ai.vectorstore.pgvector.* |
Key Takeaways
Provider-agnostic: Write once, swap AI providers via config. No vendor lock-in.
Spring-native: Auto-configuration, dependency injection, profiles — everything Spring developers expect.
Production-ready patterns: RAG, tool calling, chat memory, advisors, evaluation, and structured output are all built in.
Start simple: ChatClient.prompt().user("...").call().content() — that's your first AI call. Add RAG, tools, and memory as you need them.
Spring AI makes AI integration feel like any other Spring dependency — import the starter, configure, inject, and use.