Spring Data JPA Demystified: From Tables to Java Objects
A comprehensive guide to Spring Data JPA covering ORM concepts, entity mappings, relationships, the N+1 problem with solutions, derived queries, JPQL, native SQL, transaction isolation and propagation, Flyway migrations, and database indexing strategies.
Why JPA Exists: The Translation Problem
Imagine you speak English and your friend speaks Spanish. You need a translator in the middle so you can understand each other. That is exactly what JPA does — it translates between your Java objects and your database tables.
Your database thinks in rows and columns. Your Java code thinks in objects and fields. Without JPA, you would write raw SQL, manually pull data out of result sets, and stuff it into objects yourself — over and over again for every single query. JPA automates that entire translation.
What Is an ORM? (Explained Like You Are 10)
Think of a library. The library has a catalog system with cards (that is your database). Each card has info: book title, author, shelf number. Now imagine you want to work with books in your treehouse. You do not want to carry the catalog cards — you want actual book objects you can hold, read, and pass around.
An ORM (Object-Relational Mapping) is the magical librarian who takes catalog cards and turns them into real book objects for you. When you change a book object (say, you write a note in it), the librarian goes back and updates the catalog card too.
- JPA — the specification (the rulebook for how the magical librarian should work)
- Hibernate — the implementation (the actual librarian following those rules)
- Spring Data JPA — the convenience layer (you tell the librarian what you want in plain English instead of detailed instructions)
@Entity: Turning a Table into a Java Class
The @Entity annotation tells JPA: "This Java class maps to a database table." Each instance of the class represents one row. Each field represents one column.
@Entity
@Table(name = "books")
public class Book {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(nullable = false, length = 200)
private String title;
@Column(name = "page_count")
private int pageCount;
@Column(nullable = false)
private String isbn;
@Column(name = "published_date")
private LocalDate publishedDate;
// Getters and setters omitted for brevity
}
Let us break down the annotations:
@Id— marks the primary key. Every entity must have one.@GeneratedValue(strategy = GenerationType.IDENTITY)— the database auto-generates the ID (like an auto-increment column).@Column— fine-tunes the column: name, nullability, length. If you skip it, JPA uses the field name as the column name.@Table(name = "books")— explicitly sets the table name. Without it, JPA uses the class name.
Relationships: How Tables Connect
In real life, things are connected. An author writes many books. A student enrolls in many courses. An order contains many items. JPA lets you express these connections directly in your Java classes.
@OneToMany and @ManyToOne — The Parent-Child Connection
Think of a classroom. One teacher has many students. From the teacher's side, it is "one to many." From each student's side, it is "many to one" — many students share one teacher.
@Entity
@Table(name = "authors")
public class Author {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String name;
@OneToMany(mappedBy = "author", cascade = CascadeType.ALL, orphanRemoval = true)
private List<Book> books = new ArrayList<>();
// Helper method to keep both sides in sync
public void addBook(Book book) {
books.add(book);
book.setAuthor(this);
}
}
@Entity
@Table(name = "books")
public class Book {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String title;
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "author_id", nullable = false)
private Author author;
}
Key points:
mappedBy = "author"— tells JPA that theBookentity owns the relationship (it has the foreign key columnauthor_id).cascade = CascadeType.ALL— when you save/delete an author, all their books get saved/deleted too.orphanRemoval = true— if you remove a book from the author's list, JPA deletes it from the database.fetch = FetchType.LAZY— do not load the author when loading a book unless you actually access it. This is critical for performance.
@ManyToMany — The Network Connection
Students and courses are a perfect example. One student takes many courses. One course has many students. Neither "owns" the other — they share a join table.
@Entity
@Table(name = "students")
public class Student {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String name;
@ManyToMany
@JoinTable(
name = "student_courses",
joinColumns = @JoinColumn(name = "student_id"),
inverseJoinColumns = @JoinColumn(name = "course_id")
)
private Set<Course> courses = new HashSet<>();
}
@Entity
@Table(name = "courses")
public class Course {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String title;
@ManyToMany(mappedBy = "courses")
private Set<Student> students = new HashSet<>();
}
@JoinTable defines the bridge table. JPA creates student_courses with two foreign key columns. Use Set instead of List for ManyToMany to avoid Hibernate's duplicate-row bug with bag semantics.
The N+1 Problem: Death by a Thousand Queries
This is the most common performance trap in JPA. Let us use the librarian analogy again.
Imagine you ask the librarian: "Give me all 50 authors." The librarian brings you 50 author cards. Then you say: "Now show me the books for each author." The librarian walks to the shelf 50 separate times — once for each author. That is 1 query for authors + 50 queries for books = 51 queries. That is the N+1 problem.
The smart approach? Give the librarian a list and say: "Bring me all authors AND their books in one trip." That is what the solutions below do.
How to Spot It
Enable SQL logging in application.yml:
spring:
jpa:
show-sql: true
properties:
hibernate:
format_sql: true
logging:
level:
org.hibernate.SQL: DEBUG
org.hibernate.type.descriptor.sql.BasicBinder: TRACE
If you see the same SELECT repeating with different IDs, you have an N+1 problem.
Solution 1: JOIN FETCH (JPQL)
Tell JPA to load everything in a single query using a JOIN.
public interface AuthorRepository extends JpaRepository<Author, Long> {
@Query("SELECT a FROM Author a JOIN FETCH a.books WHERE a.id = :id")
Optional<Author> findByIdWithBooks(@Param("id") Long id);
@Query("SELECT DISTINCT a FROM Author a JOIN FETCH a.books")
List<Author> findAllWithBooks();
}
JOIN FETCH tells Hibernate: "When you load authors, also load their books in the same SQL query." The DISTINCT keyword prevents duplicate authors in the result when multiple books exist per author.
Solution 2: @EntityGraph (Declarative)
If you do not want to write JPQL, use @EntityGraph to declaratively specify what to fetch eagerly.
public interface AuthorRepository extends JpaRepository<Author, Long> {
@EntityGraph(attributePaths = {"books"})
List<Author> findAll();
@EntityGraph(attributePaths = {"books"})
Optional<Author> findById(Long id);
}
@EntityGraph overrides the default LAZY fetch for that specific query. It generates a LEFT JOIN behind the scenes. The advantage over JOIN FETCH is you do not need to write any JPQL.
Solution 3: DTO Projections (Best for Read-Only)
Sometimes you do not need the full entity — just specific fields. DTO projections fetch only what you need, skipping the overhead of managed entities entirely.
// DTO class
public record AuthorSummary(String name, long bookCount) {}
// Repository
public interface AuthorRepository extends JpaRepository<Author, Long> {
@Query("SELECT new com.example.dto.AuthorSummary(a.name, COUNT(b)) " +
"FROM Author a LEFT JOIN a.books b GROUP BY a.name")
List<AuthorSummary> findAuthorSummaries();
}
DTO projections are the fastest option because:
- They skip Hibernate's entity tracking (the persistence context).
- They fetch only the columns you specify.
- They are perfect for dashboards, reports, and list views where you display data but do not edit it.
Derived Queries: Spring Reads Your Method Name
Spring Data JPA can generate SQL from your method name. You write the method signature, Spring writes the query. It feels like magic.
public interface BookRepository extends JpaRepository<Book, Long> {
// SELECT * FROM books WHERE title = ?
List<Book> findByTitle(String title);
// SELECT * FROM books WHERE page_count > ?
List<Book> findByPageCountGreaterThan(int pages);
// SELECT * FROM books WHERE title LIKE '%keyword%' AND page_count < ?
List<Book> findByTitleContainingAndPageCountLessThan(String keyword, int maxPages);
// SELECT * FROM books WHERE author_id = ? ORDER BY published_date DESC
List<Book> findByAuthorIdOrderByPublishedDateDesc(Long authorId);
// SELECT COUNT(*) FROM books WHERE author_id = ?
long countByAuthorId(Long authorId);
// SELECT * FROM books WHERE isbn = ?
Optional<Book> findByIsbn(String isbn);
// DELETE FROM books WHERE page_count < ?
void deleteByPageCountLessThan(int pages);
}
Spring parses the method name into parts: findBy + Title + Containing + And + PageCount + LessThan. Each part maps to a SQL clause. Supported keywords include: And, Or, Between, LessThan, GreaterThan, Like, Containing, In, OrderBy, Not, IsNull, and more.
@Query: When Method Names Get Too Long
Derived queries are great for simple lookups. But when queries get complex, method names become unreadable. That is when you use @Query.
JPQL (Java Persistence Query Language)
JPQL looks like SQL but operates on entities and fields instead of tables and columns.
public interface BookRepository extends JpaRepository<Book, Long> {
// JPQL — uses entity names, not table names
@Query("SELECT b FROM Book b WHERE b.author.name = :authorName AND b.pageCount > :minPages")
List<Book> findByAuthorAndMinPages(
@Param("authorName") String authorName,
@Param("minPages") int minPages
);
// Update query
@Modifying
@Query("UPDATE Book b SET b.title = :newTitle WHERE b.id = :bookId")
int updateTitle(@Param("bookId") Long bookId, @Param("newTitle") String newTitle);
}
Native SQL
When you need database-specific features (window functions, CTEs, specific index hints), use native queries.
public interface BookRepository extends JpaRepository<Book, Long> {
// Native SQL — uses actual table/column names
@Query(value = "SELECT b.* FROM books b " +
"JOIN authors a ON b.author_id = a.id " +
"WHERE a.name = :authorName " +
"ORDER BY b.published_date DESC LIMIT 5",
nativeQuery = true)
List<Book> findLatestByAuthor(@Param("authorName") String authorName);
// Window function — not possible in JPQL
@Query(value = "SELECT b.title, b.page_count, " +
"RANK() OVER (ORDER BY b.page_count DESC) as rank " +
"FROM books b",
nativeQuery = true)
List<Object[]> findBooksRankedByLength();
}
When to use which:
- Derived queries — simple lookups with 1-2 conditions
- JPQL — moderate complexity, want database independence
- Native SQL — database-specific features, complex analytics, performance-critical queries
Transactions: All or Nothing
A transaction is like packing a suitcase for a trip. Either everything goes in (commit), or you unpack everything and start over (rollback). You never want to arrive at the airport with half your clothes packed.
In a banking transfer: deduct from account A AND add to account B. If the second step fails, the first must be undone. Transactions guarantee this.
Transaction Isolation Levels
Isolation levels control how much transactions can "see" each other's uncommitted work. Think of it like walls between exam students:
- READ_UNCOMMITTED — no walls. You can see your neighbor's unfinished answers (dirty reads). Almost never used.
- READ_COMMITTED — thin walls. You only see answers that your neighbor has submitted (committed). This is the default for PostgreSQL.
- REPEATABLE_READ — thicker walls. Once you read a value, it will not change during your transaction, even if your neighbor updates it.
- SERIALIZABLE — soundproof rooms. Transactions execute as if they were running one at a time. Safest but slowest.
@Service
public class TransferService {
@Transactional(isolation = Isolation.READ_COMMITTED)
public void transfer(Long fromId, Long toId, BigDecimal amount) {
Account from = accountRepository.findById(fromId)
.orElseThrow(() -> new AccountNotFoundException(fromId));
Account to = accountRepository.findById(toId)
.orElseThrow(() -> new AccountNotFoundException(toId));
if (from.getBalance().compareTo(amount) < 0) {
throw new InsufficientFundsException(fromId, amount);
}
from.setBalance(from.getBalance().subtract(amount));
to.setBalance(to.getBalance().add(amount));
accountRepository.save(from);
accountRepository.save(to);
// If any line throws, EVERYTHING rolls back
}
}
Transaction Propagation
Propagation controls what happens when one transactional method calls another transactional method. Think of it like Russian nesting dolls:
// REQUIRED (default) — join existing transaction or create new one
@Transactional(propagation = Propagation.REQUIRED)
public void placeOrder(Order order) {
orderRepository.save(order);
paymentService.processPayment(order); // joins THIS transaction
// If payment fails, order save is also rolled back
}
// REQUIRES_NEW — always create a brand new transaction
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void logAuditEvent(String event) {
auditRepository.save(new AuditLog(event));
// Even if the calling transaction rolls back,
// this audit log is saved — it has its own transaction
}
// SUPPORTS — use transaction if one exists, run without one if not
@Transactional(propagation = Propagation.SUPPORTS)
public Book findBook(Long id) {
return bookRepository.findById(id).orElse(null);
}
// MANDATORY — must be called within an existing transaction
@Transactional(propagation = Propagation.MANDATORY)
public void deductInventory(Long productId, int quantity) {
// Throws exception if no transaction exists
// Forces callers to wrap this in their own transaction
}
// NEVER — must NOT be called within a transaction
@Transactional(propagation = Propagation.NEVER)
public Report generateReport() {
// Guaranteed no transaction overhead for read-heavy operations
return reportRepository.buildReport();
}
Common pitfall: calling a @Transactional method from within the same class does not create a new transaction. Spring's proxy only intercepts calls from outside the class. If you need to call a transactional method internally, inject the class into itself or extract the method into a separate service.
Flyway Migrations: Version Control for Your Database
Your code lives in Git. Your database schema should too. Flyway tracks which SQL scripts have been applied and runs new ones automatically on startup.
Think of it like a checklist. Flyway marks off each script as it runs. If you deploy a new version with two new scripts, Flyway only runs those two — it never re-runs old ones.
Naming Convention
src/main/resources/db/migration/
V1__create_books_table.sql
V2__create_authors_table.sql
V3__add_author_id_to_books.sql
V4__create_student_courses_table.sql
V5__add_isbn_index.sql
The naming format: V{number}__{description}.sql (two underscores after the version number). Flyway runs them in version order.
Example Migrations
-- V1__create_books_table.sql
CREATE TABLE books (
id BIGSERIAL PRIMARY KEY,
title VARCHAR(200) NOT NULL,
isbn VARCHAR(13) NOT NULL UNIQUE,
page_count INT NOT NULL DEFAULT 0,
published_date DATE,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
-- V2__create_authors_table.sql
CREATE TABLE authors (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
-- V3__add_author_id_to_books.sql
ALTER TABLE books ADD COLUMN author_id BIGINT;
ALTER TABLE books ADD CONSTRAINT fk_books_author
FOREIGN KEY (author_id) REFERENCES authors(id);
-- V4__create_student_courses_table.sql
CREATE TABLE students (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE courses (
id BIGSERIAL PRIMARY KEY,
title VARCHAR(200) NOT NULL
);
CREATE TABLE student_courses (
student_id BIGINT NOT NULL REFERENCES students(id),
course_id BIGINT NOT NULL REFERENCES courses(id),
enrolled_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (student_id, course_id)
);
Golden rules:
- Never edit a migration that has already been applied. Create a new migration to make changes.
- Always test migrations on a copy of production data before deploying.
- Use
Vfor versioned migrations (run once) andRfor repeatable migrations (re-run when changed, useful for views and stored procedures).
Database Indexing: Making Queries Fast
An index is like the index at the back of a textbook. Without it, you flip through every page to find "photosynthesis." With it, you look up "photosynthesis" in the index and jump directly to page 142.
Databases work the same way. Without an index, the database scans every row (a "full table scan"). With the right index, it jumps straight to the matching rows.
B-Tree Index (The Default)
B-tree is the default index type. It works great for equality checks (=), range queries (<, >, BETWEEN), and sorting (ORDER BY).
-- Single column index
CREATE INDEX idx_books_isbn ON books(isbn);
-- Use in JPA entity
@Entity
@Table(name = "books", indexes = {
@Index(name = "idx_books_isbn", columnList = "isbn"),
@Index(name = "idx_books_title", columnList = "title")
})
public class Book {
// ...
}
Composite Index (Multiple Columns)
When you frequently query by multiple columns together, a composite index is far more efficient than separate single-column indexes.
-- Composite index — column order matters!
CREATE INDEX idx_books_author_date ON books(author_id, published_date);
-- This index helps these queries:
-- WHERE author_id = 5 (uses index)
-- WHERE author_id = 5 AND published_date > '2024-01-01' (uses index)
-- WHERE published_date > '2024-01-01' (does NOT use index!)
-- Rule: the index follows the leftmost prefix rule.
-- Think of a phone book sorted by last name, then first name.
-- You can look up "Smith" quickly (last name first).
-- You can look up "Smith, John" quickly (both columns).
-- You CANNOT look up just "John" quickly (skipping the first column).
Partial Index (Conditional Index)
Why index rows you never query? A partial index only covers rows that match a condition, making it smaller and faster.
-- Only index published books (skip drafts)
CREATE INDEX idx_books_published ON books(title)
WHERE status = 'PUBLISHED';
-- Only index active users
CREATE INDEX idx_users_active_email ON users(email)
WHERE is_active = true;
-- Partial indexes are smaller, faster to update, and use less disk space.
-- Perfect when you always filter by a specific condition.
GIN Index (Full-Text Search)
GIN (Generalized Inverted Index) is designed for searching within text, arrays, and JSON. It is the backbone of PostgreSQL full-text search.
-- Full-text search index
CREATE INDEX idx_books_search ON books
USING GIN (to_tsvector('english', title || ' ' || description));
-- Query using the index
SELECT * FROM books
WHERE to_tsvector('english', title || ' ' || description)
@@ to_tsquery('english', 'java & programming');
-- JSON index (for JSONB columns)
CREATE INDEX idx_books_metadata ON books USING GIN (metadata);
-- Query JSON with index support
SELECT * FROM books WHERE metadata @> '{"genre": "fiction"}'::jsonb;
Indexing Best Practices
- Index columns in WHERE, JOIN, and ORDER BY clauses — those are the columns the database searches and sorts.
- Do not over-index — every index slows down INSERT, UPDATE, and DELETE because the index must be updated too.
- Use EXPLAIN ANALYZE — always check whether your query actually uses the index. The database might ignore it if the table is small.
- Monitor index usage — drop indexes nobody uses. They waste disk and slow writes for nothing.
-- Check if your query uses the index
EXPLAIN ANALYZE SELECT * FROM books WHERE isbn = '978-0134685991';
-- Find unused indexes in PostgreSQL
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
Putting It All Together
Here is a complete example showing an entity with relationships, a repository with mixed query types, and a transactional service method:
// Entity with relationship and index
@Entity
@Table(name = "orders", indexes = {
@Index(name = "idx_orders_customer", columnList = "customer_id"),
@Index(name = "idx_orders_date", columnList = "orderDate")
})
public class Order {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "customer_id", nullable = false)
private Customer customer;
@OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
private List<OrderItem> items = new ArrayList<>();
private LocalDateTime orderDate;
private BigDecimal totalAmount;
}
// Repository with derived + JPQL + native queries
public interface OrderRepository extends JpaRepository<Order, Long> {
// Derived query
List<Order> findByCustomerIdOrderByOrderDateDesc(Long customerId);
// JPQL with JOIN FETCH (solves N+1)
@Query("SELECT DISTINCT o FROM Order o JOIN FETCH o.items WHERE o.customer.id = :customerId")
List<Order> findByCustomerWithItems(@Param("customerId") Long customerId);
// DTO projection
@Query("SELECT new com.example.dto.OrderSummary(o.id, o.totalAmount, o.orderDate) " +
"FROM Order o WHERE o.customer.id = :customerId")
List<OrderSummary> findOrderSummaries(@Param("customerId") Long customerId);
}
// Transactional service
@Service
@RequiredArgsConstructor
public class OrderService {
private final OrderRepository orderRepository;
private final InventoryService inventoryService;
@Transactional
public Order placeOrder(Long customerId, List<OrderItemRequest> items) {
Order order = new Order();
order.setCustomer(customerRepository.getReferenceById(customerId));
order.setOrderDate(LocalDateTime.now());
BigDecimal total = BigDecimal.ZERO;
for (OrderItemRequest item : items) {
inventoryService.deductStock(item.productId(), item.quantity());
OrderItem orderItem = new OrderItem(item.productId(), item.quantity(), item.price());
order.getItems().add(orderItem);
orderItem.setOrder(order);
total = total.add(item.price().multiply(BigDecimal.valueOf(item.quantity())));
}
order.setTotalAmount(total);
return orderRepository.save(order);
// If deductStock or save throws, EVERYTHING rolls back
}
}
Frequently Asked Questions
1. Should I use Hibernate's auto DDL generation (hbm2ddl.auto) in production?
Never. Hibernate's auto DDL (spring.jpa.hibernate.ddl-auto=update) is fine for quick prototyping, but it cannot handle complex schema changes like renaming columns, migrating data, or adding partial indexes. In production, always use a migration tool like Flyway or Liquibase. These tools give you version-controlled, repeatable, and auditable schema changes. Think of it this way: you would not deploy code without Git, so do not deploy schema changes without a migration tool.
2. When should I use FetchType.EAGER vs FetchType.LAZY?
Almost always use LAZY. Eager fetching loads related data every time you load the entity, even when you do not need it. If an Author has 500 books and you just want the author's name, EAGER loading fetches all 500 books for nothing. Use LAZY as the default and selectively load relationships when needed using JOIN FETCH, @EntityGraph, or DTO projections. The only exception is @ManyToOne and @OneToOne where the related entity is small and almost always needed.
3. What is the difference between JpaRepository.save() and saveAndFlush()?
save() marks the entity as "to be persisted" but Hibernate decides when to actually send the SQL to the database (usually at transaction commit). saveAndFlush() forces the SQL to be sent immediately. Use saveAndFlush() when you need the database-generated ID right away (for example, to pass it to another service) or when you want to catch a database constraint violation before the transaction ends. For most cases, save() is sufficient and more efficient because Hibernate can batch multiple saves into fewer SQL statements.
4. How do I choose between JPQL and native SQL queries?
Use JPQL when your query works across different databases and does not need database-specific features. JPQL operates on entity names and fields, so it is database-agnostic. Use native SQL when you need features like window functions (RANK, ROW_NUMBER), common table expressions (CTEs), database-specific hints, or when you need maximum performance and want full control over the generated SQL. A good rule: start with derived queries, graduate to JPQL when method names get long, and use native SQL only when JPQL cannot express what you need.
5. How many indexes should I create on a table?
There is no fixed rule, but every index has a cost. Each index speeds up reads but slows down writes because the database must update the index on every INSERT, UPDATE, and DELETE. A common guideline: index columns that appear in WHERE clauses, JOIN conditions, and ORDER BY clauses in your most frequent queries. Use EXPLAIN ANALYZE to verify your queries actually use the indexes. Monitor index usage with pg_stat_user_indexes and drop indexes with zero scans. For a typical table, 3-5 well-chosen indexes are usually sufficient. Over-indexing is as bad as under-indexing.