Spring Boot: Handling a REST Endpoint That Queries More Data Than Memory Available

If you are a developer and have not faced this issue yet, it is still worth reading, as at some point in your career, you will likely need…

Renan Schmitt

JavaJams

· ~5 min read · June 24, 2024 (Updated: June 25, 2024) · Free: No

If you are a developer and have not faced this issue yet, it is still worth reading, as at some point in your career, you will likely need to create a Spring Boot REST endpoint that performs a database query with results that do not fit into memory.

In this article, let's delve into an example of a REST endpoint that cannot be implemented in the traditional way due to memory consumption.

Scenario

In this exercise, let's use a simple scenario involving Customer, Order, OrderItem, and Product:

Our goal is to create an endpoint that will generate a report. This endpoint will query and return:

One million Orders.
More than 5 million OrderItems.

Traditional Implementation

Let's define a DTO with some fields:

@Data
public class ReportDto {
  private final Long orderId;
  private final LocalDate date;
  private final String customerName;
  . . .
  private final List<Item> items;

  @Data
  public static class Item {
    private final Long productId;
    private final String productName;
    private final Integer quantity;
    . . .
  }
}

The repository is a CrudRepository for the Order entity, which allows us to retrieve all other data through JPA relationships. For simplicity, we will use the findAll method to return the data.

@Repository
public interface OrderRepository extends CrudRepository<Order, Long> {
}

The service class performs the following actions:

Creates an ArrayList to hold the result of the method.
Calls the findAll method from the repository to retrieve the order data.
Loops through the query results and maps them to the DTO.

@Service
@RequiredArgsConstructor
public class ReportService {
  private final OrderRepository orderRepository;

  public List<ReportDto> getResult() {
    var result = new ArrayList<ReportDto>();
    for (var order : orderRepository.findAll()) {
      result.add(mapToOrder(order));
    }

    return result;
  }
}

The controller simply calls the service and returns its results.

@Controller
@RestController
@RequiredArgsConstructor
public class ReportController {

  private final ReportService reportService;

  @GetMapping("/v1/report")
  public ResponseEntity<List<ReportDto>> report() {
    var result = reportService.getResult();
    return ResponseEntity.ok(result);
  }
}

When using curl to test the endpoint, we encountered the following error after 45 minutes!

curl -w "\n" -X GET http://localhost:8000/v1/report
{"timestamp":"2024-06-21T19:50:05.720+00:00","status":500,"error":"Internal Server Error","path":"/v1/report"}

Upon checking the service output, we found the following log:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "http-nio-8000-Poller"
Exception in thread "mysql-cj-abandoned-connection-cleanup" java.lang.OutOfMemoryError: Java heap space

Our implementation failed to query the data from the database because the query result was larger than the available memory.

Solving Querying

Our first step is to improve the querying process to handle the large data volume efficiently.

First, let's define a method in the repository that returns a Stream instead of a List or Iterable. When using a Stream as the return type, the data is not fetched from the database all at once. Instead, it is returned in chunks as we consume the stream.

@Repository
public interface OrderRepository extends CrudRepository<Order, Long> {
  Stream<Order> findAllBy();
}

The service class needs to be modified:

Since the repository returns a stream and the data will be fetched on demand from the database, we need to keep the transaction open for the entire execution. We only need a read-only transaction, which we achieve using the @Transactional(readOnly = true) annotation.
As we are dealing with a stream that retrieves data from the database, we need to ensure that we close the stream properly. To do this, we use the try-with-resources statement.

To ensure that JPA does not keep the entity in memory after processing it, we manually detach it using the EntityManager.

@Service
@RequiredArgsConstructor
public class ReportService {
  private final OrderRepository orderRepository;

  @Transactional(readOnly = true)
  public List<ReportDto> getResult2() {
    var result = new ArrayList<ReportDto>();
    try (var orderStream = orderRepository.findAllBy()) {
      orderStream.forEach(
          order -> {
            result.add(mapToOrder(order));
            entityManager.detach(order);
          });
    }

    return result;
  }
}

The controller remains the same, but it now refers to version 2 of the API. As a result, we finally get some output:

curl -w "\n" -X GET http://localhost:8000/v2/report

[
  {
    "orderId":1,
    "date":"2022-08-25",
    "customerName":"Booker",
    "totalAmount":19104.36,
    "currency":"CDF",
    "status":"Shipped",
    "paymentMethod":"Credit Card",
    "items": [
      {
        "productId":93,
        "productName":"Rustic Bronze Bag",
        "quantity":41,
        "price":465.96,
        "totalAmount":19104.36
      }
    ]
  },
  {
    "orderId":2,
    "date":"2022-03-29",
    "customerName":"Danielle",
    "totalAmount":14685.35,
    "currency":"MUR",
    "status":"Processing",
    "paymentMethod":"Credit Card",
    "items": [
      {
        "productId":52,
        "productName":"Mediocre Copper Bench",
        "quantity":98,
        "price":46.02,
        "totalAmount":4509.96
      },
      {
        "productId":71,
        "productName":"Fantastic Bronze Hat",
        "quantity":31,
        "price":233.61,
        "totalAmount":7241.91
      },
      {
        "productId":3,
        "productName":"Mediocre Silk Bottle",
        "quantity":22,
        "price":133.34,
        "totalAmount":2933.48
      }
    ]
  },
  …
]

We solved the JPA memory issue; however, it took 42 minutes to return the result, and I am sure we can do better.

Streaming the Result

The time taken to process and return the results to the caller is too long because Java has to handle a large amount of data, and the internal data structures lose performance as they grow larger.

The solution is to use a stream to return the data. For the caller, it works similarly to downloading a file, where the results are sent in chunks by the server.

The controller now returns a StreamingResponseBody:

@GetMapping("/v3/report")
public ResponseEntity<StreamingResponseBody> report3() {
  var body = reportService.getResult();
  return ResponseEntity.ok(body);
}

The service class requires some changes:

Since we are using a stream to return the data, we need to manually control the transaction using the TransactionTemplate. To instantiate it, we require the PlatformTransactionManager, which is passed in the constructor.
We use the transaction template to encapsulate the core execution, performed by the fillStream method.

The fillStream method utilizes an ObjectMapper to convert the result to JSON. For each order retrieved from the database, it maps to the DTO, converts it to JSON, and writes it to the StreamingResponseBody.

@Service
public class ReportService {
  private final TransactionTemplate transactionTemplate;
  private final OrderRepository orderRepository;
  private final EntityManager entityManager;
  private final ObjectMapper objectMapper = new ObjectMapper();

  public ReportService(
      PlatformTransactionManager platformTransactionManager,
      OrderRepository orderRepository,
      EntityManager entityManager) {
    this.transactionTemplate = new TransactionTemplate(platformTransactionManager);
    this.orderRepository = orderRepository;
    this.entityManager = entityManager;

    objectMapper.registerModule(new JavaTimeModule());
  }

  public StreamingResponseBody getResult() {
    return outputStream ->
        transactionTemplate.execute(
            new TransactionCallbackWithoutResult() {
              @Override
              protected void doInTransactionWithoutResult(TransactionStatus status) {
                fillStream(outputStream);
              }
            });
  }

  private void fillStream(OutputStream outputStream) {
    try (var orderStream = orderRepository.findAllBy()) {
      orderStream.forEach(
          order -> {
            try {
              var json = objectMapper.writeValueAsString(mapToOrder(order));
              outputStream.write(json.getBytes(StandardCharsets.UTF_8));
              entityManager.detach(order);

            } catch (IOException e) {
              throw new RuntimeException(e);
            }
          });
    }
  }

}

With this change in place, a few seconds after calling the endpoints, we begin to receive the response. Since the result is being streamed, the memory used by Java to handle it becomes insignificant, significantly enhancing the performance of this execution. In fact, the performance improvement was so substantial that the execution time decreased from 42 minutes to just 30 seconds!

Conclusion

We can further enhance this code by optimizing the query itself, such as using a specific query that returns the result directly in DTO format, aiming to reduce the number of database queries.

The complete code for this exercise is available at: https://github.com/renan-schmitt/big-queries.

#java #spring-boot #jpa #streaming #programming