#StackBounty: #java #spring #postgresql #hibernate #stream-processing efficiently store a result stream in multiple tables with optimis…

Bounty: 100

Given a result stream with many items, I want to store them and handle potential concurrency conflicts:

public void onTriggerEvent(/* params */) {
  Stream<Result> results = customThreadPool.submit(/*...complex parallel computation on multiple servers...*/).get();
  List<Result> conflicts = store(results);
  resolveConflictsInNewTransaction(conflicts);
}

I am stuck on how to implement store(...) efficiently. Each Result consists of two immutable, detached objects describing data that needs to be updated in their respective DB tables.

@Value
public static class Result {
    A a; // describes update for row in table a
    B b; // describes update for row in table b
}

A and B each reference two users, where (u1, u2) is a key on the respective DB table.

@Value
public static class A {
    long u1;
    long u2;
    // ... computed data fields ...
}
// B accordingly

The stream calculation itself might be triggered concurrently (multiple onTriggerEvent invocations in parallel). This is mostly fine, but it sometimes leads to conflicts for some results (about 0.1% are in conflict, e.g. a stream has a result for (53,21) and another invocation has also updated (53,21) in the meantime). A conflict of A and/or B is indicated by its updatedAt field differing from the value it had at the beginning of the operation. Of course, we do not want to throw away all results and try again; we only want to resolve the rows in conflict.
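The conflict rule described above amounts to a compare-and-set on updatedAt. The following is a minimal in-memory sketch of that rule only, not of the database code: VersionedStore, Key, Row and tryUpdate are hypothetical names, the map merely stands in for one table keyed by (u1, u2), and records assume Java 16 or newer.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for one table keyed by (u1, u2).
final class VersionedStore {
    record Key(long u1, long u2) {}
    record Row(String data, Instant updatedAt) {}

    private final Map<Key, Row> rows = new ConcurrentHashMap<>();

    void put(Key key, Row row) { rows.put(key, row); }

    Row get(Key key) { return rows.get(key); }

    // Applies the update only if updatedAt still equals the value that was
    // read when the computation started. Returns false on conflict and
    // leaves the row untouched in that case.
    boolean tryUpdate(Key key, Instant expectedUpdatedAt, String newData, Instant now) {
        Row current = rows.get(key);
        if (current == null || !current.updatedAt().equals(expectedUpdatedAt)) {
            return false;
        }
        // replace(key, old, new) is atomic: it fails if another thread
        // changed the row between the read above and this call.
        return rows.replace(key, current, new Row(newData, now));
    }
}
```

A second writer that still holds the old updatedAt for (53,21) gets false back and can hand that result to the conflict resolution step.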

So I wonder what a good approach is to (1) store all Result.a and Result.b that are not in conflict and (2) get a List of the Results that are in conflict and need special treatment.

public List<Result> store(Stream<Result> results) {
  // store all a
  // store all b (ideally without using results * 2 RAM)
  // if a and b are not in conflict, update other data in the same ACID transaction as the update of the related a and b
  // return those in conflict
}
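One way to avoid holding the results twice in memory might be to consume the stream in fixed-size chunks and keep only the conflicts. The sketch below is an assumption about the shape of such a loop, not a finished store(...): ChunkedStore and the writeChunk callback are hypothetical, and writeChunk is expected to persist one chunk (for example in one transaction) and return the items it could not store.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

final class ChunkedStore {
    // Consumes the stream once. Memory use is bounded by one chunk plus the
    // accumulated conflicts, not by the total number of results.
    static <R> List<R> store(Stream<R> results, int chunkSize,
                             Function<List<R>, List<R>> writeChunk) {
        List<R> conflicts = new ArrayList<>();
        List<R> chunk = new ArrayList<>(chunkSize);
        Iterator<R> it = results.iterator();
        while (it.hasNext()) {
            chunk.add(it.next());
            if (chunk.size() == chunkSize) {
                // Hypothetical callback: writes the chunk, returns its conflicts.
                conflicts.addAll(writeChunk.apply(chunk));
                chunk = new ArrayList<>(chunkSize);
            }
        }
        if (!chunk.isEmpty()) {
            conflicts.addAll(writeChunk.apply(chunk)); // last, partially filled chunk
        }
        return conflicts;
    }
}
```

The chunk size trades round trips against transaction length; with a conflict rate of about 0.1%, the returned list should stay small compared with the stream.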

How can I implement this without unpacking each result and sending it to the DB in its own transaction? Ideally, I would send everything to the DB at once and get back a list of the conflicts that have not been stored (while all the others have been persisted). I am open to a different approach as well.

We use JPA/Hibernate if that is relevant.
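As one possible direction, a plain JDBC batch with an optimistic WHERE clause could report conflicts through the update counts. This is only a sketch under assumptions: the table name a, the columns data and updated_at, and the AUpdate record are made up for illustration, and it covers a single table, so keeping a and b consistent for the same result (for example by rolling back and retrying a chunk without its conflicting pairs) is left open. With Hibernate, the Connection can be obtained via Session.doWork.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

final class OptimisticBatch {
    // Hypothetical flattened update for table a; field and column names are assumptions.
    record AUpdate(long u1, long u2, String data, Timestamp expectedUpdatedAt) {}

    // The row is only touched if updated_at still has the value read earlier.
    static final String SQL =
        "UPDATE a SET data = ?, updated_at = now() "
      + "WHERE u1 = ? AND u2 = ? AND updated_at = ?";

    // Sends the whole chunk in one batch and returns the updates that matched no row.
    static List<AUpdate> updateBatch(Connection con, List<AUpdate> chunk) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            for (AUpdate u : chunk) {
                ps.setString(1, u.data());
                ps.setLong(2, u.u1());
                ps.setLong(3, u.u2());
                ps.setTimestamp(4, u.expectedUpdatedAt());
                ps.addBatch();
            }
            return conflicts(chunk, ps.executeBatch());
        }
    }

    // An update count of 0 means the WHERE clause matched nothing, i.e. a conflict
    // (or a missing row). Assumes the driver reports a real count per statement.
    static <T> List<T> conflicts(List<T> chunk, int[] updateCounts) {
        List<T> result = new ArrayList<>();
        for (int i = 0; i < updateCounts.length; i++) {
            if (updateCounts[i] == 0) {
                result.add(chunk.get(i));
            }
        }
        return result;
    }
}
```

Whether the batch runs in one transaction per chunk or one per invocation is a separate choice; the sketch leaves commit and rollback to the caller.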

