Re: New significance of holdable result sets in Java 8

From: Marko Topolnik <marko(dot)topolnik(at)gmail(dot)com>
To: Steven Schlansker <stevenschlansker(at)gmail(dot)com>
Cc: "pgsql-jdbc(at)postgresql(dot)org JDBC" <pgsql-jdbc(at)postgresql(dot)org>, Brian McCallister <brianm(at)skife(dot)org>
Subject: Re: New significance of holdable result sets in Java 8
Date: 2014-11-12 19:20:13
Message-ID: 5855156D-8010-4735-A63D-DC5DF5EABD27@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc


On 12. stu. 2014., at 19:45, Steven Schlansker <stevenschlansker(at)gmail(dot)com> wrote:

> Thanks for bringing this up! I'm not an expert on the PG FE/BE protocol, so please excuse my ignorance below :)
>
> On Nov 12, 2014, at 7:22 AM, Marko Topolnik <marko(dot)topolnik(at)gmail(dot)com> wrote:
>>
>> Data is passed from Controller to View as the return value of a method. Traditionally, if you wanted a collection-shaped response, you would return a List. This meant eager loading of all data needed for the response, which caused scalability issues related to the JVM heap space.
>
> We solved this problem (pre-Streams) by inverting the flow of control. In pseudo-JAX-RS with Jackson:
>
> @Path("/objects")
> public StreamingOutput getObjects() {
> return os -> {
> JsonGenerator jg = new JsonGenerator(os);
> ResultSet rs = findObjects(); // make sure to do this in a txn with fetch size set sensibly or you will collect all rows eagerly
> while (rs.hasNext()) { jg.writeObject(rs.getString("whatever")); }
> jg.close(); rs.close();
> };
> }

I am aware of this approach; I even discuss it as a typical workaround in my blog post [1]. The problem is, this completely dispenses with the detailed and complex setup present in the View layer: the response format can be driven by the Accept HTTP header, encoding may be adapted to the request, and so on. I did pretty much that in my Clojure projects because Clojure's ecosystem isn't as well-developed so roll-your-own solutions are the norm. With Spring, however, you lose a great deal when you give your Controller method access to the raw response.

I have already developed a Stream-based REST service along the lines I discuss in the mentioned blog post and the advantages are quite clear to me. It is a direction that Spring would probably like to move in as well.

[1] http://www.airpair.com/v1/posts/spring-streams-memory-efficiency

>
>> With the Streams API it is now very convenient to return a lazily-evaluated stream of Model objects. It is also very convenient to make this stream pull data directly from an underlying ResultSet, tronsforming each row on-the-fly into a Model object. This, however, calls for holdable result sets because the transaction commits when program control leaves the Service layer.
>
> Having a holdable result set escape the service layer feels like a similar layering violation as having an open transaction escape the service layer. Either way a heavyweight database object escapes and must be cleaned up at a later date. Naïvely I would expect the cost of holding the two objects to be similar, since the actual underlying transaction cannot commit/rollback until all results are read from the cursor?

I do not propose this as a solution that reduces the overall cost and you are right that the cost would just migrate. On the other hand, note that Oracle and MS SQL Server provide holdable cursors by default; in fact, Oracle supports nothing _but_ natively holdable result sets.

>
>> The above could raise the level of interest of the PostgreSQL JDBC team in implementing holdable result sets backed by native holdable cursors instead of the current client-side cursors, which don't allow the space complexity to be reduced from O(n) to O(1) on the JVM side. I am aware that this is not a trivial endeavor as it requires intervention into the FE/BE protocol, but I would nevertheless propose that this concern be reassessed in the light of new developments in the Java ecosystem.
>
> As an alternate thought, Stream is AutoCloseable. What if we introduce a StatementStream, which takes a (Prepared)Statement, executes in a transaction with small fetch size, and then returns a Stream<DTO>. Again very pseudo-codey:
>
> class DAO {
> Stream<DTO> findObjects() {
> return new StatementResultStream(conn, "SELECT * FROM objects"); // sets fetch size, ensure auto commit off
> }
> }
>
> class JsonStreamOutput<T> implements StreamingOutput {
> StreamOutput(Supplier<Stream<T>> streamSupplier) { this.streamSupplier = streamSupplier; }
> public void write(OutputStream os) {
> try (JsonGenerator jg = new JsonGenerator(os);
> Stream<T> str = streamSupplier.get()) {
> str.forEach(obj -> jg.writeObject(obj));
> }
> }
> }
>
> @Path("/objects")
> class Resource {
> @GET
> public StreamOutput<TransformedDTO> getObjects() {
> return new JsonStreamOutput(() -> dto.findObjects().filter(...).map(dto -> new TransformedDto(dto)));
> }
> }

Here I'm not sure I follow you. Do you mean to have the transaction boundaries move from the controller into the View layer? Your StreamOutput-returning controller method would just set up the fetch, then the framework layer above it (which I call the View) would consume that, and only upon consumption would the Statement execute.

You still seem to dictate the output format directly from the Controller, the avoidance of which is motivating my approach.

> The StatementResultStream and StreamOutput classes are entirely reusable. The DAO is no more complicated than before, except instead of using JDBC directly you punt the work until the Stream undergoes a terminal operation. The main difficulty here is figuring out a sensible API for StatementResultStream construction.
>
> The only bit I'm not happy with is the fact that you have to lift your intermediate stream operations into a lambda. You might be able to avoid this by instead writing the whole bit in terms of a Spliterator instead and using StreamSupport.stream(spliterator), but that introduces questions about held resources since Spliterator is not AutoCloseable.

This is easily solved (I already do it for my ScrollableResultsSpliterator): Stream has an onClose hook, so you just introduce a close() method to the spliterator and attach it to the hook.

> This has the advantage that it's implementable today without changes to either JDBC or PGJDBC, and introduces only reusable optional components to e.g. Spring or JDBI or whatever library wishes to do this.
>
> Thoughts?
> Steven
>

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Marko Topolnik 2014-11-12 19:21:33 Re: New significance of holdable result sets in Java 8
Previous Message Steven Schlansker 2014-11-12 19:19:07 Re: New significance of holdable result sets in Java 8