Reading and writing off-heap data

From: Tom Dunstan <pgsql(at)tomd(dot)cc>
To: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Reading and writing off-heap data
Date: 2017-09-21 05:43:08
Message-ID: CAPPfruxfBLM=gnyW-y1-ioPvZ+3i70_XARk0eX+HBdEtk7YPjw@mail.gmail.com
Lists: pgsql-jdbc

Hi all

After the original discussion back in March [1], we've finally gotten around to
scheduling this work. I've submitted a pull request [2] to support writing data
from off-heap locations using the interface discussed there.

We'd like to do the reverse of this too, though: read incoming data into a
caller-controlled buffer. The interface would look something like this:

// provided by driver
interface ByteStreamReader<T extends Closeable> {
T readByteStream(int length, InputStream stream) throws IOException;
}

// user code
class MyCustomByteStreamReader implements ByteStreamReader<MyBufferHandle> {
...
}

preparedStatement.registerByteStreamReader(new MyCustomByteStreamReader());
...
MyBufferHandle b = (MyBufferHandle) resultSet.getObject(2);
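
As a rough illustration of the user-side half of this, here is a minimal sketch of a reader that copies the incoming bytes into a direct (off-heap) ByteBuffer. The interface is the one proposed above and does not exist in pgjdbc today; MyBufferHandle, its buffer() accessor, and the 8 KB chunk size are assumptions made up for the example.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Proposed driver-side interface from this message (not an existing pgjdbc API).
interface ByteStreamReader<T extends Closeable> {
    T readByteStream(int length, InputStream stream) throws IOException;
}

// Hypothetical handle that owns an off-heap buffer.
final class MyBufferHandle implements Closeable {
    private ByteBuffer buffer;

    MyBufferHandle(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    ByteBuffer buffer() {
        if (buffer == null) {
            throw new IllegalStateException("handle is closed");
        }
        return buffer;
    }

    @Override
    public void close() {
        // Drop the reference; the direct memory is released once the buffer is collected.
        buffer = null;
    }
}

final class MyCustomByteStreamReader implements ByteStreamReader<MyBufferHandle> {
    @Override
    public MyBufferHandle readByteStream(int length, InputStream stream) throws IOException {
        ByteBuffer target = ByteBuffer.allocateDirect(length);
        // Small on-heap scratch array; only this chunk ever lives on the heap.
        byte[] chunk = new byte[Math.min(8192, Math.max(length, 1))];
        int remaining = length;
        while (remaining > 0) {
            int n = stream.read(chunk, 0, Math.min(chunk.length, remaining));
            if (n < 0) {
                throw new EOFException("stream ended with " + remaining + " bytes missing");
            }
            target.put(chunk, 0, n);
            remaining -= n;
        }
        target.flip(); // make the buffer readable from position 0 up to length
        return new MyBufferHandle(target);
    }
}
```

The reader consumes exactly length bytes and no more, on the assumption that the driver would hand over a stream positioned at the start of the column value and expect it to be left at the end of that value.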

There are a couple of issues:

1. Internally, the driver passes incoming tuples around as byte[][]
instances, which doesn't leave much ability to do something else with the
incoming data. I've submitted a PR [3] that introduces a Tuple class as a
wrapper to pass around, which then allows us to do more interesting things
with the data.

2. How should we register the reader? We have to do it ahead of execution
of the query, as the driver has already read at least some data rows by the
time we return the ResultSet.
Some potential options are:

a) Register against the statement and use it for all columns of binary
type. This would look like the above.

b) Register against the statement but for individual columns:
statement.registerByteStreamReader(2, new MyCustomByteStreamReader());
statement.registerByteStreamReader("foo", new MyCustomByteStreamReader());

c) Mark incoming columns in some other way that the driver can recognise.
This requires getting creative. An example would be to create a domain over
the bytea type and then register the reader for that type. Then queries
would have to have results cast to that type.

d) Register against a higher-level object like the connection or driver and
use it for all columns of binary type.

Option a) is the simplest in that it requires neither the driver to keep
track of readers for individual columns nor users to mess with their
database schema. It's definitely enough for my use-case, but I'm interested
in hearing other opinions on whether that's flexible enough.
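
To give a feel for the extra bookkeeping that option b) would add over option a), here is a rough sketch of a lookup that prefers a per-column reader and falls back to a statement-wide one. ReaderRegistry and its method names are hypothetical, not part of any PR, and column labels are matched exactly for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for options a) and b); R stands for the reader type.
final class ReaderRegistry<R> {
    private R defaultReader;                                  // option a): one reader for all binary columns
    private final Map<Integer, R> byIndex = new HashMap<>();  // option b): keyed by 1-based column index
    private final Map<String, R> byLabel = new HashMap<>();   // option b): keyed by column label

    void register(R reader) {
        defaultReader = reader;
    }

    void register(int columnIndex, R reader) {
        byIndex.put(columnIndex, reader);
    }

    void register(String columnLabel, R reader) {
        byLabel.put(columnLabel, reader);
    }

    // Returns the most specific reader for a column, or null if none is registered.
    R lookup(int columnIndex, String columnLabel) {
        R reader = byIndex.get(columnIndex);
        if (reader == null) {
            reader = byLabel.get(columnLabel);
        }
        return reader != null ? reader : defaultReader;
    }
}
```

With only option a), the two maps and the lookup order disappear and a single field is enough.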

Is there general support for the feature? I'm again happy to code up a PR,
and I have time allocated to do that fairly soon if it's likely to be
merged.

Thanks

Tom

[1] https://www.postgresql.org/message-id/C659D6A4-430F-4F55-BE06-BE1C960A5405%40tomd.cc
[2] https://github.com/pgjdbc/pgjdbc/pull/953
[3] https://github.com/pgjdbc/pgjdbc/pull/954
