Re: very slow largeobject transfers through JDBC

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Mate Varga <m(at)matevarga(dot)net>
Cc: Dmitry Igrishin <dmitigr(at)gmail(dot)com>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: very slow largeobject transfers through JDBC
Date: 2018-09-06 09:04:44
Message-ID: CADK3HHLtXNuJgB-x71SL2VjBTzfwajYb1cppxa5JRp4kzm40TA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Mate,

Thanks for the detailed response. This will help others in the same
situation

Dave Cramer

davec(at)postgresintl(dot)com
www.postgresintl.com

On Thu, 6 Sep 2018 at 05:03, Mate Varga <m(at)matevarga(dot)net> wrote:

> Hi,
>
> summarizing:
> we had a table that had an OID column, referencing an object in
> pg_largeobject. This was mapped to a (Java) entity with a byte array field,
> annotated with @Lob. The problem was that we were fetching thousands of
> these entities in one go, and LOB fetching is not batched by Hibernate/JDBC
> (so each row is fetched separately). Because we were abusing LOBs (they
> were small, often less than 10 kB), we have chosen to move the binary blobs
> from the LO table to a simple bytea column. So the entity that had a byte
> array field mapped to an OID column now has a byte array field mapped to a
> bytea column, and we have manually moved data from the LO table to the
> bytea column. Now Hibernate/JDBC fetches all the content we need in
> batches. Random benchmark: fetching 20k rows used to take 7 seconds (250
> msec query execution time, 6.7 sec for transfer) and now it takes 1.5
> seconds (250 msec query + 1.3 sec transfer).
>
> Regards,
> Mate
>
> On Thu, Sep 6, 2018 at 10:56 AM Dave Cramer <pg(at)fastcrypt(dot)com> wrote:
>
>> Hi
>>
>> Can you be more explicit how you fixed the problem ?
>>
>> Thanks
>> Dave Cramer
>>
>> davec(at)postgresintl(dot)com
>> www.postgresintl.com
>>
>>
>> On Thu, 6 Sep 2018 at 03:46, Mate Varga <m(at)matevarga(dot)net> wrote:
>>
>>> After inlining the data, performance issues have been solved. Thanks for
>>> the help.
>>>
>>> On Mon, Sep 3, 2018 at 9:57 PM Mate Varga <m(at)matevarga(dot)net> wrote:
>>>
>>>> Thanks,
>>>> 1) we'll try to move stuff out from LOBs
>>>> 2) we might raise a PR for the JDBC driver
>>>>
>>>> Mate
>>>>
>>>> On Mon, 3 Sep 2018, 19:35 Dave Cramer, <pg(at)fastcrypt(dot)com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, 3 Sep 2018 at 13:00, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>
>>>>>> More precisely: when fetching 10k rows, JDBC driver just does a large
>>>>>> bunch of socket reads. With lobs, it's ping-pong: one read, one write per
>>>>>> lob...
>>>>>>
>>>>>>
>>>>> Ok, this is making more sense. In theory we could fetch them all but
>>>>> since they are LOB's we could run out of memory.
>>>>>
>>>>> Not sure what to tell you at this point. I'd entertain a PR if you
>>>>> were motivated.
>>>>>
>>>>> Dave Cramer
>>>>>
>>>>> davec(at)postgresintl(dot)com
>>>>> www.postgresintl.com
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> On Mon, Sep 3, 2018 at 6:54 PM Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>>
>>>>>>> So I have detailed profiling results now. Basically it takes very
>>>>>>> long that for each blob, the JDBC driver reads from the socket then it
>>>>>>> creates the byte array on the Java side. Then it reads the next blob, etc.
>>>>>>> I guess this takes many network roundtrips.
>>>>>>>
>>>>>>> On Mon, Sep 3, 2018 at 5:58 PM Dave Cramer <pg(at)fastcrypt(dot)com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 3 Sep 2018 at 10:48, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>>>>
>>>>>>>>> That's 1690 msec (1.69 seconds, and that is how long it takes to
>>>>>>>>> fetch 20k (small-ish) rows without LOBs (LOBs are a few lines below on the
>>>>>>>>> screenshot)
>>>>>>>>>
>>>>>>>>
>>>>>>>> that sound high as well!
>>>>>>>>
>>>>>>>> Something isn't adding up..
>>>>>>>>
>>>>>>>>
>>>>>>>> Dave Cramer
>>>>>>>>
>>>>>>>> davec(at)postgresintl(dot)com
>>>>>>>> www.postgresintl.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 3, 2018 at 4:40 PM Dave Cramer <pg(at)fastcrypt(dot)com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> the one you have highlighted ~1.69ms
>>>>>>>>>>
>>>>>>>>>> Dave Cramer
>>>>>>>>>>
>>>>>>>>>> davec(at)postgresintl(dot)com
>>>>>>>>>> www.postgresintl.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, 3 Sep 2018 at 10:38, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>>>>>>
>>>>>>>>>>> Which frame do you refer to?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 3, 2018 at 3:57 PM Dave Cramer <pg(at)fastcrypt(dot)com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Not sure why reading from a socket is taking 1ms ?
>>>>>>>>>>>>
>>>>>>>>>>>> Dave Cramer
>>>>>>>>>>>>
>>>>>>>>>>>> davec(at)postgresintl(dot)com
>>>>>>>>>>>> www.postgresintl.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 3 Sep 2018 at 09:39, Mate Varga <m(at)matevarga(dot)net>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://imgur.com/a/ovsJPRv -- I've uploaded the profiling
>>>>>>>>>>>>> info (as an image, sorry). It seems this is a JDBC-level problem. I
>>>>>>>>>>>>> understand that the absolute timing is not meaningful at all because you
>>>>>>>>>>>>> don't know how large the resultset is, but I can tell that this is only a
>>>>>>>>>>>>> few thousands rows + few thousand largeobjects, each largeobject is around
>>>>>>>>>>>>> 1 kByte. (Yes I know this is not a proper use of LOBs -- it's a legacy db
>>>>>>>>>>>>> structure that's hard to change.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> Mate
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 3, 2018 at 11:52 AM Mate Varga <m(at)matevarga(dot)net>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we'll try to test this with pure JDBC versus hibernate.
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Sep 3, 2018 at 11:48 AM Dave Cramer <pg(at)fastcrypt(dot)com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, 3 Sep 2018 at 03:55, Mate Varga <m(at)matevarga(dot)net>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Basically there's a class with a byte[] field, the class is
>>>>>>>>>>>>>>>> mapped to table T and the byte field is annotated with @Lob so it goes to
>>>>>>>>>>>>>>>> the pg_largeobject table.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ah, so hibernate is in the mix. I wonder if that is causing
>>>>>>>>>>>>>>> some challenges ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The DB is on separate host but relatively close to the app,
>>>>>>>>>>>>>>>> and I can reproduce the problem locally as well. One interesting bit is
>>>>>>>>>>>>>>>> that turning of SSL between the app and PSQL speeds up things by at least
>>>>>>>>>>>>>>>> 50%.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ah, one addition -- the binary objects are encrypted, so
>>>>>>>>>>>>>>>> their entropy is very high.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any chance you could write a simple non-hibernate test code
>>>>>>>>>>>>>>> to time the code ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dave Cramer
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> dave(dot)cramer(at)crunchydata(dot)ca
>>>>>>>>>>>>>>> www.crunchydata.ca
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mate
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Sep 2, 2018 at 12:55 AM Dave Cramer <
>>>>>>>>>>>>>>>> pg(at)fastcrypt(dot)com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, 31 Aug 2018 at 10:15, Mate Varga <m(at)matevarga(dot)net>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I see -- we could try that, though we're mostly using an
>>>>>>>>>>>>>>>>>> ORM (Hibernate) to do this. Thanks!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Aug 31, 2018 at 3:57 PM Dmitry Igrishin <
>>>>>>>>>>>>>>>>>> dmitigr(at)gmail(dot)com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> пт, 31 авг. 2018 г. в 16:35, Mate Varga <m(at)matevarga(dot)net
>>>>>>>>>>>>>>>>>>> >:
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > we're fetching binary data from pg_largeobject table.
>>>>>>>>>>>>>>>>>>> The data is not very large, but we ended up storing it there. If I'm
>>>>>>>>>>>>>>>>>>> copying the data to a file from the psql console, then it takes X time
>>>>>>>>>>>>>>>>>>> (e.g. a second), fetching it through the JDBC driver takes at least 10x
>>>>>>>>>>>>>>>>>>> more. We don't see this difference between JDBC and 'native' performance
>>>>>>>>>>>>>>>>>>> for anything except largeobjects (and bytea columns, for the record).
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Does anyone have any advice about whether this can be
>>>>>>>>>>>>>>>>>>> tuned or what the cause is?
>>>>>>>>>>>>>>>>>>> I don't know what a reason of that, but I think it's
>>>>>>>>>>>>>>>>>>> reasonable and
>>>>>>>>>>>>>>>>>>> quite simple to call lo_import()/lo_export() via JNI.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can't imagine that's any faster. The driver simply
>>>>>>>>>>>>>>>>> implements the protocol
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do you have any code to share ? Any other information ?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is the JDBC connection significantly further away network
>>>>>>>>>>>>>>>>> wise ?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dave Cramer
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> davec(at)postgresintl(dot)com
>>>>>>>>>>>>>>>>> www.postgresintl.com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2018-09-06 13:27:14 Re: timestamp arithmetics in C function
Previous Message Mate Varga 2018-09-06 09:03:02 Re: very slow largeobject transfers through JDBC