Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Stefan Keller <sfkeller(at)gmail(dot)com>
Cc: pgsql-general List <pgsql-general(at)postgresql(dot)org>, <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
Date: 2012-01-09 13:55:50
Message-ID: f03d2d0175ac9ef189bc08893eca7b96@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-jdbc

On Mon, 9 Jan 2012 00:40:08 +0100, Stefan Keller wrote:
> 2012/1/8 Radosław Smogura <rsmogura(at)softperience(dot)eu> wrote:
>> Not quite, PostgreSQL doesn't have LOB, nor OID type that is only
>> reference> to LOB. In fact, BLOB behaviour in JDBC> is just thin
>> wrapper for this what is missing in PostgreSQL - BLOBs.
>
> I can't follow: PostgreSQL has bytea and LO. I'm ok with bytea too
> but
> I'd like to get the job done with LO too.
> JDBC offers both:
> http://jdbc.postgresql.org/documentation/head/binary-data.html
> I tried to summarize the characteristics here:
> http://www.gis.hsr.ch/wiki/PostgreSQL_-_Binary_Large_Objects
>
> There's an important principle that the code which "allocates"
> resources is also responsible to release it.
> AFAIK in this case it's JDBC which choses to use LO (which creates
> pg_largeobjects entries) and it's therefore also JDBC which has to
> clean up.
Hmm we have really different point of view. JDBC driver is just
interface which translates JDBC "commands" to database commands. JDBC
driver is not database. So responsible for this is database, but
database doesn't have "real" BLOBs, this what is made in PG JDBC driver
is just "not perfect" way to add this functionality to PostgreSQL. To
make complete system You need PostgreSQL, JDBC driver, and this trigger.

From reasons I mentioned before it's quite hard for JDBC driver to
implement deallocation, because driver will be in need to ask and to
"know" for OIDs to unlink. This could made from driver database which
need to parse queries, manages db structure etc, and this will be
executed twice in driver and in database - this is ineffective. Just for
example driver may support this for "DELETE FROM", but what will be if
deletion will be result of calling stored procedure? What if stored
procedure will be created in language different then pgSQL (driver still
may try to download it), but if this will be C, or any other? JDBC
driver must be universal and fully "compatible" with PG functionality,
as it may be used only as "helper" for presentation layer, true
processing logic may be somewhere e. g. legacy system for which we add
web interface.

Because of this JDBC driver can't expose desired behavior, even if
everyone agrees that current behavior is unperfect.

Only PG server may do this, or Hibernate for example by new annotation
@MangedPgLob.

Regards,
Radek

> Yours, Stefan
>
> P.S. Just in order get some more insight I also tried to forward this
> to the JDBC list (which currently seems to have problems accepting
> new
> subscriptions).
>
> 2012/1/8 Radosław Smogura <rsmogura(at)softperience(dot)eu>:
>> On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote:
>>>
>>> Thanks, Radosław, for the clarification.
>>>
>>> 2012/1/8 Radosław Smogura <rsmogura(at)softperience(dot)eu> wrote:
>>>>
>>>> 3. pg_largeobjects is system table, hibernate do not creates it.
>>>>
>>>> 4. Trigger based approach is good for this, but You need to do
>>>> this
>>>> mannualy
>>>
>>>
>>> That's exactly my point:
>>> Since JDBC manages creation of OID and pg_largeobjects it's also
>>> JDBC
>>> which is responsible for update/delete of rows in pg_largeobjects.
>>> As the user expects, the only thing JDBC would have to do is to
>>> issue
>>> lo_unlink() when rows are updated or deleted.
>>> Thus, it's currently a bug in the JDBC driver. And whatever the
>>> solution is, it needs to be mentioned in the JDBC docs.
>>
>> Not quite, PostgreSQL doesn't have LOB, nor OID type that is only
>> reference
>> to LOB. In fact, BLOB behaviour in JDBC
>> is just thin wrapper for this what is missing in PostgreSQL - BLOBs.
>> It was
>> build form available parts. In addition OID type may be used and
>> it's used
>> as the system id or may be used as just some kind of row id - all
>> types,
>> tables, sequences etc has OID. You may create table with "WITH OIDS"
>> clause.
>> You may use OID data type just as replacement for (unsigned) int, so
>> JDBC
>> can create LOB but it can't decide if given field in row is
>> reference to LOB
>> or e.g. table and what with statements "DELETE WHERE date > "?
>>
>> In fact JDBC driver is so "stupid" that if you will call getBytes or
>> getBlob
>> on any column with Oid it will ask for LOB.
>>
>> Hibernate knows this and it does what is best - calls standard BLOB
>> interface, and creates table with Oid column.
>>
>> And here again round trip, in case of deletion only Hibernate may
>> delete
>> given LOB because only Hibernate and You knows that any value in Oid
>> column
>> will reflect LOB - JDBC driver doesn't "knows " this, but... oids
>> may be
>> shared because those are only numbers, not all tables may be covered
>> by
>> Hibernate, and assuming huge imagination, someone may encode OID by
>> adding
>> 1, may store it as long, etc. I know it's quite popular that DB
>> schema comes
>> from entities, but not always You have such flexibility and
>> sometimes You
>> create Entities for particular schema.
>>
>> So, as You see only this trigger approach is currently (universally)
>> the
>> best way. If this is that module I think about it's just trigger
>> which calls
>> unlink on replace or deletion - in fact You may write own without
>> any
>> problems. Those are few lines only.
>>
>>
>>>> 5. If you want to use bytea use
>>>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I
>>>> think you
>>>> should remove @Lob too) on your field.
>>>
>>>
>>> In fact, this annotation syntax usage looks like the Hibernate
>>> mapping
>>> for PostgreSQL could be enhanced.
>>
>> You have right, but Hibernate team will be in need to add auto
>> deletion for
>> bulk updates, too. PostgreSQL isn't still so popular it's worth of
>> it (and
>> they have no so much developers bug report may stand for months
>> without any
>> comment). Look how many peoples ask for Lobs. It means no one needs
>> true LOB
>> - true LOB large object stored outside table/row data space, to
>> allow out of
>> statements operation like streaming, partial updates etc. This is my
>> definition of LOB, because this is idea of LOB. I think only DB2
>> stores LOBs
>> in row, PostgreSQL do not make this but adverts this bytea ugly
>> approach.
>>
>> Personally, I create app which captures images from WebCam (like
>> You) - just
>> frames not movies. From above reason I wanted to move to bytea, but
>> due to
>> changes (wired instability and leak of backward compatibility) I
>> still have
>> Oid. Because I have only two tables for Lobs I have garbage
>> collection
>> simple script.
>>
>> Best regards
>> Radek
>>
>>
>>> Yours, Stefan
>>>
>>> 2012/1/8 Radosław Smogura <rsmogura(at)softperience(dot)eu>:
>>>>
>>>> On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
>>>>>
>>>>>
>>>>> I'd like to backup my statement below regarding in JDBC driver
>>>>> from
>>>>> PostgreSQL:
>>>>>
>>>>> When storing fields of type BLOB it inserts the binary string in
>>>>> system table pg_largeobject (via user table). But when rows in
>>>>> user
>>>>> table get updated or deleted it does not update nor delete
>>>>> corresponding rows in table pg_largeobject.
>>>>>
>>>>> That's really a bug!
>>>>>
>>>>> Fortunately there's a solution indicated in the official docs
>>>>> (http://www.postgresql.org/docs/9.1/static/lo.html)
>>>>> To me, something like this should be implemented before hand in
>>>>> JDBC
>>>>> driver.
>>>>> And in any case there should be a bold note about this in the
>>>>> JDBC
>>>>> docs
>>>>> (http://jdbc.postgresql.org/documentation/head/binary-data.html )
>>>>>
>>>>> Yours, Stefan
>>>>>
>>>>>
>>>>>
>>>>> 2012/1/6 Stefan Keller <sfkeller(at)gmail(dot)com>:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I run into a nasty behavior of current PostgreSQL JDBC.
>>>>>>
>>>>>> I maintain images (from Webcams). In the Java and Hibernate
>>>>>> (JPA) code
>>>>>> I specified a @Lob annotation on class MyData and a
>>>>>> attribte/data type
>>>>>> "byte[] mydata;". Hibernate then generates two tables in
>>>>>> PostgreSQL,
>>>>>> one called MyData with a column mydata of type oid and an
>>>>>> internal one
>>>>>> called pg_largobjects (which contain foreign keys to the oid).
>>>>>> That's
>>>>>> also explained in the JDBC docs [1], saying "PostgreSQL provides
>>>>>> two
>>>>>> distinct ways to store binary data. Binary data can be stored in
>>>>>> a
>>>>>> table using the data type bytea or by using the Large Object
>>>>>> feature
>>>>>> which stores the binary data in a separate table in a special
>>>>>> format
>>>>>> and refers to that table by storing a value of type oid in your
>>>>>> table."
>>>>>>
>>>>>> Now, when replacing the images (few hundred) with new ones using
>>>>>> Java,
>>>>>> pg_largeobjects grew constantly until the file system run out of
>>>>>> space. So old image data did'nt get released! This is to me a
>>>>>> bug
>>>>>> because the user/programmer must (and should) assume that there
>>>>>> is a
>>>>>> strict 1:1 relationship between generated table MyData and its
>>>>>> LOB
>>>>>> column data (stored in pg_largeobjects).
>>>>>> => I finally found the supplied module 'lo' [2] which releases
>>>>>> detached records. Is this the recommended way to resolve this
>>>>>> problem?
>>>>>>
>>>>>> Searching for explanations I found a ticket HHH2244 [3] which
>>>>>> was
>>>>>> closed by the Hibernate team without action referring to the
>>>>>> JDBC
>>>>>> Spec. which says: "An SQL BLOB is a built-in type that stores a
>>>>>> Binary
>>>>>> Large Object as a column value in a row of a database table".
>>>>>> => In other words: The PostgreSQL JDBC team should take action
>>>>>> on this
>>>>>> but didn't until now, right?
>>>>>>
>>>>>> There is another issue about "PostgreSQL and BLOBs" [4]. First
>>>>>> it
>>>>>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too
>>>>>> in a
>>>>>> 'deadlock' concluding  "...the PostgreSQLDialect (as of 3.5.5)
>>>>>> needs
>>>>>> to change not to use MaterializedBlobType until the Postgres
>>>>>> (JDBC)
>>>>>> team changes their driver (which does not seem to have happened
>>>>>> in the
>>>>>> last 6 years)."
>>>>>> => Any solutions or comments form insiders on this?
>>>>>>
>>>>>> Yours, Stefan
>>>>>>
>>>>>> [1]
>>>>>> http://jdbc.postgresql.org/documentation/head/binary-data.html
>>>>>> [2] http://www.postgresql.org/docs/current/interactive/lo.html
>>>>>> [3] https://hibernate.onjira.com/browse/HHH-2244
>>>>>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
>>>>
>>>>
>>>> This is common approach for PostgreSQL and some proxy of real life
>>>> Blobs.
>>>> One time I submitted bug about this to Hibernate. But step by
>>>> step.
>>>>
>>>> 1. BLOBs are... Large Objects, they are stored as reference
>>>> because those
>>>> objects are large, if you will store this objects as bytea then
>>>> select *
>>>> will return all large data. It may not be comfortable not only to
>>>> download
>>>> few GB of data, but to keep this on stack too. From your
>>>> perspective it
>>>> doesn't matters because you put it in byte[]. But if You will keep
>>>> e.g.
>>>> CD-ROM images then it's much more better to use streaming approach
>>>> then
>>>> bytea[]. More over due to some security JDBC driver will at least
>>>> double
>>>> memory consumed by bytea.
>>>>
>>>> 2. Specifying hibernate data type as bytea do not resolve problems
>>>> because
>>>> it will still use LOB approach.
>>>>
>>>> 3. pg_largeobjects is system table, hibernate do not creates it.
>>>>
>>>> 4. Trigger based approach is good for this, but You need to do
>>>> this
>>>> mannualy
>>>>
>>>> 5. If you want to use bytea use
>>>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I
>>>> think you
>>>> should remove @Lob too) on your field.
>>>>
>>>> Regards,
>>>> Radosław Smogura
>>
>>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Radosław Smogura 2012-01-09 14:02:30 Re: Vacuum and Large Objects
Previous Message Alban Hertroys 2012-01-09 12:26:46 Re: Supporting SQL/MED DATALINK

Browse pgsql-jdbc by date

  From Date Subject
Next Message Alban Hertroys 2012-01-09 14:52:23 Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
Previous Message Dave Cramer 2012-01-09 12:08:52 Re: problem: query result in jdbc is <> result in psql