Re: BLOB support

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BLOB support
Date: 2011-06-02 16:53:52
Message-ID: 201106021853.52943.rsmogura@softperience.eu
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Thursday 02 of June 2011 16:42:42
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > But these problems can be fixed without inventing a completely new
> > system, I think. Or at least we should try. I can see the point of a
> > data type that is really a pointer to a LOB, and the LOB gets deleted
> > when the pointer is removed, but I don't think that should require
> > far-reaching changes all over the system (like relhaslobs) to make it
> > work efficiently. I think you need to start with a problem statement,
> > get agreement that it is a problem and on what the solution should be,
> > and then go write the code to implement that solution.
>
> Yes. I think the appropriate problem statement is "provide streaming
> access to large field values, as an alternative to just fetching/storing
> the entire value at once". I see no good reason to import the entire
> messy notion of LOBS/CLOBS. (The fact that other databases have done it
> is not a good reason.)
>
> For primitive types like text or bytea it seems pretty obvious what
> "streaming access" should entail, but it might be interesting to
> consider what it should mean for structured types. For instance, if I
> have an array field with umpteen zillion elements, it might be nice to
> fetch them one at a time using the streaming access mechanism. I don't
> say that that has to be in the first version, but it'd be a good idea to
> keep that in the back of your head so you don't design a dead-end
> solution that can't be extended in that direction.
>
> regards, tom lane

In the context of LOBs, streaming is already solved: I use the current LO
functionality (so a driver can read LOBs the way psql's \lo_export does, or
through the COPY subprotocol), and the client gets just the LO's id. BLOBs in
this implementation, as Robert wanted, are just a wrapper around core LO, with
some extensions for special situations. Adding relhaslob in this implementation
is quite important, so that the tupledesc does not have to be examined on each
table operation, although this value could instead be deduced when the relation
is opened (with a performance penalty). I saw something similar done a few
lines above, where triggers are fired, and a few lines below, where indexes are
updated.
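
The trade-off around relhaslob can be sketched roughly as follows. This is only
a toy model in Python, not backend code: the Relation class, its method names,
and the "blob" type name are hypothetical stand-ins. It contrasts scanning the
tuple descriptor on each operation with deducing the flag once at relation
open; a flag stored in the catalog would avoid even that one scan.

```python
LOB_TYPE = "blob"  # hypothetical name for the proposed BLOB type


class Relation:
    """Toy stand-in for an opened relation and its tuple descriptor."""

    def __init__(self, name, column_types):
        self.name = name
        self.column_types = list(column_types)
        self.scans = 0  # how many times the descriptor was walked
        # Deduce the flag once, at "relation open" time.
        self.has_lob = self._scan_for_lob()

    def _scan_for_lob(self):
        # Walk every column, as examining the tupledesc would.
        self.scans += 1
        return any(t == LOB_TYPE for t in self.column_types)

    def needs_lob_cleanup_scan(self):
        # Without a flag: examine the descriptor on every operation.
        return self._scan_for_lob()

    def needs_lob_cleanup_cached(self):
        # With a flag: a plain boolean check, no descriptor walk.
        return self.has_lob


rel = Relation("docs", ["int4", "text", "blob"])
for _ in range(3):
    rel.needs_lob_cleanup_cached()
print(rel.has_lob, rel.scans)
```

In this model the cached check never walks the descriptor again after open,
while the scanning variant pays the cost on every call.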

Currently BLOBs may be emulated using core LO (the JDBC driver does this), but
from an application developer's point of view several problems remain, among
others:

1. No tracking of unused LOs (you store just the id of such an object). You may
leak an LO after a row is removed or updated. Users can write triggers for
this, but that is not an argument: the BLOB type is popular, and its simplicity
of use is quite important. When I write an application, this is the worst
problem.

2. No support for casting in UPDATE/INSERT, so there is no simple way to
migrate data (e.g. from overly long varchars) or to copy BLOBs.

3. The field size is limited to 1GB.
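
To illustrate problem 1, here is a minimal sketch of how leaked LOs show up
when the set of existing large-object ids is compared with the ids still
stored in rows. It is a pure in-memory model, not PostgreSQL code; the function
name find_orphaned_los and the sample ids are hypothetical.

```python
from typing import Iterable, Optional, Set


def find_orphaned_los(existing_los: Iterable[int],
                      referenced_ids: Iterable[Optional[int]]) -> Set[int]:
    """Return ids of large objects that no row refers to any more."""
    referenced = {oid for oid in referenced_ids if oid is not None}
    return set(existing_los) - referenced


# Hypothetical state: three LOs exist, and a table column stores their ids.
existing = {16401, 16402, 16403}
rows = [16401, 16402, 16403]

# An UPDATE sets one id to NULL and a DELETE drops another row.
# Nothing unlinks the old objects, so they stay behind.
rows[1] = None   # models UPDATE ... SET blob_col = NULL
del rows[2]      # models DELETE of the row that held 16403

orphans = find_orphaned_los(existing, rows)
print(sorted(orphans))
```

The application only ever held the ids, so after the update and delete the two
objects are unreachable unless something compares both sets like this.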

Another solution I was thinking about is to introduce system triggers (triggers
that can't be disabled or removed). This would need a new flag in the triggers
table.

Now I think we should try to mix both approaches, as system triggers may give
an interesting API to other developers.

Other databases (may) store LOBs, arrays, and composites in external tables,
so the user gets just the id of such an object.

I have been thinking about streaming for about two weeks. I have some concepts
for it, but from the point of view of memory consumption and performance. I
will send the concept later; I want to think it over once more and look into
what can actually be done.

Regards,
Radek
