Re: Bulk Inserts

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bulk Inserts
Date: 2009-09-15 02:33:00
Message-ID: f67928030909141933o525b355au4a5bbd1a20faa59b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2009/9/14 Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>

>
> Replying to myself...
>
> Jeff suggested to build pages in local memory and insert them later in the
> table. This is what's used in CLUSTER for instance, I believe.
>
> It has some drawbacks though :
>
> - To insert the tuples in indexes, the tuples need tids, but if you build
> the page in local memory, you don't know on which page they will be until
> after allocating the page, which will probably be done after the page is
> built, so it's a bit of a chicken and egg problem.
>

Yes, I did not consider that to be a problem because I did not think it
would be used on indexed tables. I figured that the gain from doing bulk
inserts into the table would be so diluted by the still-bottle-necked index
maintenance that it was OK not to use this optimization for indexed tables.

>
> - It only works on new pages. Pages which are not empty, but have free
> space, cannot be written in this way.
>

My original thought was based on the idea of still using heap_insert, but
with a modified form of bistate which would hold the exclusive lock and not
just a pin. If heap_insert is being driven by the unmodified COPY code,
then it can't guarantee that COPY won't stall on a pipe read or something,
and so probably shouldn't hold an exclusive lock while filling the block.
That is why I decided a local buffer would be better, as the exclusive lock
is really a no-op and wouldn't block anyone. But if you are creating a new
heap_bulk_insert and modifying the COPY to go with it, then you can
guarantee it won't stall from the driving end, instead.

Whether any of these approaches will be maintainable enough to be
integrated into the code base is another matter. It seems like there is
already a lot of discussion going on around various permutations of copy
options.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-09-15 03:15:17 Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Previous Message Tom Lane 2009-09-15 02:01:44 Re: CommitFest 2009-09: Now In Progress