Re: Using multi-row technique with COPY

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Using multi-row technique with COPY
Date: 2005-11-28 12:05:08
Message-ID: 1133179508.6165.2.camel@dell9300
Lists: pgsql-hackers

On Mon, 2005-11-28 at 00:56 +0000, Simon Riggs wrote:
> On Sun, 2005-11-27 at 17:45 -0500, Tom Lane wrote:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > > COPY FROM can read in sufficient rows until it has a whole block worth
> > > of data, then get a new block and write it all with one pair of
> > > BufferLock calls.
> >
> > > Comments?
> >
> > I don't see any way to do this without horrible modularity violations.
> > The COPY code has no business going anywhere near individual buffers;
> > for that matter, it doesn't even really know what "a block worth" of
> > data is, since the tuples it's dealing with aren't toasted yet.
>
> I've taken on board your comments about modularity issues from earlier.
> [I've not included anything on unique indexes, notice]
>
> I was expecting to buffer this in the heap access method with a new
> call, say, heap_bulk_insert() rather than have all that code hanging
> around in COPY. A lower level routine RelationGetBufferForTupleArray can
> handle the actual grunt. It can work, without ugliness.
>
> We'd need to handle a buffer bigger than a single tuple anyway, so you
> keep adding tuples until the last one tips over the edge, which then
> gets saved for the next block. Heap access method knows about blocks.
>
> We could reasonably do a test for would-be-toasted within those
> routines. I should have said that this wouldn't apply if any of the
> tuples require toasting, which of course has to be a dynamic test.

If we had a big enough buffer (say, 10-100x the page size), we would not
actually need to test for toasting. We could just pass the big buffer to
heap_bulk_insert(), which would insert its contents in chunks as large as
the free space on each page allows, taking each page lock only once.
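
To make the chunking idea concrete, here is a rough standalone sketch in C. It is not PostgreSQL code: SketchPage, sketch_bulk_insert() and the lock counter are hypothetical names invented for illustration, tuples are represented only by their lengths (assumed to be final on-page sizes), and page headers, line pointers and alignment are ignored. It only shows a buffer of tuples being split into per-page chunks, with each page locked once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap page; not a PostgreSQL structure. */
typedef struct {
    size_t   free_space;  /* bytes still available on the page */
    size_t   ntuples;     /* tuples placed on the page so far */
    unsigned lock_count;  /* how many times the page lock was taken */
} SketchPage;

/*
 * Place the buffered tuples onto the given pages in order.
 * Each visited page is "locked" once, filled with as many of the
 * remaining tuples as fit in its free space, then "unlocked".
 * Returns the number of tuples placed; any tuples left over would
 * have to wait for more pages.
 */
size_t
sketch_bulk_insert(const size_t *tuple_len, size_t ntuples,
                   SketchPage *pages, size_t npages)
{
    size_t next = 0;  /* index of the first tuple not yet placed */

    assert(tuple_len != NULL || ntuples == 0);
    assert(pages != NULL || npages == 0);

    for (size_t p = 0; p < npages && next < ntuples; p++) {
        SketchPage *page = &pages[p];

        page->lock_count++;  /* one lock acquisition per page */

        /* Keep adding tuples until the next one tips over the edge. */
        while (next < ntuples && tuple_len[next] <= page->free_space) {
            page->free_space -= tuple_len[next];
            page->ntuples++;
            next++;
        }
        /* The lock would be released here; the tuple that did not fit
         * is carried over to the next page. */
    }
    return next;
}
```

With two empty 8192-byte pages and tuples of 4000, 4000, 4000 and 100 bytes, the first page takes two tuples and the second takes the remaining two, with one lock acquisition per page instead of one per tuple.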

--------------
Hannu
