Re: Bulk Insert tuning

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Bulk Insert tuning
Date: 2008-02-26 21:36:36
Message-ID: 1204061796.4252.453.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

On Tue, 2008-02-26 at 15:12 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Following patch implements a simple mechanism to keep a buffer pinned
> > while we are bulk loading.
>
> This will fail to clean up nicely after a subtransaction abort, no?

Yes, will fix.

> (For that matter I don't think it's right even for a top-level abort.)
> And I'm pretty sure it will trash your table entirely if someone
> inserts into another relation while a bulk insert is happening.
> (Not at all impossible, think of triggers for instance.)

The pinned buffer is separate from the preferred block for each
relation; BulkInsertBuffer isn't used for determining the block to
insert into. If you try to insert into a block that differs from the
pinned one it unpins it and re-pins the new one. So it is always safe
with respect to the data in the table.

It can run into recursive bulk insert ops but that just destroys the
performance advantage, its not actually dangerous.

> >From a code structural point of view, we are already well past the
> number of distinct options that heap_insert ought to have. I was
> thinking the other day that bulk inserts ought to use a ring-buffer
> strategy to avoid having COPY IN trash the whole buffer arena, just
> as we've taught COPY OUT not to. So maybe a better idea is to
> generalize BufferAccessStrategy to be able to handle write as well
> as read concerns; or have two versions of it, one for writing and one
> for reading. In any case the point being to encapsulate all these
> random little options in a struct, which could also carry along
> state that needs to be saved across a series of inserts, such as
> the last pinned buffer.

That was actually my first thought when I realised recursive ops were
possible. I don't think its necessary from a code correctness
perspective but it might be an appropriate re-factoring considering
those little bool-s seem to be breeding.

I think we need two Strategy types since CTAS would need one of each.
But then VACUUM is mid-way on that. Hmmm. Will consider.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Euler Taveira de Oliveira 2008-02-26 23:18:10 Re: lc_time and localized dates
Previous Message Tom Lane 2008-02-26 20:12:58 Re: Bulk Insert tuning