Re: Bulk Insert tuning

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Bulk Insert tuning
Date: 2008-03-26 13:28:41
Message-ID: 1206538121.4285.1181.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

On Tue, 2008-02-26 at 15:12 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Following patch implements a simple mechanism to keep a buffer pinned
> > while we are bulk loading.
>
> This will fail to clean up nicely after a subtransaction abort, no?
> (For that matter I don't think it's right even for a top-level abort.)
> And I'm pretty sure it will trash your table entirely if someone
> inserts into another relation while a bulk insert is happening.
> (Not at all impossible, think of triggers for instance.)
>
> >From a code structural point of view, we are already well past the
> number of distinct options that heap_insert ought to have. I was
> thinking the other day that bulk inserts ought to use a ring-buffer
> strategy to avoid having COPY IN trash the whole buffer arena, just
> as we've taught COPY OUT not to.

Agree with that. That was mentioned here again
http://archives.postgresql.org/pgsql-hackers/2008-02/msg01080.php

What do you think of the "Full Block List" idea?

> So maybe a better idea is to
> generalize BufferAccessStrategy to be able to handle write as well
> as read concerns; or have two versions of it, one for writing and one
> for reading. In any case the point being to encapsulate all these
> random little options in a struct, which could also carry along
> state that needs to be saved across a series of inserts, such as
> the last pinned buffer.

I'm trying to implement this, but it begins to look quite ugly.

First, if we allow multiple BulkInsertBuffers then we have to remember
them all in a list so we can unpin them all in case of abort. The change
isn't needed for correctness, as explained before.

Second, we need multiple BufferIOStrategy objects for various purposes.
There is no single default strategy, since normal inserts, normal
updates and toast all have different default behaviour. That makes it
ugly because we either need to differentiate between update/insert and
toast/main inserts - which leads to just as many options, or we need to
have lots of statically defined BufferIOStrategy objects for various
purposes.

I did agree that it was sensible to try to refactor the code, but now it
just looks like a big pile of ugly changes for no benefit. I'll save my
WIP so we can judge.

Coding it using flags that can be ORd together makes more sense than a
list of bools and will be easier to read. I'm going to try that approach
instead.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

In response to

Browse pgsql-patches by date

  From Date Subject
Next Message Heikki Linnakangas 2008-03-26 13:31:12 Re: \password in psql help
Previous Message Heikki Linnakangas 2008-03-26 13:26:12 Re: Consistent \d commands in psql