Re: Bulk Insert tuning

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Bulk Insert tuning
Date: 2008-03-03 20:59:40
Message-ID: 200803032059.m23Kxe012260@momjian.us
Lists: pgsql-patches


Added to TODO:

o Consider using a ring buffer for COPY FROM

http://archives.postgresql.org/pgsql-patches/2008-02/msg00140.php
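
For readers unfamiliar with the term, the following is a minimal sketch of what a ring-buffer strategy means for a bulk load. It is not PostgreSQL code: RingSketch, SharedPoolSketch, ring_init, ring_get_buffer and the ring size of 4 are all hypothetical names and values chosen for illustration. The idea is that the load cycles through a small fixed set of buffers instead of claiming a fresh one from the shared pool for every page.

```c
#include <assert.h>

#define RING_SIZE 4            /* hypothetical; chosen only for the example */
#define RING_SLOT_EMPTY (-1)

/* Hypothetical stand-in for the shared buffer arena. */
typedef struct SharedPoolSketch
{
    int buffers_taken;         /* how many pool buffers have been claimed */
} SharedPoolSketch;

/* Hypothetical per-load ring of buffer ids. */
typedef struct RingSketch
{
    int slots[RING_SIZE];      /* buffer ids owned by this ring */
    int next;                  /* slot to hand out on the next request */
} RingSketch;

void
ring_init(RingSketch *ring)
{
    for (int i = 0; i < RING_SIZE; i++)
        ring->slots[i] = RING_SLOT_EMPTY;
    ring->next = 0;
}

/*
 * Return the buffer id to use for the next page of the load.
 * The shared pool is touched only until the ring is full; after that
 * the ring's own buffers are reused in order.  A real implementation
 * would also have to write a dirty page out before reusing its buffer.
 */
int
ring_get_buffer(RingSketch *ring, SharedPoolSketch *pool)
{
    int slot = ring->next;

    ring->next = (ring->next + 1) % RING_SIZE;

    if (ring->slots[slot] == RING_SLOT_EMPTY)
        ring->slots[slot] = pool->buffers_taken++;

    return ring->slots[slot];
}
```

Under these assumptions, a load of any length claims at most RING_SIZE buffers from the pool, which is the property that would keep COPY FROM from evicting everything else.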

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Tue, 2008-02-26 at 15:12 -0500, Tom Lane wrote:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > > Following patch implements a simple mechanism to keep a buffer pinned
> > > while we are bulk loading.
> >
> > This will fail to clean up nicely after a subtransaction abort, no?
>
> Yes, will fix.
>
> > (For that matter I don't think it's right even for a top-level abort.)
> > And I'm pretty sure it will trash your table entirely if someone
> > inserts into another relation while a bulk insert is happening.
> > (Not at all impossible, think of triggers for instance.)
>
> The pinned buffer is separate from the preferred block for each
> relation; BulkInsertBuffer isn't used for determining the block to
> insert into. If you try to insert into a block that differs from the
> pinned one it unpins it and re-pins the new one. So it is always safe
> with respect to the data in the table.
>
> It can run into recursive bulk insert ops, but that just destroys the
> performance advantage; it's not actually dangerous.
>
> > From a code structural point of view, we are already well past the
> > number of distinct options that heap_insert ought to have. I was
> > thinking the other day that bulk inserts ought to use a ring-buffer
> > strategy to avoid having COPY IN trash the whole buffer arena, just
> > as we've taught COPY OUT not to. So maybe a better idea is to
> > generalize BufferAccessStrategy to be able to handle write as well
> > as read concerns; or have two versions of it, one for writing and one
> > for reading. In any case the point being to encapsulate all these
> > random little options in a struct, which could also carry along
> > state that needs to be saved across a series of inserts, such as
> > the last pinned buffer.
>
> That was actually my first thought when I realised recursive ops were
> possible. I don't think it's necessary from a code correctness
> perspective, but it might be an appropriate refactoring considering
> those little bools seem to be breeding.
>
> I think we need two Strategy types since CTAS would need one of each.
> But then VACUUM is mid-way on that. Hmmm. Will consider.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
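
The unpin and re-pin behaviour described in the quoted message, together with the suggestion to carry the last pinned buffer in a struct, can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: BulkInsertSketch, bulk_insert_init, bulk_insert_pin and bulk_insert_release are hypothetical names, buffers are reduced to integer ids, and the counters exist only to make the cost visible.

```c
#include <assert.h>

#define INVALID_BUFFER_ID (-1)

/*
 * Hypothetical state carried across a series of inserts.  In the real
 * proposal this would live alongside the other per-load options.
 */
typedef struct BulkInsertSketch
{
    int pinned_buffer;   /* buffer currently held pinned, if any */
    int pin_calls;       /* number of times a pin had to be taken */
    int unpin_calls;     /* number of times a pin was released */
} BulkInsertSketch;

void
bulk_insert_init(BulkInsertSketch *state)
{
    state->pinned_buffer = INVALID_BUFFER_ID;
    state->pin_calls = 0;
    state->unpin_calls = 0;
}

/*
 * Ensure target_buffer is pinned.  If it is the one already held, no
 * work is done; otherwise the old pin is dropped and a new one taken.
 * The target is always chosen by the caller, never by this state.
 */
void
bulk_insert_pin(BulkInsertSketch *state, int target_buffer)
{
    if (state->pinned_buffer == target_buffer)
        return;                          /* fast path: reuse held pin */

    if (state->pinned_buffer != INVALID_BUFFER_ID)
        state->unpin_calls++;            /* stand-in for releasing the pin */

    state->pinned_buffer = target_buffer;
    state->pin_calls++;                  /* stand-in for taking the pin */
}

/* Drop any held pin; would need to run at end of load and on abort. */
void
bulk_insert_release(BulkInsertSketch *state)
{
    if (state->pinned_buffer != INVALID_BUFFER_ID)
    {
        state->unpin_calls++;
        state->pinned_buffer = INVALID_BUFFER_ID;
    }
}
```

In this sketch, repeated inserts into the same buffer take one pin in total, while inserts that alternate between two buffers (as a trigger writing to another relation might cause) take a pin on every call. That matches the claim that interleaving costs the performance advantage without affecting where rows are placed. The release step is where the abort cleanup raised in the thread would have to hook in.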

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
