Re: Using multi-row technique with COPY

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Using multi-row technique with COPY
Date: 2005-11-29 20:35:03
Message-ID: 1133296503.2906.394.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2005-11-29 at 14:17 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > As a further enhancement, I would also return to the NOLOGGING option
> > for COPY. Previously we had said that COPY LOCK was the way to go -
> > taking a full table lock to prevent concurrent inserts to a block from a
> > COPY that didn't write WAL and another backend which wanted to write WAL
> > about that block. With the above suggested all-inserts-at-once
> > optimization, it would no longer be a requirement to lock the table.
> > That means we can continue to take advantage of the ability to run
> > multiple COPY loads into the same table. Avoiding writing WAL will
> > further reduce CPU by about 15% and I/O by about 50%.
> >
> > I would also suggest that pgdump be changed to use the NOLOGGING option
> > by default, with an option to work as previously.
>
> For those who have been around, they know I dislike having options that
> 95% of our users desire not be the default behavior.

I like that approach and it makes for a great product.

> I think the COPY
> NOLOGGING idea falls in that category. I would like to explore if there
> is a way to have COPY automatically do no logging where possible by
> default.

see last, below

> First, I think NOLOGGING is probably the wrong keyword. I am thinking
> SHARE/EXCLUSIVE is best because they are already keywords, and they
> explain the effect of the flag on other applications, rather than the
> LOGGING capability, which is invisible to applications.
>
> I am thinking we would have COPY WITH [ [ EXCLUSIVE | SHARE ] [ LOCK ]] ...
> EXCLUSIVE lock would be NOLOGGING, SHARE would do logging because other
> applications could insert into the table at the same time (and do
> UPDATES/DELETES of the inserted rows).

I wasn't trying to suggest new wording for COPY LOCK, or whatever it was
called previously. My suggestion was to alter the underlying mechanism
so that it would not actually need to take a lock at all. So I was
trying to find a name that matched that thought.... so your comments
above don't really apply.... but I'm more than happy to have others
suggest names and will go with the flow.

> One idea for default behavior would be to use EXCLUSIVE when the table
> is zero size. I think that would do pg_dump and most of the user cases,
> and of course users could override the default by using a keyword. We
> could emit a NOTICE if an an exclusive lock is used without an EXCLUSIVE
> keyword. One problem I see is that there is no way to insure zero size
> without a lock that blocks other writers. Is that reliable?

I think it would be possible to say: if when a COPY starts the table has
zero blocks, then we do a NOLOGGING (change_the_name) mode operation.

That gives a problem for recovery. If you lose the server and recover
with PITR, that table would come back empty. I'd want to make very sure
that I knew whether it was going to be there, or not, and I want it to
be my choice. So, I agree with the thought from a automatic performance
perspective, but not from a recovery perspective.

(There is some irony in being the person to both advocate PITR and to
try so hard not to log...)

I think of PITR as the default mode of operation now; almost all Oracle,
DB2, SQLServer etc users make consistent use of archivelogmode/archive
recovery.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2005-11-29 20:35:25 Re: Using multi-row technique with COPY
Previous Message Bruce Momjian 2005-11-29 20:15:21 Re: Using multi-row technique with COPY