Re: Using multi-row technique with COPY

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Using multi-row technique with COPY
Date: 2005-11-29 23:51:53
Message-ID: 200511292351.jATNprm20699@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Please let me back up and ask a more simplistic question. I understand
the idea of allowing COPY to insert rows with less locking, but I am
wondering about the NOLOGGING idea. On commit, we must guarantee that
all the rows are in the table, so what advantage is there to a NOLOGGING
option?

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Tue, 2005-11-29 at 14:17 -0500, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > As a further enhancement, I would also return to the NOLOGGING option
> > > for COPY. Previously we had said that COPY LOCK was the way to go -
> > > taking a full table lock to prevent concurrent inserts to a block from a
> > > COPY that didn't write WAL and another backend which wanted to write WAL
> > > about that block. With the above suggested all-inserts-at-once
> > > optimization, it would no longer be a requirement to lock the table.
> > > That means we can continue to take advantage of the ability to run
> > > multiple COPY loads into the same table. Avoiding writing WAL will
> > > further reduce CPU by about 15% and I/O by about 50%.
> > >
> > > I would also suggest that pgdump be changed to use the NOLOGGING option
> > > by default, with an option to work as previously.
> >
> > For those who have been around, they know I dislike having options that
> > 95% of our users desire not be the default behavior.
>
> I like that approach and it makes for a great product.
>
> > I think the COPY
> > NOLOGGING idea falls in that category. I would like to explore if there
> > is a way to have COPY automatically do no logging where possible by
> > default.
>
> see last, below
>
> > First, I think NOLOGGING is probably the wrong keyword. I am thinking
> > SHARE/EXCLUSIVE is best because they are already keywords, and they
> > explain the effect of the flag on other applications, rather than the
> > LOGGING capability, which is invisible to applications.
> >
> > I am thinking we would have COPY WITH [ [ EXCLUSIVE | SHARE ] [ LOCK ]] ...
> > EXCLUSIVE lock would be NOLOGGING, SHARE would do logging because other
> > applications could insert into the table at the same time (and do
> > UPDATES/DELETES of the inserted rows).
>
> I wasn't trying to suggest new wording for COPY LOCK, or whatever it was
> called previously. My suggestion was to alter the underlying mechanism
> so that it would not actually need to take a lock at all. So I was
> trying to find a name that matched that thought.... so your comments
> above don't really apply.... but I'm more than happy to have others
> suggest names and will go with the flow.
>
> > One idea for default behavior would be to use EXCLUSIVE when the table
> > is zero size. I think that would do pg_dump and most of the user cases,
> > and of course users could override the default by using a keyword. We
> > could emit a NOTICE if an an exclusive lock is used without an EXCLUSIVE
> > keyword. One problem I see is that there is no way to insure zero size
> > without a lock that blocks other writers. Is that reliable?
>
> I think it would be possible to say: if when a COPY starts the table has
> zero blocks, then we do a NOLOGGING (change_the_name) mode operation.
>
> That gives a problem for recovery. If you lose the server and recover
> with PITR, that table would come back empty. I'd want to make very sure
> that I knew whether it was going to be there, or not, and I want it to
> be my choice. So, I agree with the thought from a automatic performance
> perspective, but not from a recovery perspective.
>
> (There is some irony in being the person to both advocate PITR and to
> try so hard not to log...)
>
> I think of PITR as the default mode of operation now; almost all Oracle,
> DB2, SQLServer etc users make consistent use of archivelogmode/archive
> recovery.
>
> Best Regards, Simon Riggs
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-11-30 00:08:50 A couple of proposed pgbench changes
Previous Message Luke Lonergan 2005-11-29 23:42:27 Re: ice-broker scan thread