Re: Using multi-row technique with COPY

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Using multi-row technique with COPY
Date: 2005-11-30 06:35:23
Message-ID: 1133332523.2906.499.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2005-11-29 at 18:51 -0500, Bruce Momjian wrote:
> Please let me back up and ask a more simplistic question. I understand
> the idea of allowing COPY to insert rows with less locking, but I am
> wondering about the NOLOGGING idea. On commit, we must guarantee that
> all the rows are in the table, so what advantage is there to a NOLOGGING
> option?

We would need to flush all the blocks in the table out of cache at
commit time, for that table only. (As with CTAS, CIDX).

To allow a full discussion, I'll separate the various ideas:
1. COPY using bulk copy mode
2. NOLOGGING
3. Created in this transaction
4. ERRORTABLES
5. Uniqueness violations

Right now, I think you have reasonable objections/input to (2) that we
should discuss more before agreeing a way forward. I would aim to do (1)
first, then return with a full and much better explained proposal for
(2) for us to discuss, since (2) depends upon (1) somewhat.

(3) and (4) seem to have been generally accepted, but (5) seems not
viable with present thinking.

Best Regards, Simon Riggs

> ---------------------------------------------------------------------------
>
> Simon Riggs wrote:
> > On Tue, 2005-11-29 at 14:17 -0500, Bruce Momjian wrote:
> > > Simon Riggs wrote:
> > > > As a further enhancement, I would also return to the NOLOGGING option
> > > > for COPY. Previously we had said that COPY LOCK was the way to go -
> > > > taking a full table lock to prevent concurrent inserts to a block from a
> > > > COPY that didn't write WAL and another backend which wanted to write WAL
> > > > about that block. With the above suggested all-inserts-at-once
> > > > optimization, it would no longer be a requirement to lock the table.
> > > > That means we can continue to take advantage of the ability to run
> > > > multiple COPY loads into the same table. Avoiding writing WAL will
> > > > further reduce CPU by about 15% and I/O by about 50%.
> > > >
> > > > I would also suggest that pgdump be changed to use the NOLOGGING option
> > > > by default, with an option to work as previously.
> > >
> > > For those who have been around, they know I dislike having options that
> > > 95% of our users desire not be the default behavior.
> >
> > I like that approach and it makes for a great product.
> >
> > > I think the COPY
> > > NOLOGGING idea falls in that category. I would like to explore if there
> > > is a way to have COPY automatically do no logging where possible by
> > > default.
> >
> > see last, below
> >
> > > First, I think NOLOGGING is probably the wrong keyword. I am thinking
> > > SHARE/EXCLUSIVE is best because they are already keywords, and they
> > > explain the effect of the flag on other applications, rather than the
> > > LOGGING capability, which is invisible to applications.
> > >
> > > I am thinking we would have COPY WITH [ [ EXCLUSIVE | SHARE ] [ LOCK ]] ...
> > > EXCLUSIVE lock would be NOLOGGING, SHARE would do logging because other
> > > applications could insert into the table at the same time (and do
> > > UPDATES/DELETES of the inserted rows).
> >
> > I wasn't trying to suggest new wording for COPY LOCK, or whatever it was
> > called previously. My suggestion was to alter the underlying mechanism
> > so that it would not actually need to take a lock at all. So I was
> > trying to find a name that matched that thought.... so your comments
> > above don't really apply.... but I'm more than happy to have others
> > suggest names and will go with the flow.
> >
> > > One idea for default behavior would be to use EXCLUSIVE when the table
> > > is zero size. I think that would do pg_dump and most of the user cases,
> > > and of course users could override the default by using a keyword. We
> > > could emit a NOTICE if an an exclusive lock is used without an EXCLUSIVE
> > > keyword. One problem I see is that there is no way to insure zero size
> > > without a lock that blocks other writers. Is that reliable?
> >
> > I think it would be possible to say: if when a COPY starts the table has
> > zero blocks, then we do a NOLOGGING (change_the_name) mode operation.
> >
> > That gives a problem for recovery. If you lose the server and recover
> > with PITR, that table would come back empty. I'd want to make very sure
> > that I knew whether it was going to be there, or not, and I want it to
> > be my choice. So, I agree with the thought from a automatic performance
> > perspective, but not from a recovery perspective.
> >
> > (There is some irony in being the person to both advocate PITR and to
> > try so hard not to log...)
> >
> > I think of PITR as the default mode of operation now; almost all Oracle,
> > DB2, SQLServer etc users make consistent use of archivelogmode/archive
> > recovery.
> >
> > Best Regards, Simon Riggs
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> > message can get through to the mailing list cleanly
> >
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tino Wildenhain 2005-11-30 06:42:36 Re: BIN()
Previous Message Greg Stark 2005-11-30 06:23:11 Re: slow IN() clause for many cases