Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 13:09:12
Message-ID: 1135948152.2862.113.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > > My view would be that this thread has been complex because everybody has
> > > expressed a somewhat different requirement, which could be broken down
> > > as:
> > > 1. The need for a multi-user-accessible yet temporary table
> > > 2. Loading data into a table immediately after it is created (i.e. in
> > > same transaction), including but not limited to a reload from pg_dump
> > > 3. How to load data quickly into an existing table (COPY)
> > > 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> > > UPDATE)

> > However, you then seem to be arguing for still using the COPY LOCK
> > syntax, which I think Bruce intended would go away in favor of using
> > these ALTER commands. Certainly that's what I'd prefer --- COPY has
> > got too darn many options already.

COPY LOCK was Tom's suggestion at the end of a long discussion thread on
this precise issue. Nobody objected to it at that point; I implemented
it *exactly* that way because I wanted to very visibly follow the
consensus of the community, after informed debate.
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php

Please re-read the links to previous discussions.
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
There are points there, not made by me, that still apply and need to be
considered here, yet have not been.

Just to restate my current thinking:
- agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS
- we should have COPY LOCK rather than
ALTER TABLE .... RELIABILITY EXCLUSIVE
(Though I welcome better wording and syntax in either case; it is the
behaviour only that I discuss).

It seems now that we have agreed approaches for (1), (2) and (4). Please
note that I have listened to the needs of others with regard to
requirement (1), as espoused by earlier by Hannu and again now by
Martijn. Some of the points about requirement (3) I made in my previous
post have not yet been addressed, IMHO.

My mind is not fixed. AFAICS there are valid points remaining on both
sides of the discussion about loading data quickly into an existing
table.

> I do think it is valid concern about someone use the table between the
> CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow
> the RELIABILITY as part of the CREATE TABLE, another is to tell users to
> create the table inside a transaction.

Neither solution works for this use case:

> > 3. How to load data quickly into an existing table (COPY)

This is the only use case for which ALTER TABLE ... EXCLUSIVE makes
sense. That option means that any write lock held upon the table would
be an EXCLUSIVE table lock, so would never be a performance gain with
single row INSERT, UPDATE or DELETEs.

Following Andrew's concerns, I'd also note that ALTER TABLE requires a
much higher level of privilege to operate than does COPY. That sounds
like it will make things more secure, but all it does is open up the
administrative rights, since full ownership rights must be obtained
merely to load data.

> Having COPY behave differently because it is
> in a transaction is fine as long as it is user-invisible

Good

> I think there is great utility in giving users one API, namely
> RELIABILITY (or some other keyword), and telling them that is where they
> control logging. I realize adding one keyword, LOCK, to an existing
> command isn't a big deal, but once you decentralize your API enough
> times, you end up with a terribly complex database system. It is this
> design rigidity that helps make PostgreSQL so much easier to use than
> other database systems.

I do see the appeal of your suggestion...

TRUNCATE is a special command to delete quickly. There is no requirement
to do an ALTER TABLE statement before that command executes.

Balance would suggest that a special command to load data quickly would
be reasonably accepted by users.

Minor points below:

> > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> > > pg_dump *will* work with any other version of PostgreSQL, which *would
> > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> > > it.
> >
> > Wrong --- the good thing about ALTER TABLE is that an old version of
> > Postgres would simply reject it and keep going. Therefore we could get
> > the speedup in dumps without losing compatibility, which is not true
> > of COPY LOCK.

That was pointing out one of Bruce's objections was not relevant because
it assumed COPY LOCK was required to make pg_restore go faster; that was
not the case - so there is no valid objection either way now.

> > BTW, this is a perfect example of the use-case for not abandoning a
> > dump-file load simply because one command fails. (We have relied on
> > this sort of reasoning many times before, too, for example by using
> > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
> > I don't think that "wrap the whole load into begin/end" is really a very
> > workable answer, because there are far too many scenarios where you
> > can't do that. Another one where it doesn't help is a data-only dump.

Which is why --single-transaction is not the default, per the earlier
discussion on that point (on -patches).

> Yep, Tom is echoing my reaction. There is a temptation to add things up
> onto existing commands, e.g. LOCK, and while it works, it makes for some
> very complex user API's. Having COPY behave differently because it is
> in a transaction is fine as long as it is user-invisible, but once you
> require users to do that to get the speedup, it isn't user-invisible
> anymore.
>
> (I can see it now, "Why is pg_dump putting things in transactions?",
> "Because it prevents it from being logged." "Oh, should I be doing that
> in my code?" "Perhaps, if you want ..." You can see where that
> discussion is going. Having them see "ATER TABLE ... RELIBILITY
> TRUNCATE" is very clear, and very clear on how it can be used in user
> code.)

The above case is not an argument against COPY LOCK. Exactly what you
say above would still occur even when we have ALTER TABLE ...
RELIABILITY statement, since COPY LOCK and
COPY-optimized-within-same-transaction are different things.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2005-12-30 13:35:37 Re: [HACKERS] Online backup vs Continuous backup
Previous Message Sebastian 2005-12-30 08:15:03 Re: broken 'SHOW TABLE'-like query works in 8, not 8.1.1