Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 16:49:59
Message-ID: 200512301649.jBUGnxn21488@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > > > My view would be that this thread has been complex because everybody has
> > > > expressed a somewhat different requirement, which could be broken down
> > > > as:
> > > > 1. The need for a multi-user-accessible yet temporary table
> > > > 2. Loading data into a table immediately after it is created (i.e. in
> > > > same transaction), including but not limited to a reload from pg_dump
> > > > 3. How to load data quickly into an existing table (COPY)
> > > > 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> > > > UPDATE)
>
> > > However, you then seem to be arguing for still using the COPY LOCK
> > > syntax, which I think Bruce intended would go away in favor of using
> > > these ALTER commands. Certainly that's what I'd prefer --- COPY has
> > > got too darn many options already.
>
> COPY LOCK was Tom's suggestion at the end of a long discussion thread on
> this precise issue. Nobody objected to it at that point; I implemented
> it *exactly* that way because I wanted to very visibly follow the
> consensus of the community, after informed debate.
> http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php
>
> Please re-read the links to previous discussions.
> http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
> There are points there, not made by me, that still apply and need to be
> considered here, yet have not been.

Yes, I know we agreed to the COPY LOCK, but new features now being
requested, so we have to re-evaluate where we are going with COPY LOCK
to get a more consistent solution.

> Just to restate my current thinking:
> - agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS
> - we should have COPY LOCK rather than
> ALTER TABLE .... RELIABILITY EXCLUSIVE
> (Though I welcome better wording and syntax in either case; it is the
> behaviour only that I discuss).
>
> It seems now that we have agreed approaches for (1), (2) and (4). Please
> note that I have listened to the needs of others with regard to
> requirement (1), as espoused by earlier by Hannu and again now by
> Martijn. Some of the points about requirement (3) I made in my previous
> post have not yet been addressed, IMHO.
>
> My mind is not fixed. AFAICS there are valid points remaining on both
> sides of the discussion about loading data quickly into an existing
> table.
>
> > I do think it is valid concern about someone use the table between the
> > CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow
> > the RELIABILITY as part of the CREATE TABLE, another is to tell users to
> > create the table inside a transaction.
>
> Neither solution works for this use case:
>
> > > 3. How to load data quickly into an existing table (COPY)
>
> This is the only use case for which ALTER TABLE ... EXCLUSIVE makes
> sense. That option means that any write lock held upon the table would
> be an EXCLUSIVE table lock, so would never be a performance gain with
> single row INSERT, UPDATE or DELETEs.

Ah, but people wanted fast INSERT INTO ... SELECT, and that would use
EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use
EXCLUSIVE? We don't want to add "LOCK" to every command that might use
EXCLUSIVE. ALTER is much better for this.

I agree if we thought EXCLUSIVE would only be used for COPY, we could
use LOCK, but I am thinking it will be used for other commands as well.

> Following Andrew's concerns, I'd also note that ALTER TABLE requires a
> much higher level of privilege to operate than does COPY. That sounds
> like it will make things more secure, but all it does is open up the
> administrative rights, since full ownership rights must be obtained
> merely to load data.

True, but as pointed out by others, I don't see that happening too
often.

> > Having COPY behave differently because it is
> > in a transaction is fine as long as it is user-invisible
>
> Good
>
> > I think there is great utility in giving users one API, namely
> > RELIABILITY (or some other keyword), and telling them that is where they
> > control logging. I realize adding one keyword, LOCK, to an existing
> > command isn't a big deal, but once you decentralize your API enough
> > times, you end up with a terribly complex database system. It is this
> > design rigidity that helps make PostgreSQL so much easier to use than
> > other database systems.
>
> I do see the appeal of your suggestion...
>
> TRUNCATE is a special command to delete quickly. There is no requirement
> to do an ALTER TABLE statement before that command executes.

The TRUNCATE happens during recovery. There is no user interaction. It
happens because we can't restore the contents of the table in a
consistent state because no logging was used. Basically, a table marked
RELIABILITY TRUNCATE would be truncated on a recovery start of the
postmaster.

> Balance would suggest that a special command to load data quickly would
> be reasonably accepted by users.
>
>
>
>
> Minor points below:
>
> > > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> > > > pg_dump *will* work with any other version of PostgreSQL, which *would
> > > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> > > > it.
> > >
> > > Wrong --- the good thing about ALTER TABLE is that an old version of
> > > Postgres would simply reject it and keep going. Therefore we could get
> > > the speedup in dumps without losing compatibility, which is not true
> > > of COPY LOCK.
>
> That was pointing out one of Bruce's objections was not relevant because
> it assumed COPY LOCK was required to make pg_restore go faster; that was
> not the case - so there is no valid objection either way now.

I don't consider the single-transaction to be a no-cost solution. You
are adding flags to commands, and you are using a dump layout for
performance where the purpose for the layout is not clear. The ALTER is
clear to the user, and it allows nologging operations to happen after
the table is created.

In fact, for use in pg_dump, I think DROP is the proper operation for
loading, not your transaction wrapping solution. We already agree we
need DROP (or TRUNCATE), so why not use that rather than the transaction
wrap idea?

> > > BTW, this is a perfect example of the use-case for not abandoning a
> > > dump-file load simply because one command fails. (We have relied on
> > > this sort of reasoning many times before, too, for example by using
> > > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
> > > I don't think that "wrap the whole load into begin/end" is really a very
> > > workable answer, because there are far too many scenarios where you
> > > can't do that. Another one where it doesn't help is a data-only dump.
>
> Which is why --single-transaction is not the default, per the earlier
> discussion on that point (on -patches).

Right, but why not use DROP/TRUNCATE? That works for old dumps too, and
has no downsides, meaning it can be always on.

> > Yep, Tom is echoing my reaction. There is a temptation to add things up
> > onto existing commands, e.g. LOCK, and while it works, it makes for some
> > very complex user API's. Having COPY behave differently because it is
> > in a transaction is fine as long as it is user-invisible, but once you
> > require users to do that to get the speedup, it isn't user-invisible
> > anymore.
> >
> > (I can see it now, "Why is pg_dump putting things in transactions?",
> > "Because it prevents it from being logged." "Oh, should I be doing that
> > in my code?" "Perhaps, if you want ..." You can see where that
> > discussion is going. Having them see "ATER TABLE ... RELIBILITY
> > TRUNCATE" is very clear, and very clear on how it can be used in user
> > code.)
>
> The above case is not an argument against COPY LOCK. Exactly what you
> say above would still occur even when we have ALTER TABLE ...
> RELIABILITY statement, since COPY LOCK and
> COPY-optimized-within-same-transaction are different things.

See my posting above that we might want EXCLUSIVE for other commands,
meaning ALTER makes more sense.

So, to summarize, I think we should add DROP/TRUNCATE, and use that by
default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2005-12-30 17:39:21 Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Previous Message Tom Lane 2005-12-30 15:18:48 Re: Removing SORTFUNC_LT/REVLT