Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 13:19:45
Message-ID: 1135862385.2964.804.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2005-12-28 at 20:58 -0500, Bruce Momjian wrote:
> Having read through this thread, I would like to propose a
> syntax/behavior.
>
> I think we all now agree that the logging is more part of the table than
> the command itself. Right now we have a COPY LOCK patch, but people are
> going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
> and all sorts of other things, so I think we are best adding an ALTER
> TABLE capability. I am thinking of this syntax:
>
> ALTER TABLE name RELIABILITY option
>
> where "option" is:
>
> DROP [ TABLE ON CRASH ]
> DELETE [ ROWS ON CRASH ]
> EXCLUSIVE
> SHARE
>
> Let me explain each option. DROP would drop the table on a restart
> after a non-clean shutdown. It would do _no_ logging on the table and
> allow concurrent access, plus index access. DELETE is the same as DROP,
> but it just truncates the table (perhaps TRUNCATE is a better word).
>
> EXCLUSIVE would allow only a single session to modify the table, and
> would do all changes by appending to the table, similar to COPY LOCK.
> EXCLUSIVE would also not allow indexes because those can not be isolated
> like appending to the heap. EXCLUSIVE would write all dirty shared
> buffers for the table and fsync them before committing. SHARE is the
> functionality we have now, with full logging.
>
> Does this get us any closer to a TODO item? It isn't great, but I think
> it is pretty clear, and I assume pg_dump would use ALTER to load each
> table. The advanage is that the COPY statements themselves are
> unchanged so they would work in loading into older versions of
> PostgreSQL.

First off, thanks for summarising a complex thread.

My view would be that this thread has been complex because everybody has
expressed a somewhat different requirement, which could be broken down
as:
1. The need for a multi-user-accessible yet temporary table
2. Loading data into a table immediately after it is created (i.e. in
same transaction), including but not limited to a reload from pg_dump
3. How to load data quickly into an existing table (COPY)
4. How to add/modify data quickly in an existing table (INSERT SELECT,
UPDATE)

I can see the need for all of those individually; my existing patch
submission covers (2) and (3) only. I very much like your thought to
coalesce these various requirements into a single coherent model.

For requirement (1), table level options make sense. We would:
- CREATE TABLE ALLTHINGS
- ALTER TABLE ALLTHINGS RELIABILITY DELETE ROWS ON RECOVERY
- lots of SQL, all fast because not logged

(2) is catered for adequately by the existing COPY patch i.e. it will
detect whether a table has just been created and then avoid writing WAL.
In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
pg_dump *will* work with any other version of PostgreSQL, which *would
not* be the case if we added ALTER TABLE ... RELIABILITY statements into
it. Also, a pg_dump created at an earlier version could also be loaded
faster using the patch. The only requirement is to issue all SQL as part
of the same transaction - which is catered for by the
--single-transaction option on pg_restore and psql. So (2) is catered
for fully without the need for an ALTER TABLE ... RELIABILITY statement
or COPY LOCK.

For requirement (3), I would use table level options like this:
(the table already exists and is reasonably big; we should not assume
that everybody can and does use partitioning)
- ALTER TABLE RELIABILITY ALLTHINGS2 EXCLUSIVE
- COPY
- ALTER TABLE RELIABILITY ALLTHINGS2 SHARE

For a load into an existing table I would always do all three actions
together. COPY LOCK does exactly that *and* does it atomically.

The two ways of doing (3) have a few pros/cons either way:
Pro for ALTER TABLE:
- same syntax as req (1)
- doesn't need the keyword LOCK
- allows INSERT SELECT, UPDATE operations also (req 4)
Cons:
- existing programs have to add additional statements to take advantage
of this; with COPY LOCK we would add just a single keyword
- operation is not atomic, which might lead to some operations waiting
for a lock to operate as unlogged, since they would execute before the
second ALTER TABLE gets there
- operation will be understood by some, but not others. They will forget
to switch the RELIABILITY back on and then lose their whole table when
the database crashes. (watch...)

...but would it be a problem to have both?

So, my thinking would be to separate things into two:
a) Add a TODO item "shared temp tables" that caters for (1) and (4)

ALTER TABLE name RELIABILITY
{DELETE ROWS AT RECOVERY | FULL RECOVERY}
(syntax TBD)

which would
- truncate all rows and remove all index entries during recovery
- use shared_buffers, not temp_buffers
- never write xlog records, even when in PITR mode
- would avoid writing WAL for both heap *and* index tuples

b) Leave the COPY patch as is, since it caters for reqs (2) and (3) as
*separate* optimizations (but using a common infrastructure in code).
[This work was based upon discussions on -hackers only 6 months ago, so
its not like its been snuck in or anything
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00075.php ]

These two thoughts are separable. There is no need to
have-both-or-neither within PostgreSQL.

Eventually, I'd like all of these options, as a database designer.

Best Regards, Simon Riggs

> --------------------------------------------------------------------------
>
> Martijn van Oosterhout wrote:
> -- Start of PGP signed section.
> > On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote:
> > > I would not be against such a table-level switch, but the exact
> > > behaviour would need to be specified more closely before this became a
> > > TODO item, IMHO.
> >
> > Well, I think at a per table level is the only sensible level. If a
> > table isn't logged, neither are the indexes. After an unclean shutdown
> > the data could be anywhere between OK and rubbish, with no way of
> > finding out which way.
> >
> > > If someone has a 100 GB table, they would not appreciate the table being
> > > truncated if a transaction to load 1 GB of data aborts, forcing recovery
> > > of the 100 GB table.
> >
> > Ah, but wouldn't such a large table be partitioned in such a way that
> > you could have the most recent partition having the loaded data.
> > Personally, I think these "shared temp tables" have more applications
> > than meet the eye. I've had systems with cache tables which could be
> > wiped on boot. Though I think my preference would be to TRUNCATE rather
> > than DROP on unclean shutdown.
> >
> > Have a nice day,
> > --
> > Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > > tool for doing 5% of the work and then sitting around waiting for someone
> > > else to do the other 95% so you can sue them.
> -- End of PGP section, PGP failed!
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2005-12-29 14:31:08 Re: localization problem (and solution)
Previous Message Simon Riggs 2005-12-29 11:58:06 Re: WAL logs multiplexing?