Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 03:41:42
Message-ID: 87fyoj6u15.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> > I revised the idea with MINIMAL XLOG (instead of WITHOUT XLOG) like the
> > below. I think in this way, we can always gaurantee its correctness and
> > can always improve it.
>
> I think the entire idea is a waste of time anyway. If we have the COPY
> case covered then we have done the only case that has any practical use
> AFAICS --- you don't use UPDATE or DELETE for bulk loads, and you don't
> use retail INSERTs either if you are striving for speed.

Well the "industry standard" approach for loading data is called "ETL". I
forget what the E stands for, but the idea is to first load the data into a
table exactly as it appears in the input data.

Then the "T" stands for "transform". This could include just normalizing the
input data into the correct format but it could also include looking up
foreign key values for input strings, checking for duplicate records, etc.
This would be a series of UPDATEs and DELETEs.

I guess the "L" stands for load, where you insert it into the real tables.

Up until the point where you load it into the real tables you don't really
care about the transactional integrity of the data. If it's lost you can
always just restart the process.

In many cases you could use temporary tables, but sometimes you might want
multiple processes or multiple transactions to be able to see the data.
Consider for example a loading process that includes a human operator
reviewing the data before the data is approved to be loaded into the final
tables.

But I don't see turning on and off the WAL on a per-transaction basis to be
useful. Every transaction in the system is affected by the WAL status of every
other transaction working with the same tables. It doesn't serve any purpose
to have one transaction bypassing the WAL while everyone else does WAL logging
for the same table; they're all going to lose if the system crashes.

It seems to me the only rational way to approach this is to have a per-table
flag that sets that table to be non-logged. Essentially changing a table's
behaviour to that of a temporary table except that other transactions can see
it. If the system crashes the table is truncated on system restore.

The only problem I have with this is that it smells too much like MySQL MyISAM
tables...

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Qingqing Zhou 2005-12-24 03:50:57 Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Previous Message Qingqing Zhou 2005-12-24 03:29:50 Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and