Re: Performance and WAL on big inserts/updates

From: Marty Scholes <marty(at)outputservices(dot)com>
To: sailesh(at)cs(dot)berkeley(dot)edu
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance and WAL on big inserts/updates
Date: 2004-03-12 02:30:46
Message-ID: 405120D6.3080501@outputservices.com
Lists: pgsql-hackers

> A major reason for this is that logical logs make recovery contingent
> on being able to execute the "parsed statements". This execution
> might, however, not be possible if the system is itself not in a
> consistent state .. as is normally the case during recovery.
>

I am not sure I follow you here. The logs should (IMHO) be able to hold
either type of data: physical pages (what happens now), or the SQL
statement itself when it is small but generates a large number of changes.

If the DB state cannot be put back to a consistent state prior to a SQL
statement in the log, then NO amount of logging will help. The idea is
that the state can be put back to what it was prior to a particular log
entry, be it raw datafile blocks or a SQL statement.
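To make the idea concrete, here is a minimal sketch (in Python, with a made-up record format that is NOT PostgreSQL's actual WAL layout) of a log that mixes physical page images with logical statement records. Replay walks the log in order; each record only assumes the state is consistent as of the previous record, which is the property I am claiming either record type needs anyway.

```python
# Hypothetical mixed log: "physical" records carry a full page image,
# "logical" records carry the SQL text to re-execute. Illustrative only.

pages = {}       # page_id -> bytes, standing in for the data files
statements = []  # statements re-executed during replay (stand-in for SQL)

def replay(log):
    for record in log:
        if record["kind"] == "physical":
            # restore the page image exactly as it was logged
            pages[record["page"]] = record["image"]
        elif record["kind"] == "logical":
            # re-execute the statement; this is only safe if the state
            # built up so far is consistent -- the objection raised above
            statements.append(record["sql"])

log = [
    {"kind": "physical", "page": 1, "image": b"row data v1"},
    {"kind": "logical",  "sql": "UPDATE addr SET zip5 = '90210'"},
    {"kind": "physical", "page": 1, "image": b"row data v2"},
]
replay(log)
```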

> What if, for instance, it's the catalog tables that were hosed when
> the system went down ? It may be difficult to execute the parsed
> statements without the catalogs.
>

See above. If this cannot be resolved prior to re-executing a statement
in the log, then the problem is beyond ANY subsequent logging.

> Having said that, page-oriented undo logging can be a pain when B-tree
> pages split. For higher concurrency, ARIES uses logical undo
> logging. In this case, the logs are akin to your "parsed statement"
> idea.
>

Yes, my experience exactly. Maybe we are the only company on the planet
that experiences this sort of thing. Maybe not.

> In any case, the only place that parsed statements are useful, imo are
> with searched updates that cause a large number of records to change
> and with "insert into from select" statements.
>

Yes. Correlated UPDATEs, INSERT INTO with subselects, AND mass DELETEs on
heavily indexed tables. Index creation... The list goes on and on. I
have experienced and lived all of it on a daily basis with Oracle. And I
despise it.
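A back-of-envelope sketch of why these statements hurt so much under physical logging: every changed row logs a heap record plus one record per affected index, while a logical log would carry the statement once. The record sizes below are illustrative assumptions, not measured PostgreSQL numbers.

```python
# Rough WAL-volume comparison for a searched update touching many rows
# on a heavily indexed table. All byte counts are invented for scale.

def physical_wal_bytes(rows, indexes, heap_rec=100, index_rec=60):
    # one heap record plus one record per affected index, per row
    return rows * (heap_rec + indexes * index_rec)

def logical_wal_bytes(statement):
    # a single logged statement, regardless of how many rows it touches
    return len(statement.encode())

stmt = "UPDATE addresses SET zip5 = lookup(zip5) WHERE state = 'CA'"
rows, indexes = 5_000_000, 8

phys = physical_wal_bytes(rows, indexes)
logi = logical_wal_bytes(stmt)
print(f"physical: {phys / 1e9:.1f} GB, logical: {logi} bytes")
```

With these (assumed) sizes, 5 million rows over 8 indexes is on the order of gigabytes of log versus a few dozen bytes for the statement, which is the whole appeal.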

The difference is, of course, I can't even have this kind of discussion
with Oracle, but I can here. ;-)

> Then, there is also the case that this, the "parsed statements"
> approach, is not a general solution. How would you handle the "update
> current of cursor" scenarios ? In this case, there is some application
> logic that determines the precise records that change and how they
> change.
>
> Ergo, it is my claim that while logical redo logging does have some
> benefits, it is not a viable general solution.
>

Agreed, this is not a general solution. What it is, however, is a
tremendous improvement over the current situation for transactions that
do massive changes to heavily indexed datasets.

I am working on an application right now that will require current
postal information on EVERY address in the U.S. -- street name, street
address, directional, subunit, 5 digit zip, 3 digit zip, city, state,
delivery point barcode, carrier route, latitude, longitude, etc. Most
of these fields will need to be indexed, because they will be searched
in real time via a web application several thousand times per day.

To keep the addresses current, we will be updating all of them (150+
million) on a programmed basis, going through several million addresses
EVERY DAY, while ensuring that the updates happen atomically so that
they don't disrupt web activity.
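The atomicity requirement boils down to this: a web reader must see either the old batch of addresses or the new one, never a half-applied mix. Here is a toy in-memory sketch of that "stage privately, publish in one step" pattern; in a real database the same effect comes from running the whole batch inside a single transaction (the field names are made up).

```python
# Simulated atomic batch publication: all changes are staged on a private
# copy, then published by a single reference swap. Readers holding the old
# reference keep a consistent view throughout.

current = {"addr1": {"zip5": "10001"}, "addr2": {"zip5": "10002"}}

def apply_batch(snapshot, updates):
    # stage every change on a deep-enough copy of the snapshot
    staged = {k: dict(v) for k, v in snapshot.items()}
    for key, fields in updates.items():
        staged[key].update(fields)
    return staged  # the "commit": caller swaps in the new state atomically

updates = {"addr1": {"zip5": "10003"}, "addr2": {"zip5": "10004"}}
current = apply_batch(current, updates)
```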

Maybe this is not a "traditional" RDBMS app, but I am not in the mood to
write my own storage infrastructure for it.

Then again, maybe I don't know what I am talking about...

Marty

Sailesh Krishnamurthy wrote:
>>>>>> "Marty" == Marty Scholes <marty(at)outputservices(dot)com> writes:
>
> Marty> Why have I not seen this in any database?
> Marty> There must be a reason.
>
> For ARIES-style systems, logging parsed statements (commonly called
> "logical" logging) is not preferred compared to logging data items
> ("physical" or "physiological" logging).
>
> A major reason for this is that logical logs make recovery contingent
> on being able to execute the "parsed statements". This execution
> might, however, not be possible if the system is itself not in a
> consistent state .. as is normally the case during recovery.
>
> What if, for instance, it's the catalog tables that were hosed when
> the system went down ? It may be difficult to execute the parsed
> statements without the catalogs.
>
> For this reason, a major goal of ARIES was to have each and every data
> object (tables/indexes) individually recoverable. So ARIES follows
> page-oriented redo logging.
>
> Having said that, page-oriented undo logging can be a pain when B-tree
> pages split. For higher concurrency, ARIES uses logical undo
> logging. In this case, the logs are akin to your "parsed statement"
> idea.
>
> In any case, the only place that parsed statements are useful, imo are
> with searched updates that cause a large number of records to change
> and with "insert into from select" statements.
>
> Then, there is also the case that this, the "parsed statements"
> approach, is not a general solution. How would you handle the "update
> current of cursor" scenarios ? In this case, there is some application
> logic that determines the precise records that change and how they
> change.
>
> Ergo, it is my claim that while logical redo logging does have some
> benefits, it is not a viable general solution.
>
