Re: Fast index build vs. PITR

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Subject: Re: Fast index build vs. PITR
Date: 2004-06-01 08:10:16
Message-ID: 1086077416.3258.230.camel@stromboli
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2004-06-01 at 01:24, Tom Lane wrote:
> I was just about to commit a patch that revises the btree index build
> procedure as discussed here:
> http://archives.postgresql.org/pgsql-general/2004-05/msg00480.php
> specifically, not using shared buffers during index build and bypassing
> WAL-logging in favor of just fsyncing the index file before commit.
>
> I was actually writing the commit message when it occurred to me that
> this would seriously break PITR. If the WAL datastream doesn't contain
> enough info to rebuild the index then rolling forward from a past backup
> isn't gonna work.
>
> I thought for a little bit about a magic "reconstruct the index" WAL
> entry that would invoke the index build procedure in toto, but that
> doesn't look like it will fly either. (Two problems: during crash
> recovery, you couldn't be sure that what's on disk for the underlying
> table exactly matches the index you need to build --- it could be a
> later state of the table; and besides, the environment of the WAL replay
> process isn't capable of running user-defined functions, so it couldn't
> work for functional indexes.)
>
> So AFAICS, we've got to dump the index contents into WAL to support
> PITR. This is a tad annoying.
>
> What I'm thinking about right now is tweaking the index-build code to
> write to WAL only if it sees that PITR is actually in use. It would
> have to look at the GUC variables to determine whether WAL archiving
> is enabled. If archiving isn't turned on, then we could assume that
> rollforward from a past backup isn't needed in this installation, and
> use the WAL-less index build method.
>
> Comments?

The mechanism you suggest would also break crash recovery, not just PITR
- though the avoidance of shared buffers seems like a gain either way.

..You raise the whole subject of UNRECOVERABLE data objects. Also known
as NOT LOGGED etc.

There are many significant performance gains to be had by turning off
recoverability for certain large operations. Oracle and Teradata make
extensive use of such features, i.e. especially in Data Warehousing.

Examples of such operations might be:
- index builds
- INSERT SELECTs into previously empty tables

This is an important area for performance...not just index builds.

A suggestion would be:
- add the "dont send to xlog" functionality as a user option on each
statement, default=LOGGING - this could be Oracle compatible, or not,
but concept is similar. Put the hooks in now and we can add this to all
appropriate statement syntax later.

e.g.
CREATE INDEX blah ... NO LOGGING... ;
INSERT INTO blah ... NO LOGGING .... SELECT... ;

- if conf file says dont use fsync, then dont write to log - clearly
they dont mind losing data in the event of a crash...
i.e. default=NOLOGGING on all statements

Best regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Harald Fuchs 2004-06-01 08:10:38 Re: pg_dump --comment?
Previous Message Teodor Sigaev 2004-06-01 07:29:49 Re: New btree_gist code has a few problems