Re: Using multi-row technique with COPY

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Using multi-row technique with COPY
Date: 2005-11-29 21:30:23
Message-ID: 1133299823.2906.437.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2005-11-29 at 15:58 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Tom Lane wrote:
> >> Log, yes, unless it's a temp table. The point is we could avoid taking
> >> buffer content locks. Come to think of it, we could implement that
> >> trivially in the heapam.c routines; it would then apply to any table
> >> update whether generated by COPY or otherwise.
>
> > We should do that, yes, we've discussed that before.
>
> Not that I recall ...

It was a quick comment of yours only, not a long discussion. Its a good
idea and we should do it, but its not everything I want (other posts
pursue the other ideas)

> anyway, I had second thoughts about this while
> eating lunch. A buffer for a new table can reasonably be assumed not to
> be accessed by any other backend, but we can *not* make that assumption
> for the bgwriter. This means that the bgwriter might come along and
> dump the buffer to disk while we are in the midst of scribbling on it.
> Since we'll mark the thing dirty after we finish scribbling, no harm
> is done --- unless there is a crash after we commit and before the next
> write of the buffer occurs. In that case we have a corrupt copy of the
> buffer on disk, which is going to pose a problem for recovery. (As an
> example, the corrupt copy might have an invalid LSN that is larger than
> it really should be, which will cause us to ignore the WAL record that
> might have otherwise let us fix the problem.)
>
> As far as I can see at the moment, such a problem will be repaired
> during recovery *as long as you have the full-page-writes option on*.
> Otherwise not. So we'd probably have to restrict the optimization to
> happen only if that's on, as well as only for new tables.

Not sure I understand that. If there's a crash then the transaction
failed so any data on the partially written block is invisible. Why else
would the block become corrupt?

Whether full page writes is on or not, we still fill the block. After
which we never log the full page because the block doesn't change again.

Should we checkpoint half way through filling the block, then its still
written to disk whether we do full page writes or not.

Maybe its just late here...

> >> Exactly my point; we don't have to change any syntax, so pg_dump
> >> doesn't care.
>
> > Does pg_dump wrap the CREATE TABLE and COPY into the same transaction?
>
> No, but you can manually put BEGIN and END around the whole script.
> Some people do that anyway. (It wouldn't be a bad idea to add an option
> to pg_restore to do it, since there's no very convenient way to add
> begin/end manually when piping pg_restore right into a database.)

Can do.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Sherry 2005-11-29 21:30:28 Re: ice-broker scan thread
Previous Message Gavin Sherry 2005-11-29 21:22:36 Re: ice-broker scan thread