Re: WAL logging problem in 9.4.3?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL logging problem in 9.4.3?
Date: 2015-07-10 09:14:20
Message-ID: 20150710091420.GK340@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-07-10 11:50:33 +0300, Heikki Linnakangas wrote:
> On 07/10/2015 02:06 AM, Tom Lane wrote:
> >Andres Freund <andres(at)anarazel(dot)de> writes:
> >>On 2015-07-06 11:49:54 -0400, Tom Lane wrote:
> >>>Rather than reverting cab9a0656c36739f, which would re-introduce a
> >>>different performance problem, perhaps we could have COPY create a new
> >>>relfilenode when it does this. That should be safe if the table was
> >>>previously empty.
> >
> >>I'm not convinced that cab9a0656c36739f needs to survive in that
> >>form. To me only allowing one COPY to benefit from the wal_level =
> >>minimal optimization has a significantly higher cost than
> >>cab9a0656c36739f.
> >
> >What evidence have you got to base that value judgement on?
> >
> >cab9a0656c36739f was based on an actual user complaint, so we have good
> >evidence that there are people out there who care about the cost of
> >truncating a table many times in one transaction.
>
> Yeah, if we specifically made that case cheap, in response to a complaint,
> it would be a regression to make it expensive again. We might get away with
> it in a major version, but would hate to backpatch that.

Sure. But making COPY slower would also be a regression, and of a
longer-standing behaviour, with a massively bigger impact if somebody
relies on it. I mean, creating a new relfilenode involves a couple of
heap and storage operations, whereas missing the skip-WAL optimization
can easily double or triple COPY durations.

I generally find it very dubious to re-use a relfilenode after a
truncation. I bet most hackers never knew we did that, and the rest
have probably forgotten it.

We can still retain a portion of the optimizations from cab9a0656c36739f
- there's no need to keep the old relfilenode's contents around after
all.

> >>My tentative guess is that the best course is to
> >>a) Make heap_truncate_one_rel() create a new relfilenode. That fixes the
> >> truncation replay issue.
> >>b) Force new pages to be used when using the heap_sync mode in
> >> COPY. That avoids the INIT danger you found. It seems rather
> >> reasonable to avoid using pages that have already been the target of
> >> WAL logging here in general.
> >
> >And what reason is there to think that this would fix all the problems?
> >We know of those two, but we've not exactly looked hard for other cases.
>
> Hmm. Perhaps that could be made to work, but it feels pretty fragile.

It does. I'm not very happy about this mess.

> For
> example, you could have an insert trigger on the table that inserts
> additional rows to the same table, and those inserts would be intermixed
> with the rows inserted by COPY.

That should be fine? As long as COPY only uses new pages, INSERT can use
the same ones without problem. I think...

> Full-page images in general are a problem.

With the above rules I don't think it'd be. They'd contain the previous
contents, and we'll not target them again with COPY.

> I think we should
> 1. reliably and explicitly keep track of whether we've WAL-logged any
> TRUNCATE, INSERT/UPDATE+INIT, or any other full-page-logging operations on
> the relation, and
> 2. make sure we never skip WAL-logging again if we have.
>
> Let's add a flag, rd_skip_wal_safe, to RelationData that's initially set
> when a new relfilenode is created, i.e. whenever rd_createSubid or
> rd_newRelfilenodeSubid is set. Whenever a TRUNCATE or a full-page image
> (including INSERT/UPDATE+INIT) is WAL-logged, clear the flag. In copy.c,
> only skip WAL-logging if the flag is still set. To deal with the case that
> the flag gets cleared in the middle of COPY, also check the flag whenever
> we're about to skip WAL-logging in heap_insert, and if it's been cleared,
> ignore the HEAP_INSERT_SKIP_WAL option and WAL-log anyway.

Am I missing something, or will this break the BEGIN; TRUNCATE; COPY;
pattern we use ourselves and have suggested a number of times?

Andres
