Re: WAL logging problem in 9.4.3?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL logging problem in 9.4.3?
Date: 2015-07-06 15:49:54
Message-ID: 28415.1436197794@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2015-07-06 11:14:40 -0400, Tom Lane wrote:
>> The COUNT() correctly says 11 rows, but after crash-and-recover,
>> only the row with -1 is there. This is because the INSERT writes
>> out an INSERT+INIT WAL record, which we happily replay, clobbering
>> the data added later by COPY.

> ISTM any WAL logged action that touches a relfilenode essentially needs
> to disable further optimization based on the knowledge that the relation
> is new.

After a bit more thought, I think it's not so much "any WAL logged action"
as "any unconditionally-replayed action". INSERT+INIT breaks this
example because heap_xlog_insert will unconditionally replay the action,
even if the page is valid and has the same or a newer LSN. Similarly, TRUNCATE
is problematic because we redo it unconditionally (and in that case it's
hard to see an alternative).
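
To make the distinction concrete, here is a toy model of the two replay
styles. It is only a sketch and not PostgreSQL code: the Page class, both
replay functions, and the LSN value 10 are invented for illustration. It
assumes the on-disk page already holds the WAL-logged row -1 plus ten rows
that a WAL-skipping COPY wrote afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # Toy stand-in for a heap page: an LSN plus the rows stored on it.
    lsn: int = 0
    rows: list = field(default_factory=list)


def replay_conditional(page, record_lsn, row):
    """Redo an insert only if the page has not already seen this record."""
    if page.lsn >= record_lsn:
        return False  # page is at or past this record, so leave it alone
    page.rows.append(row)
    page.lsn = record_lsn
    return True


def replay_init(page, record_lsn, row):
    """Redo an insert-with-init: rebuild the page without checking its LSN."""
    page.rows = [row]
    page.lsn = record_lsn
    return True


def on_disk_page_after_copy():
    # Assumed state at crash time: INSERT of -1 was WAL-logged at LSN 10,
    # then COPY added ten rows to the same page without writing WAL.
    return Page(lsn=10, rows=[-1] + list(range(1, 11)))


skipped = on_disk_page_after_copy()
replay_conditional(skipped, 10, -1)

clobbered = on_disk_page_after_copy()
replay_init(clobbered, 10, -1)

print(len(skipped.rows), len(clobbered.rows))  # prints "11 1"
```

In this model the LSN-checked replay leaves all 11 rows in place, while the
init-style replay keeps only the row with -1, which matches the symptom in
the example.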

> It'd not be impossible to add more state to the relcache entry for the
> relation. Whether it's likely that we'd find all the places that'd need
> updating that state, I'm not sure.

Yeah, the sticking point is mainly being sure that the state is correctly
tracked, both now and after future changes. We'd need to identify a state
invariant that we could be pretty confident we'd not break.

One idea I had was to allow the COPY optimization only if the heap file is
physically zero-length at the time the COPY starts. That would still be
able to optimize in all the cases we care about making COPY fast for.
Rather than reverting cab9a0656c36739f, which would re-introduce a
different performance problem, perhaps we could have COPY create a new
relfilenode when it does this. That should be safe if the table was
previously empty.
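
As a rough illustration of the zero-length test, the following sketch checks
the physical size of a file before allowing the optimized path. The function
name copy_may_skip_wal and the temporary files are hypothetical, and a plain
file-size check stands in for whatever the server would actually consult.

```python
import os
import tempfile


def copy_may_skip_wal(heap_path):
    # Hypothetical predicate: allow the WAL-skipping COPY path only when
    # the heap file is physically zero-length when COPY starts.
    return os.path.getsize(heap_path) == 0


def demo():
    with tempfile.TemporaryDirectory() as d:
        empty = os.path.join(d, "empty_heap")
        used = os.path.join(d, "used_heap")
        open(empty, "wb").close()          # never written to
        with open(used, "wb") as f:
            f.write(b"\0" * 8192)          # one 8 kB page already allocated
        return copy_may_skip_wal(empty), copy_may_skip_wal(used)


print(demo())  # prints "(True, False)"
```

The second file holds no meaningful data, but it is not physically empty, so
the check refuses the optimization. The test is on the file's size, not on
whether the table has visible rows.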

regards, tom lane
