Re: WAL logging problem in 9.4.3?

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL logging problem in 9.4.3?
Date: 2016-02-19 13:27:00
Message-ID: CAB7nPqSGFKUAFqPe5t30jeEA+V9yFMM4yJGa3SnkgY1RHzn7Dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 19, 2016 at 4:33 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Feb 18, 2016 at 4:27 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Feb 4, 2016 at 3:24 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>> I dropped the ball on this one back in July, so here's an attempt to revive
>>> this thread.
>>>
>>> I spent some time fixing the remaining issues with the prototype patch I
>>> posted earlier, and rebased that on top of current git master. See attached.
>>>
>>> Some review of that would be nice. If there are no major issues with it, I'm
>>> going to create backpatchable versions of this for 9.4 and below.
>>
>> I am going to look into that very soon. For now and to not forget
>> about this bug, I have added an entry in the CF app:
>> https://commitfest.postgresql.org/9/528/
>
> Worth noting that this patch does not address the problem with index
> relations when a TRUNCATE is used in the same transaction as its
> CREATE TABLE, take that for example when wal_level = minimal:
> 1) Run transaction
> =# begin;
> BEGIN
> =# create table ab (a int primary key);
> CREATE TABLE
> =# truncate ab;
> TRUNCATE TABLE
> =# commit;
> COMMIT
> 2) Restart server with immediate mode.
> 3) Failure:
> =# table ab;
> ERROR: XX001: could not read block 0 in file "base/16384/16388": read
> only 0 of 8192 bytes
> LOCATION: mdread, md.c:728
>
> The case where a COPY is issued after TRUNCATE is fixed though, so
> that's still an improvement.
>
> Here are other comments.
>
> + /* Flush updates to relations that we didn't WAL-logged */
> + smgrDoPendingSyncs(true);
> "Flush updates to relations there were not WAL-logged"?
>
> +void
> +FlushRelationBuffersWithoutRelCache(RelFileNode rnode, bool islocal)
> +{
> + FlushRelationBuffers_common(smgropen(rnode, InvalidBackendId), islocal);
> +}
> islocal is always set as false, I'd rather remove it this argument
> from FlushRelationBuffersWithoutRelCache.
>
> for (i = 0; i < nrels; i++)
> + {
> smgrclose(srels[i]);
> + }
> Looks like noise.
>
> + if (!found)
> + {
> + pending->truncated_to = InvalidBlockNumber;
> + pending->sync_above = nblocks;
> +
> + elog(DEBUG2, "registering new pending sync for rel %u/%u/%u at
> block %u",
> + rnode.spcNode, rnode.dbNode, rnode.relNode, nblocks);
> +
> + }
> + else if (pending->sync_above == InvalidBlockNumber)
> + {
> + elog(DEBUG2, "registering pending sync for rel %u/%u/%u at block %u",
> + rnode.spcNode, rnode.dbNode, rnode.relNode, nblocks);
> + pending->sync_above = nblocks;
> + }
> + else
> Here couldn't it be possible that when (sync_above !=
> InvalidBlockNumber), nblocks can be higher than sync_above? In which
> case we had better increase sync_above to nblocks, no?
>
> + if (!pendingSyncs)
> + createPendingSyncsHash();
> + pending = (PendingRelSync *) hash_search(pendingSyncs,
> + (void *) &rel->rd_node,
> + HASH_ENTER, &found);
> This is lacking comments.
>
> - if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
> + BufferGetTag(buffer, &rnode, &forknum, &blknum);
> + if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT) &&
> + !smgrIsSyncPending(rnode, blknum))
> Here as well explaining in more details why the buffer does not need
> to go through XLogSaveBufferForHint would be nice.

An additional one:
- XLogRegisterBuffer(0, newbuf, bufflags);
- if (oldbuf != newbuf)
- XLogRegisterBuffer(1, oldbuf, REGBUF_STANDARD);
In log_heap_update, the new buffer is now conditionally logged
depending on if the heap needs WAL or not.

Now during replay the following thing is done:
- oldaction = XLogReadBufferForRedo(record, (oldblk == newblk) ? 0 : 1,
- &obuffer);
+ if (oldblk == newblk)
+ oldaction = XLogReadBufferForRedo(record, 0, &obuffer);
+ else if (XLogRecHasBlockRef(record, 1))
+ oldaction = XLogReadBufferForRedo(record, 1, &obuffer);
+ else
+ oldaction = BLK_DONE;
Shouldn't we check for XLogRecHasBlockRef(record, 0) when the tuple is
updated on the same page?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-02-19 13:33:34 Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
Previous Message Andres Freund 2016-02-19 12:18:00 Re: checkpointer continuous flushing - V16