Re: Speed up Clog Access by increasing CLOG buffers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2017-03-10 06:13:40
Message-ID: CAA4eK1KAteYXb-KRY=tBRcM=D20o5UvgHePxpwRaBS7eqrkBaQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 10, 2017 at 10:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Mar 9, 2017 at 9:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Buildfarm thinks eight wasn't enough.
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01
>
>> At first I was confused how you knew that this was the fault of this
>> patch, but this seems like a pretty indicator:
>> TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status !=
>> 0x00) || curval == status)", File: "clog.c", Line: 574)
>
> Yeah, that's what led me to blame the clog-group-update patch.
>
>> I'm not sure whether it's related to this problem or not, but now that
>> I look at it, this (preexisting) comment looks like entirely wishful
>> thinking:
>> * If we update more than one xid on this page while it is being written
>> * out, we might find that some of the bits go to disk and others don't.
>> * If we are updating commits on the page with the top-level xid that
>> * could break atomicity, so we subcommit the subxids first before we mark
>> * the top-level commit.
>
> Maybe, but that comment dates to 2008 according to git, and clam has
> been, er, happy as a clam up to now. My money is on a newly-introduced
> memory-access-ordering bug.
>
> Also, I see clam reported in green just now, so it's not 100%
> reproducible :-(
>
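For readers less familiar with clog.c, the assertion that fired restricts which transitions a transaction's clog status may make. Below is a minimal C restatement of it. It assumes the status codes match PostgreSQL's TRANSACTION_STATUS_* values in clog.h (in progress 0x00, committed 0x01, aborted 0x02, sub-committed 0x03). The helper names clog_transition_ok and clog_set_status are hypothetical and not part of clog.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Two-bit transaction status codes (assumed to match PostgreSQL's clog.h). */
#define TRANSACTION_STATUS_IN_PROGRESS   0x00
#define TRANSACTION_STATUS_COMMITTED     0x01
#define TRANSACTION_STATUS_ABORTED       0x02
#define TRANSACTION_STATUS_SUB_COMMITTED 0x03

/*
 * Hypothetical helper: true if moving an xid's entry from curval to status
 * satisfies the check
 *   curval == 0 || (curval == 0x03 && status != 0x00) || curval == status
 */
static bool clog_transition_ok(int curval, int status)
{
    return curval == TRANSACTION_STATUS_IN_PROGRESS ||
           (curval == TRANSACTION_STATUS_SUB_COMMITTED &&
            status != TRANSACTION_STATUS_IN_PROGRESS) ||
           curval == status;
}

/* Toy model of writing one xid's status, enforcing the rule with assert(). */
static void clog_set_status(unsigned char *entry, int status)
{
    assert(clog_transition_ok(*entry, status));
    *entry = (unsigned char) status;
}
```

In words: an entry may be written if it is still in progress, if it is sub-committed and is moving to any status other than in progress, or if it already holds the requested status. Anything else, such as flipping an aborted xid to committed, trips the assertion.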

Just to let you know, I think I have figured out the reason for the
failure. If we run the regression tests with the attached patch, they
fail consistently in the same way. The patch just makes all
transaction status updates go through the group clog update
mechanism.

The cause of the problem is that the patch relies on the XidCache in
PGPROC for subtransactions when the cache has not overflowed. That is
okay for commits, but not for ROLLBACK TO SAVEPOINT and ROLLBACK. For
ROLLBACK TO SAVEPOINT, we pass only the particular (sub)transaction ID
to abort, but the group mechanism aborts all the subtransactions of
that top-level transaction.

I am still analysing the best way to fix this issue; I think there
are multiple ways to do it. One is to advertise the fact that the
status update for a transaction involves subtransactions, and then use
the XidCache when actually processing the status update. A second is
to advertise all the subtransaction IDs whose status needs to be
updated, but I am sure that is not at all efficient, as it would
consume a lot of memory. As a last resort, we could skip the group
clog update optimization when a transaction has subtransactions.
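To make the failure mode concrete, here is a toy model in C; it is not PostgreSQL code. It has a one-byte-per-xid "clog", a per-backend subxid cache standing in for PGPROC's XidCache, a direct path that updates exactly the xids passed in, and a group-style path that, like the behaviour described above, takes the subxids from the cache instead. All names (ToyXidCache, toy_set, set_tree_direct, set_tree_group_via_cache, set_tree_with_fallback) are hypothetical. The model ignores clog pages and sub-commit ordering, and the fallback sketches only the "last resort" option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Status codes assumed to match PostgreSQL's clog.h. */
#define TRANSACTION_STATUS_IN_PROGRESS   0x00
#define TRANSACTION_STATUS_COMMITTED     0x01
#define TRANSACTION_STATUS_ABORTED       0x02
#define TRANSACTION_STATUS_SUB_COMMITTED 0x03

#define TOY_NXIDS 16u /* toy clog covers xids 0..15 */

/* Hypothetical stand-in for a backend's non-overflowed subxid cache. */
typedef struct
{
    int      nsubxids;
    unsigned subxids[8];
} ToyXidCache;

static unsigned char toy_clog[TOY_NXIDS]; /* one status byte per xid */

static void toy_clog_reset(void)
{
    memset(toy_clog, TRANSACTION_STATUS_IN_PROGRESS, sizeof toy_clog);
}

/* Write one status using the clog.c transition rule; false means the
 * real assertion would have fired. */
static bool toy_set(unsigned xid, int status)
{
    int curval;

    assert(xid < TOY_NXIDS);
    curval = toy_clog[xid];
    if (!(curval == TRANSACTION_STATUS_IN_PROGRESS ||
          (curval == TRANSACTION_STATUS_SUB_COMMITTED &&
           status != TRANSACTION_STATUS_IN_PROGRESS) ||
          curval == status))
        return false;
    toy_clog[xid] = (unsigned char) status;
    return true;
}

/* Correct path: update xid plus exactly the subxids the caller passed. */
static bool set_tree_direct(unsigned xid, int nsub, const unsigned *subxids,
                            int status)
{
    for (int i = 0; i < nsub; i++)
        if (!toy_set(subxids[i], status))
            return false;
    return toy_set(xid, status);
}

/* Group-style path: subxids come from the cache, not from the caller, so
 * aborting one subtransaction also aborts every other cached subxid. */
static bool set_tree_group_via_cache(unsigned xid, const ToyXidCache *cache,
                                     int status)
{
    for (int i = 0; i < cache->nsubxids; i++)
        if (!toy_set(cache->subxids[i], status))
            return false;
    return toy_set(xid, status);
}

/* "Last resort": take the group path only when no subtransactions exist. */
static bool set_tree_with_fallback(unsigned xid, int nsub,
                                   const unsigned *subxids,
                                   const ToyXidCache *cache, int status)
{
    if (nsub == 0 && cache->nsubxids == 0)
        return set_tree_group_via_cache(xid, cache, status);
    return set_tree_direct(xid, nsub, subxids, status);
}
```

For example, take top-level xid 1 with cached subxids {2, 3}. Rolling back to the savepoint that created xid 3 through set_tree_group_via_cache also marks xid 2 aborted. When the transaction later commits, xid 2 must go from aborted to committed, which is exactly the kind of transition the clog.c assertion forbids. set_tree_with_fallback avoids that by bypassing the cache whenever subtransactions are present, at the cost of losing the group optimization for those transactions.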

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: force_clog_group_commit_v1.patch (application/octet-stream, 943 bytes)