Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Harold A(dot) Giménez <harold(dot)gimenez(at)gmail(dot)com>
Subject: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
Date: 2012-07-18 21:45:08
Message-ID: CA+U5nM+GTCjEcpSvb=L-=bZ+6ngnhCsRmVuXEYih_4q2CcMcBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 17 July 2012 23:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Jul 16, 2012 at 3:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> BTW, while we are on the subject: hasn't this split completely broken
>>> the statistics about backend-initiated writes?
>
>> Yes, it seems to have done just that.
>
> So I went to fix this in the obvious way (attached), but while testing
> it I found that the number of buffers_backend events reported during
> a regression test run barely changed; which surprised the heck out of
> me, so I dug deeper. The cause turns out to be extremely scary:
> ForwardFsyncRequest isn't getting called at all in the bgwriter process,
> because the bgwriter process has a pendingOpsTable. So it just queues
> its fsync requests locally, and then never acts on them, since it never
> runs any checkpoints anymore.
>
> This implies that nobody has done pull-the-plug testing on either HEAD
> or 9.2 since the checkpointer split went in (2011-11-01), because even
> a modicum of such testing would surely have shown that we're failing to
> fsync a significant fraction of our write traffic.

That problem was reported to me on list some time ago, and I made note
to fix that after last CF.

I added a note to 9.2 open items about it myself, but it appears my
fix was too simple and fixed only the reported problem not the
underlying issue. Reading your patch gave me strong deja vu, so not
sure what happened there.

Not very good from me. Feel free to thwack me to fix such things if I
seem not to respond quickly enough.

I'm now looking at the other open items in my area.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Samuel Vogel 2012-07-18 21:48:05 Re: b-tree index search algorithms
Previous Message Tom Lane 2012-07-18 21:37:33 Re: bgwriter, regression tests, and default shared_buffers settings

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2012-07-18 22:36:14 Re: optimizing queries using IN and EXISTS
Previous Message Tom Lane 2012-07-18 21:17:26 Re: [PERFORM] DELETE vs TRUNCATE explanation