Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Harold A(dot) Giménez <harold(dot)gimenez(at)gmail(dot)com>
Subject: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
Date: 2012-07-18 12:26:02
Message-ID: CA+TgmoYTv0J6QMwurM8kR6gR_OfDZ5vkw2wsQE0e+5-Oqf3A5g@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Tue, Jul 17, 2012 at 6:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So I went to fix this in the obvious way (attached), but while testing
> it I found that the number of buffers_backend events reported during
> a regression test run barely changed; which surprised the heck out of
> me, so I dug deeper. The cause turns out to be extremely scary:
> ForwardFsyncRequest isn't getting called at all in the bgwriter process,
> because the bgwriter process has a pendingOpsTable. So it just queues
> its fsync requests locally, and then never acts on them, since it never
> runs any checkpoints anymore.

:-(

> This implies that nobody has done pull-the-plug testing on either HEAD
> or 9.2 since the checkpointer split went in (2011-11-01), because even
> a modicum of such testing would surely have shown that we're failing to
> fsync a significant fraction of our write traffic.
>
> Furthermore, I would say that any performance testing done since then,
> if it wasn't looking at purely read-only scenarios, isn't worth the
> electrons it's written on. In particular, any performance gain that
> anybody might have attributed to the checkpointer splitup is very
> probably hogwash.

I don't think anybody expected the checkpointer split to produce a
direct performance gain, but I agree the performance testing needs to
be redone once this is fixed. I suspect the impact on my own testing
is limited: I mostly run pgbench, and the lost fsync requests were
probably duplicated by non-lost fsync requests from backend writes.
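
For anyone following along, here is a toy model of the failure mode
described in the quoted text. It is a sketch, not the actual md.c code:
the Process and PendingOps types and every function name below are
made up for illustration. Only the decision shape (queue locally if
the process has a pending-ops table, otherwise forward to the
checkpointer) is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 64

/* Hypothetical stand-in for a per-process pending-fsync table. */
typedef struct PendingOps {
    int    segs[MAX_PENDING];
    size_t count;
} PendingOps;

/* Hypothetical stand-in for a server process. */
typedef struct Process {
    const char *name;
    PendingOps *local_table;      /* NULL means "no local table" */
    bool        runs_checkpoints; /* true only for the checkpointer */
} Process;

static void remember(PendingOps *table, int seg)
{
    assert(table->count < MAX_PENDING);
    table->segs[table->count++] = seg;
}

/*
 * A process with a local table queues the request there; a process
 * without one forwards it to the checkpointer's table.
 */
void register_dirty_segment(Process *self, Process *checkpointer, int seg)
{
    if (self->local_table != NULL)
        remember(self->local_table, seg);
    else
        remember(checkpointer->local_table, seg);
}

/*
 * Drain the caller's own table, returning how many requests were
 * handled. A process that never checkpoints handles none.
 */
size_t run_checkpoint(Process *self)
{
    size_t handled;

    if (!self->runs_checkpoints || self->local_table == NULL)
        return 0;
    handled = self->local_table->count;
    self->local_table->count = 0;
    return handled;
}

/* Requests queued locally by a process that will never drain them. */
size_t stranded_requests(const Process *self)
{
    if (self->local_table == NULL || self->runs_checkpoints)
        return 0;
    return self->local_table->count;
}
```

In this model, a bgwriter that has a local table but never runs
checkpoints accumulates stranded requests, while an ordinary backend
with no local table has its requests handled at the next checkpoint.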

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
