Re: Load Distributed Checkpoints, take 3

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Load Distributed Checkpoints, take 3
Date: 2007-06-23 08:59:27
Message-ID: Pine.GSO.4.64.0706221659310.6983@westnet.com
Lists: pgsql-patches

This message is going to come off as kind of angry, and I hope you don't
take that personally. I'm very frustrated with this whole area right now
but am unable to do anything to improve that situation.

On Fri, 22 Jun 2007, Tom Lane wrote:

> If you've got specific evidence why any of these things need to be
> parameterized, let's see it.

All I'm trying to suggest here is that you might want to pause and
consider whether you want to make a change that might break existing,
happily working installations based just on the small number of tests that
have been done on this patch so far. A nice stack of DBT2 results is very
informative, but the DBT2 workload is not everybody's workload.

Did you see anybody else predicting issues with the LDC patch on
overloaded systems like the ones starting to show up in the 150
warehouse/90% latency figures in Heikki's most recent results? The way I
remember it, it was just me pushing to expose that problem, because I knew
it was there from my unfortunately private tests, even though it was
difficult to encounter the issue on other types of benchmarks (thanks
again to Greg Stark and Heikki for helping with that). But that's fine; if
you want to blow off the rest of my suggestions now just because the other
things I'm worried about are also very hard problems to expose and I can't
hand you a smoking gun, that's your decision.

> Personally I think that we have a bad track record of exposing GUC
> variables as a substitute for understanding performance issues at the
> start, and this approach isn't doing any favors for DBAs.

I think this project has an awful track record of introducing new GUC
variables without ever following through with a process to figure out how
they should be set. The almost complete lack of standardization and useful
tools for collecting performance information about this database boggles
my mind, and you're never going to get the performance-related sections of
the GUC streamlined without it.

We were just talking about the mess that is effective_cache_size recently.
As a more topical example here, the background writer was officially
released in early 2005, with a bizarre collection of tunables. I had to
help hack on that code myself, over two years later, to even start
exposing the internal statistics data needed to optimize it correctly.
The main reason I can't prove some of my concerns is that I got so
side-tracked adding the infrastructure needed just to show they exist that
I wasn't able to nail down exactly what was going on well enough to put
together a public test case before the project that exposed the issues
wrapped up.
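
For reference, the kind of data I mean is what that instrumentation work
now exposes through the pg_stat_bgwriter view; a sketch of pulling the raw
numbers, assuming the 8.3-era column names:

  SELECT checkpoints_timed, checkpoints_req,
         buffers_checkpoint, buffers_clean, maxwritten_clean,
         buffers_backend, buffers_alloc
    FROM pg_stat_bgwriter;

Comparing buffers_checkpoint, buffers_clean, and buffers_backend is what
finally tells you who is actually doing the writing, which is the piece
that was missing for the first two years the background writer shipped.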

> Right at the moment the best thing to do seems to be to enable LDC with
> a low minimum write rate and a high target duration, and remove the
> thereby-obsoleted "all buffers" scan of the existing bgwriter logic.

I have reason to believe there's a set of use cases where a more
accelerated LDC approach than the one everyone seems to be leaning toward
is appropriate, which would then reinvigorate the need for the all-scan
BGW component. I have a whole new design for the non-LRU background writer
that fixes most of what's wrong with it, which I'm waiting for 8.4 to pass
out and get feedback on; but if everybody is hell-bent on just yanking the
whole thing out in favor of these really lazy checkpoints, go ahead and do
what you want. My life would be easier if I just tossed all that out and
forgot about the whole thing, and I'm real close to doing just that right
now.
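
To make the configuration Tom describes above concrete, it would look
something like this sketch (the background writer names are the 8.2 ones;
the checkpoint spread knob is written as checkpoint_completion_target
purely for illustration, since the patch's GUC names were still being
debated, and the "minimum write rate" setting is omitted for the same
reason):

  checkpoint_completion_target = 0.9   # spread checkpoint writes over most
                                       #   of the checkpoint interval
  checkpoint_segments = 16             # extra segments to absorb the
                                       #   spread-out checkpoints, part of
                                       #   the cost discussed below
  bgwriter_all_percent = 0             # disable the "all buffers" scan this
  bgwriter_all_maxpages = 0            #   approach would make obsolete
  bgwriter_lru_percent = 1.0           # leave the LRU cleaning scan alone
  bgwriter_lru_maxpages = 5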

>> Did anyone else ever notice that when a new xlog segment is created,
>> the write to clear it out doesn't happen via direct I/O like the rest
>> of the xlog writes do?
> It's not supposed to matter, because that path isn't supposed to be
> taken often.

Yes, but in the situations where it does happen--when checkpoints take so
much longer than expected that more segments have to be created, or during
an archive logger failure--it badly impacts an already unpleasant
situation.

>> there's a whole class of issues involving recycling xlog segments this
>> would introduce I would be really unhappy with the implications of.
> Really? Name one.

You already mentioned expansion of the log segments used, which is a
primary issue. Acting like all the additional segments used by some of the
more extreme checkpoint spreading approaches are without cost is
completely unrealistic IMHO. In the situation I just described above, I
also noticed that the way O_DIRECT sync writes get mixed with buffered WAL
writes seems to cause some weird I/O scheduling issues in Linux that can
make worst-case latency degrade. But since I can't prove that, I guess I
might as well not mention it either.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
