Re: Slides for last night's talk

From: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
To: Tim Allen <tim(at)proximity(dot)com(dot)au>
Cc: sydpug(at)postgresql(dot)org
Subject: Re: Slides for last night's talk
Date: 2006-05-03 03:08:45
Message-ID: Pine.LNX.4.58.0605031236110.27597@linuxworld.com.au
Lists: sydpug

On Wed, 3 May 2006, Tim Allen wrote:

> Gavin Sherry wrote:
> > You can find my slides here:
> >
> > http://pugs.postgresql.org/sydpug/archives/000056.html
>
> Thanks for that, Gavin. There are a few points that seem new and
> interesting to me - wish I had been there for your explanation. Hope you
> don't mind a delayed Q&A session :-).
>
> Setting wal_buffers to 64 to 512 seems much higher than any previous
> recommendation I'd seen. I (just now) noticed Josh Berkus's blog entry
> where he did some testing that showed performance goes up with
> wal_buffers in the range you've just indicated. But the annotated conf

Right. We found reasonably recently that increasing wal_buffers has a good
effect -- mostly in 8.1.

> document just says that wal_buffers isn't all that important, it only
> needs to be big enough to hold a single transaction, and you're better
> off paying attention to checkpoint_segments. You haven't mentioned
> checkpoint_segments in your slides. Is there some way out of the
> apparent contradiction? I guess the annotated conf is still aimed at
> 8.0, perhaps it needs updating for 8.1? Does your experience back up
> Josh's recent testing? And do you have any particular recommendation for
> checkpoint_segments?

The annotated postgresql.conf needs updating -- as you say. Please hassle
Josh -- or even send in some of your own material if possible.

I didn't look at increasing checkpoint_segments. I meant to address it but
overlooked it. It is a little tricky because of the space consumption.
For a dedicated system, I generally set this to 32.
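
Back of the envelope, the space side is easy to estimate -- these are the
stock 16MB WAL segments, and "2 * checkpoint_segments + 1" is just the
documented normal upper bound on the number of files, not a hard limit:

  2 * 32 + 1 = 65 segments
  65 * 16 MB = 1040 MB, so call it roughly 1 GB in pg_xlog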

>
> Setting wal_buffers higher has a cost in consumption of shared memory,
> right? So presumably you don't want to be too profligate with it.

Right. It does start to have a negative effect after 512 -- but this is
a scaling thing. Even at 512, you're only using 4M of memory.
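
(The arithmetic is just buffers times the WAL block size, which is 8 kB by
default:

  512 * 8 kB = 4096 kB, i.e. about 4 MB of shared memory)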

>
> On shared_buffers, your slide says "up to" 1/3 of physical memory. Did
> you say anything about how close to that upper limit it's worth going? I
> gather for 8.0 and earlier there wasn't much point setting
> shared_buffers too high, as the kernel's own caching was about as good
> as postgres's buffer cache, but that 8.1 made enough improvements to the
> shared buffer cache to possibly change that situation. Do you have a
> recommendation based on the changes in 8.1?

I generally tell people to do their own testing between 1/20th of physical
memory (what they probably have it at for < 8.1) and 1/3rd of memory.
Generally, I've had a lot of success setting it as close to 1/3 as
possible. You can set it higher but in my experience it scales back in
effectiveness.
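
As a made up example -- the machine size and exact figures here are just
for illustration, adjust them to your own box -- a dedicated server with
4 GB of RAM might start testing from something like:

  # postgresql.conf, 8.1 (shared_buffers and wal_buffers are 8 kB blocks)
  shared_buffers = 170000        # ~1.3 GB, roughly 1/3 of physical memory
  wal_buffers = 512              # ~4 MB
  checkpoint_segments = 32       # ~1 GB of pg_xlog at peak

You'd also need to raise the kernel's SHMMAX to cover the shared memory
segment, of course.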

> You've listed Opteron, Itanium II and Xeon on the same line for cpu
> recommendations, which presumably implies you think they're roughly
> equivalent. Lots of comments I've seen on the mailing lists indicate the
> Opteron does rather better than anything from Intel; is that not your
> experience?

I discussed this at length last night. With Opteron, you're getting more
bang for buck (because they're cheaper) but if you look at performance
alone for an OLTP workload (like TPC-C), Opteron is not way out in front.
In fact, Itanium II has better performance -- but it is 10 times more
expensive. Opteron is a good chip, but I wouldn't rush out and replace
existing 1-2 year old systems with them expecting throughput to increase by
more than 10-15%.

> On H/A, and the RedHat cluster solution with shared SAN (or shared SCSI
> array), we've had one disaster with that at one customer (corrupted
> database) which we guessed (but weren't able to prove) was due to the
> cluster manager failing to adequately ensure that only one postgres
> instance had the filesystem mounted at a time. We couldn't really get to
> the bottom of it, because the install was difficult for other reasons,
> and we weren't able to do any other testing. I've seen some mailing list
> postings from Alvaro H to the effect that he knows of a site in
> Venezuela that had the same problem. Because of that, I've tended to
> regard the RedHat cluster solution with some suspicion. But upon further
> searching, I haven't come across any other reports of failures, and have
> seen a few mentions in passing of people actually using it. What's your
> impression and/or experience? Can we trust RedHat's cluster management?

I can tell you erm... privately that two customers have had the same
problem. It gets things right most of the time, but for some reason the
following sequence happens some of the time:

1) The SAN has some kind of 'blip'. Basically, the controller stalls or
something for >10 seconds and writes are not performed.
2) Slaves do not see the heartbeat from the master -- which is done via
the SAN.
3) The slave is promoted to master.
4) The master is not taken offline. This is a shortcoming anyway, because
the master should be shot in the head via a network or power disconnect.
5) Now here's the weird bit. We have an interlock to guard against the
multiple postmaster scenario. First, we have a shared memory segment. If
it exists, we don't start. Obviously, this isn't effective in a cluster
environment. We then have a postmaster.pid. If this exists, we don't
proceed.

I'm sure I checked RHEL's init script to make sure it didn't do something
like:

if [ -f "$PGDATA/postmaster.pid" ]
then
    pid=`head -1 "$PGDATA/postmaster.pid"`
    if [ -d "/proc/$pid" ]        # is a *local* process running with that pid?
    then
        :                         # already running, do nothing
    else
        rm -f "$PGDATA/postmaster.pid"   # assume the pid file is stale
        # start the postmaster
    fi
fi

It might be worth seeing if this kind of pattern is used at your customer
site(s). Obviously, this breaks in a shared storage environment: the pid in
the file may belong to a live postmaster on the other node, and a local
/proc check can never see that.
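
For what it's worth, a more conservative pattern -- my own rough sketch,
not anything RHEL actually ships -- is to refuse to start and never remove
the pid file automatically:

  # Hypothetical start-up check for shared storage: a local /proc test
  # proves nothing about the other node, so make the operator (or the
  # fencing agent) clean up after confirming that node is really dead.
  if [ -f "$PGDATA/postmaster.pid" ]
  then
      echo "postmaster.pid exists -- refusing to start." >&2
      echo "Fence the other node (power or network) and remove the" >&2
      echo "pid file by hand if you are sure no postmaster is running." >&2
      exit 1
  fi
  # start the postmaster

Combined with real fencing, that at least stops a second postmaster coming
up over a live data directory.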

Thanks,

Gavin
