Re: atrocious update performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Rosser Schwarz" <rschwarz(at)totalcardinc(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: atrocious update performance
Date: 2004-03-17 19:48:31
Message-ID: 27247.1079552911@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Rosser Schwarz" <rschwarz(at)totalcardinc(dot)com> writes:
> while you weren't looking, Tom Lane wrote:
>> Have you got any idea what conditions may have changed between seeing
>> delay and not seeing delay?

> None, offhand. I have noticed that when a large query is running,
> the machine can sporadically just freeze--or at least take inordinately
> long for some other process, be it top or ls, another query, or whatever
> to start. Nothing looks saturated when it happens, and, while you can
> count on it to happen, it's not consistent enough to reproduce.

Interesting. You should leave "vmstat 1" running in the background and
see if you can correlate these freezes with bursts of disk I/O or swap.
I saw a couple of delays in your big strace that seemed odd --- a couple
of one-second-plus intervals, and a four-second-plus interval, with no
obvious reason for them. Perhaps the same issue?

> Does the fact that all the reads and writes are 32K mean anything out
> of the ordinary? $PGSRC/src/include/pg_config_manual.h has BLCKSZ
> #defined to 16384. I was running previously with a 32K BLCKSZ, but
> that turned out to be rather sub-optimal for as heavily indexed as our
> tables are. I've dumped and rebuilt several times since then.

I hate to break it to you, but that most definitely means you are
running with BLCKSZ = 32K. Whatever you thought you were rebuilding
didn't take effect.

I agree that the larger blocksize is of dubious value. People used to
do that back when the blocksize limited your row width, but these days
I think you're probably best off with the standard 8K.

Another thing that's fairly striking is the huge bursts of WAL activity
--- your trace shows that the thing is writing entire WAL segments (16
MB) at one go, rather than dribbling it out a page or two at a time as
the code is intended to do. I think what is happening is that you have
wal_buffers = 1024 (correct?) yielding 32MB of WAL buffers, and since
there are no other transactions happening, nothing gets written until
you hit the "write when the buffers are half full" heuristic. I would
suggest knocking wal_buffers down to something closer to the default
(maybe 4 or 8 buffers) to reduce these I/O storms. (Memo to hackers:
we need to see to it that the new background writer process takes some
responsibility for pushing out filled WAL buffers, not only data
buffers.)

If the big EXPLAIN ANALYZE is still running, would you get a dump of its
open files (see "lsof -p") and correlate those with the tables being
used in the query? I'm trying to figure out what the different writes
and reads represent.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rosser Schwarz 2004-03-17 20:34:07 Re: atrocious update performance
Previous Message Joe Conway 2004-03-17 19:38:54 Re: rapid degradation after postmaster restart