Re: vacuum locking

From: Rob Nagler <nagler(at)bivio(dot)biz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: vacuum locking
Date: 2003-10-22 23:32:12
Message-ID: 16279.4988.496000.539525@gargle.gargle.HOWL
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Vivek Khera writes:
> AMI or Adaptec based?

Adaptec, I think. AIC-7899 LVD SCSI is what dmidecode says, and
Red Hat/Adaptec aacraid driver, Aug 18 2003 is what comes up when it
boots. I haven't be able to use the aac utilities with this driver,
however, so it's hard to interrogate the device.

> If AMI, make sure it has write-back cache enabled (and you have
> battery backup!), and disable the 'readahead' feature if you can.

I can't do this so easily. It's at a colo, and it's production.
I doubt this has anything to do with this problem, anyway. We're
talking about hundreds of megabytes of data.

> What's the disk utilization proir to running vacuum? If it is
> hovering around 95% or more of capacity, of course you're gonna
> overwhelm it.

Here's the vmstat 5 at a random time:

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 272372 38416 78220 375048 0 3 2 0 0 0 2 2 0
0 0 0 272372 30000 78320 375660 0 0 34 274 382 284 5 1 94
0 1 0 272372 23012 78372 375924 0 0 25 558 445 488 8 2 90
1 0 0 272368 22744 78472 376192 0 6 125 594 364 664 9 3 88

And here's it during vacuum:

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 2 1 277292 9620 72028 409664 46 32 4934 4812 1697 966 8 4 88
0 3 0 277272 9588 72096 412964 61 0 7303 2478 1391 976 3 3 94
2 2 0 277336 9644 72136 393264 1326 32 2827 2954 1693 1519 8 3 89
The pages are growing proportionately with the number of tuples, btw.
Here's a vacuum snippet from a few days ago after a clean import,
running every 15 minutes:

INFO: Removed 2192 tuples in 275 pages.
CPU 0.06s/0.01u sec elapsed 0.91 sec.
INFO: Pages 24458: Changed 260, Empty 0; Tup 1029223: Vac 2192, Keep 3876, UnUsed 26.
Total CPU 2.91s/2.22u sec elapsed 65.74 sec.

And here's the latest today, running every 2 hours:

INFO: Removed 28740 tuples in 1548 pages.
CPU 0.08s/0.06u sec elapsed 3.73 sec.
INFO: Pages 27277: Changed 367, Empty 0; Tup 1114178: Vac 28740, Keep 1502, UnUsed 10631.
Total CPU 4.78s/4.09u sec elapsed 258.10 sec.

The big tables/indexes are taking longer, but it's a big CPU/elapsed
time savings to vacuum every two hours vs every 15 minutes.

There's still the problem that when vacuum is running interactive
performance drops dramatically. A query that takes a couple of
seconds to run when the db isn't being vacuumed will take minutes when
vacuum is running. It's tough for me to correlate exactly, but I
suspect that while postgres is vacuuming an index or table, nothing else
runs. In between relations, other stuff gets to run, and then vacuum
hogs all the resources again. This could be for disk reasons or
simply because postgres locks the index or table while it is being
vacuumed. Either way, the behavior is unacceptable. Users shouldn't
have to wait minutes while the database picks up after itself.

The concept of vacuuming seems to be problematic. I'm not sure why
the database simply can't garbage collect incrementally. AGC is very
tricky, especially AGC that involves gigabytes of data on disk.
Incremental garbage collection seems to be what other databases do,
and it's been my experience that other databases don't have the type
of unpredictable behavior I'm seeing with Postgres. I'd rather the
database be a little bit slower on average than have to figure out the
best time to inconvenience my users.

Since my customer already has Oracle, we'll be running tests in the
coming month(s :-) with Oracle to see how it performs under the same
load and hardware. I'll keep this group posted.

Rob

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-10-23 01:27:47 Re: vacuum locking
Previous Message Josh Berkus 2003-10-22 21:42:56 Re: slow select