Re: VACUUM FULL takes long time to complete

From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: "Eric B(dot) Ridge" <ebr(at)tcdi(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: VACUUM FULL takes long time to complete
Date: 2007-03-31 21:25:13
Message-ID: 20070331172513.5021aa69.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Eric B. Ridge" <ebr(at)tcdi(dot)com> wrote:
>
> Hi! We've got a Postgres 8.1.5 installation with a 60GBish database:
>
> =# select version();
> version
> ------------------------------------------------------------------------
> ---------------
> PostgreSQL 8.1.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
> 4.1.1 (Gentoo 4.1.1)
> (1 row)
>
> =# select pg_size_pretty(pg_database_size('xxx'));
> pg_size_pretty
> ----------------
> 59 GB
>
> (please don't freak over the word Gentoo! This is Postgres compiled,
> by us, using the sources from postgresql.org, not some patched-up
> Gentoo version)
>
> Anyways, we run a VACUUM FULL ANALYZE VERBOSE every night, and it's
> now taking 3+ hours to complete. We also have autovacuum turned on
> with default settings. This 3+ hour vacuum time is cutting into our
> production hours. :(

Have you established any reason to run vacuum full at all? Generally
speaking, vacuum full isn't necessary, and overall isn't a good idea.

If you do have a good reason for running it (which I'd be interested
to hear) you could reduce the impact by breaking the job up.
Perhaps vacuum one table each night, so that if you have 30 tables,
each table will get vacuum fulled once a month, and each night's
job should only take about 5 minutes.

Note that if you _do_ run vacuum full, it's a good idea to reindex
afterward. Vacuum full seems to result in a lot of index bloat
(which, BTW, is a good reason not to use it).

> Note that while we're vacuuming Postgres is running in a single user
> mode and not listening. There's no other connections to the database.
>
> The hardware itself is a dual-core Intel 2gHz w/ 4G ram and adequate
> space in a hardware Raid 5 configuration. I realize Raid 5 isn't
> ideal, but in the general use cases of our database it doesn't
> noticeably impact performance.

Grab some systems data while the process is running. A few snapshots
of top output, iostat, + anything you can think of to narrow down
where the bottleneck is. However, with a 65G database and 4G of
RAM to work with, I would expect that IO is going to be the bottleneck,
and that comes back to using something faster than RAID 5.

> How can we begin to cut down the vacuum time? My first thought is
> simply change the schedule to run weekly (or biweekly) since we're
> also running autovacuum. Are there any other Postgres configuration
> changes that might help to improve vacuum performance?
>
> The settings we've explicitly set are:
>
> max_connections = 256
> shared_buffers = 40000

I use ~250000 shared_buffers on our large memory systems. I don't
have conclusive evidence that going that high is really worth it,
but I haven't had any performance problems with it. Note that with
a DB as large as yours, this may cause problems with the memory
available for work_mem, but you'll have to test your use case to
know for sure.

> temp_buffers = 5000
> work_mem = 32768
> maintenance_work_mem = 65535
> max_fsm_pages = 120000
> fsync = on
> wal_buffers = 16
> effective_cache_size = 5000
> log_connections = on
> log_duration = off
> log_line_prefix = '%m [xid=%x] [%p]: '
> stats_start_collector = on
> stats_command_string = on
> stats_block_level = on
> stats_row_level = on
> autovacuum = on

--
Bill Moran
http://www.potentialtech.com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Brian A. Seklecki 2007-03-31 21:30:51 Re: Instalation problem
Previous Message Martijn van Oosterhout 2007-03-31 21:15:40 Re: VACUUM FULL takes long time to complete