Re: PG 7.2.4 and waiting backends

From: Ericson Smith <eric(at)did-it(dot)com>
To: Rob Fielding <rob(at)dsvr(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: PG 7.2.4 and waiting backends
Date: 2004-03-30 20:04:32
Message-ID: 4069D2D0.1010600@did-it.com
Lists: pgsql-admin

Why not just COPY the data out to disk, truncate the table, then COPY it
back in? We do that with some pretty large tables as well, where vacuum
full is not an option.

Think of it as having a short and predictable downtime, instead of a
very long downtime when vacuum full is running.
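
Roughly, the sequence we use looks like the following (the table name and
dump path are just placeholders, and nothing should be writing to the
table while it runs):

  -- dump the current rows out to a flat file
  COPY big_table TO '/var/tmp/big_table.dat';

  -- drop the bloated heap; the space goes straight back to the filesystem
  TRUNCATE big_table;

  -- reload the rows into a fresh, compact heap
  COPY big_table FROM '/var/tmp/big_table.dat';

  -- refresh planner statistics
  ANALYZE big_table;

Any rows inserted between the COPY out and the COPY back in would be
lost, so treat the whole sequence as your downtime window.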

Warmest regards,
Ericson Smith
Tracking Specialist/DBA
+------------------------+----------------------------------+
| http://www.did-it.com  | "When you have to shoot, shoot,  |
| eric(at)did-it(dot)com |  don't talk!" - Tuco             |
| 516-255-0500           |                                  |
+------------------------+----------------------------------+

Rob Fielding wrote:

> Hi,
>
> We are currently experiencing a problem, apparently triggered by
> vacuumdb -a -f, which results in all backends (including the vacuum)
> becoming locked with a status of 'waiting'.
>
> The intention is to migrate the DB from 7.2 to a later version. The
> interim step has been to get up to the latest 7.2.4 release in order
> to work around a vacuumdb corner case which was causing a FULL vacuum
> to fail. The upgrade to 7.2.4 resolved this, only to present a new
> issue: after a significant runtime (300 minutes), all backends
> processing selects, inserts and even the vacuum itself get stuck
> 'waiting'. The result is that all backends become stuck and we run
> out of backends.
>
> The reason we need to perform a full vacuum is that a problem with
> one of our own subsystems handling data inserts caused the db size to
> rise from 25GB to 70GB. Unfortunately the nature of the data means
> duplicates are a valid case. We have run a script which hunts down and
> removes this additional data, and this has completed, but pg won't
> free the space back to the filesystem until a full vacuum is
> performed. I'd like to see where we are up to by looking at the real
> space taken up by the system. After 5 hours of running vacuum full,
> before the freeze, the db is 68GB, so either we still have lots of
> bogus dupes or we are nowhere near completing the vacuum.
>
> The system is live, so I don't have too much to go on right now, as
> I've only tried the full vacuum 3 times due to the long runtime and
> the effects being quite catastrophic if not caught at the right time.
> I'm at the first stage of attempting to diagnose the problem, and I
> understand the details I've provided aren't particularly scientific,
> but I'd like to try to gain some further insight.
>
> The final logs follow. Notice that prior to the problem, the same
> pg_toast_64432 object is being processed. I'm uncertain of the
> significance of this as I don't fully understand the log output.
> Interestingly, we suddenly see a lot of log recycling right after this
> point (see the checkpoint_segments setting). It is after this point
> that we stop, at which point I manually perform a fast shutdown, as
> indicated in the logs after my snip. Resuming the database is fine.
>
>
> 2004-03-29 13:50:58 [14147] NOTICE: --Relation pg_toast_64432--
> 2004-03-29 13:50:59 [14147] NOTICE: Pages 975: Changed 334, reaped
> 275, Empty 0, New 0; Tup 4562: Vac 0, Keep/VTL 630/0, UnUsed 338,
> MinLen 45, MaxLen 2034; Re-using: Free/Avail. Space 376416/372248;
> EndEmpty/Avail. Pages 0/673.
> CPU 0.06s/0.00u sec elapsed 1.33 sec.
> 2004-03-29 13:51:00 [14147] NOTICE: Index pg_toast_64432_idx: Pages
> 161; Tuples 4562: Deleted 0.
> CPU 0.01s/0.00u sec elapsed 0.76 sec.
> 2004-03-29 13:51:00 [14147] NOTICE: Rel pg_toast_64432: Pages: 975
> --> 974; Tuple(s) moved: 5.
> CPU 0.01s/0.05u sec elapsed 0.07 sec.
> 2004-03-29 13:51:00 [14147] NOTICE: Index pg_toast_64432_idx: Pages
> 161; Tuples 4562: Deleted 5.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> 2004-03-29 13:51:00 [14147] NOTICE: Analyzing mail_16
> 2004-03-29 13:56:42 [25752] DEBUG: recycled transaction log file
> 000000AB00000001
> 2004-03-29 13:56:42 [25752] DEBUG: recycled transaction log file
> 000000AB00000002
> 2004-03-29 13:56:42 [25752] DEBUG: recycled transaction log file
> 000000AB00000003
> 2004-03-29 13:56:42 [25752] DEBUG: recycled transaction log file
> 000000AB00000004
> <snip total of 32 recycled logs>
> 2004-03-29 13:58:30 [17415] ERROR: Query was cancelled.
> 2004-03-29 13:59:19 [23878] DEBUG: fast shutdown request
> 2004-03-29 13:59:19 [23878] DEBUG: aborting any active transactions
>
> next log:
>
> 2004-03-29 17:42:12 [26480] DEBUG: --Relation pg_toast_64432--
> 2004-03-29 17:42:13 [26480] DEBUG: Pages 984: Changed 10, reaped
> 477, Empty 0, New 0; Tup 3977: Vac 630, Keep/VTL 0/0, UnUsed 338,
> MinLen 45, MaxLen 2034; Re-using: Free/Avail. Space 1465424/1462296;
> EndEmpty/Avail. Pages 0/739.
> CPU 0.05s/0.00u sec elapsed 1.62 sec.
> 2004-03-29 17:42:14 [26480] DEBUG: Index pg_toast_64432_idx: Pages
> 161; Tuples 3977: Deleted 630.
> CPU 0.00s/0.00u sec elapsed 0.58 sec.
> 2004-03-29 17:42:14 [26480] DEBUG: Rel pg_toast_64432: Pages: 984
> --> 841; Tuple(s) moved: 542.
> CPU 0.04s/0.13u sec elapsed 0.20 sec.
> 2004-03-29 17:42:14 [26480] DEBUG: Index pg_toast_64432_idx: Pages
> 163; Tuples 3977: Deleted 542.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> 2004-03-29 17:51:44 [16371] DEBUG: recycled transaction log file
> 000000AD0000009E
> 2004-03-29 17:51:44 [16371] DEBUG: recycled transaction log file
> 000000AD0000009F
> 2004-03-29 17:51:44 [16371] DEBUG: recycled transaction log file
> 000000AD000000A0
> <snip total of 32 recycled logs>
> 2004-03-29 18:32:25 [26346] DEBUG: fast shutdown request
> 2004-03-29 18:32:25 [26346] DEBUG: aborting any active transactions
>
> Pertinent config:
>
> max_fsm_relations = 1000
> max_fsm_pages = 20000
> vacuum_mem = 65536
> effective_cache_size = 95694
> random_page_cost = 2
> sort_mem=65536
> max_connections = 128
> shared_buffers = 15732
> wal_buffers = 64 # need to determine
> wal_files = 64 # range 0-64
> wal_sync_method = fsync # the default varies across platforms:
> wal_debug = 0 # range 0-16
>
>
> # hopefully this should see less LogFlushes per LogInsert - use more
> WAL though.
> commit_delay = 10000 # range 0-100000
> commit_siblings = 2 # range 1-1000
>
>
> checkpoint_segments = 32 # in logfile segments (16MB each), min 1
> checkpoint_timeout = 600 # in seconds, range 30-3600
> fsync = false # scotty!
> #fsync = true
>
> Prior to this new incident, I've been successfully playing around
> with the WAL set-up thanks to the guys on the performance list (see
> the post in performance titled "[PERFORM] WAL Optimisation -
> configuration and usage"). I'm still experimenting with good settings
> for checkpoint_segments and checkpoint_timeout; however, the
> significance of the recycling and checkpoint_segments worries me
> slightly. I shall try lower values in the meantime.
>
> Best regards,
>

