Re: Vacuum full - disk space eaten by WAL logfiles

From: "Lee Wu" <Lwu(at)mxlogic(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Vacuum full - disk space eaten by WAL logfiles
Date: 2005-01-11 15:53:08
Message-ID: ECAB83AA52BCC043A0E24BBC000010241114E6@mxhq-exch.corp.mxlogic.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi Tom,

I re-send this because my last email with attachment did not get
through.

It is a live production database, and I do not know what else happened
on Saturdays because I do not work on Saturdays. However, I can ask our
shift to find out what other queries will be in the PG next Saturday if
it helps you/us. On the other hand, should not PG use fix number/amount
of LOG files like Oracle even though there may be other activities at
the same time?

From archive, I see a post
http://archives.postgresql.org/pgsql-hackers/2004-07/msg01055.php
It said,
> > You have 35 because the max files in pg_xlog is
> > 2*checkpoint_segments +1 or something like that. This is documented
> > in the SGML.

Here is postgres log file except I changed hostname and table names.

I do not know how to strace the checkpoint process and would appreciate
it
if you can show me:
1. how
2. when

Thanks,

Postgres.log:
Jan 8 20:15:52 mybox postgres[8037]: [73] LOG: recycled transaction
log file 00001AB100000060
all other recycling transaction log ...
Jan 8 20:15:52 mybox postgres[8037]: [74] LOG: removing transaction
log file 00001AB100000061
all other removing transaction log ...
Jan 8 20:17:27 mybox postgres[8213]: [13] LOG: recycled transaction
log file 00001AB2000000A3
all other recycling transaction log ...
Jan 8 20:17:42 mybox postgres[8213]: [74] LOG: removing transaction
log file 00001AB200000077
all other removing transaction log ...
Jan 8 20:25:33 mybox postgres[1602]: [13] PANIC: ZeroFill failed to
write /my/pg_xlog/xlogtemp.1602: No space left on device
Jan 8 20:25:35 mybox postgres[8213]: [163] LOG: removing transaction
log file 00001AB2000000EC
Jan 8 20:25:35 mybox postgres[1602]: [14-1] LOG: statement: COPY
table1 (domain, domain_id, customer_id, action_unspecified,
action_unknown,
Jan 8 20:25:35 mybox postgres[8213]: [164] LOG: removing transaction
log file 00001AB2000000ED
Jan 8 20:25:35 mybox postgres[8213]: [165] LOG: removing transaction
log file 00001AB2000000EE
Jan 8 20:25:35 mybox postgres[1602]: [14-2] action_none, action_deny,
action_fail, action_strip, action_tag, action_quarantine, action_clean,
action_copy, action_allow,
Jan 8 20:25:35 mybox postgres[8213]: [166] LOG: removing transaction
log file 00001AB2000000F0
Jan 8 20:25:35 mybox postgres[1602]: [14-3] module, period, created)
FROM stdin
Jan 8 20:25:35 mybox postgres[8213]: [167] LOG: removing transaction
log file 00001AB2000000F1
Jan 8 20:25:35 mybox postgres[8213]: [168] LOG: removing transaction
log file 00001AB2000000F2
Jan 8 20:25:36 mybox postgres[8213]: [169] LOG: removing transaction
log file 00001AB2000000F3
Jan 8 20:25:36 mybox postgres[8213]: [170] LOG: removing transaction
log file 00001AB2000000F4
Jan 8 20:25:36 mybox postgres[8213]: [171] LOG: removing transaction
log file 00001AB2000000F5
Jan 8 20:25:36 mybox postgres[862]: [13] LOG: server process (pid
1602) was terminated by signal 6
Jan 8 20:25:36 mybox postgres[862]: [14] LOG: terminating any other
active server processes
Jan 8 20:25:36 mybox postgres[8601]: [13-1] WARNING: copy: line 1,
Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[8601]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:36 mybox postgres[8601]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:36 mybox postgres[8601]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:36 mybox postgres[7284]: [13-1] WARNING: Message from
PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[3658]: [13-1] WARNING: Message from
PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[29685]: [13-1] WARNING: Message from
PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[1605]: [13-1] WARNING: Message from
PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[29687]: [13-1] WARNING: Message from
PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[1603]: [13-1] WARNING: Message from
PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[29684]: [13-1] WARNING: Message from
PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[1604]: [13-1] WARNING: Message from
PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[29686]: [13-1] WARNING: Message from
PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[8601]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:36 mybox postgres[7284]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:36 mybox postgres[3658]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:36 mybox postgres[29685]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:36 mybox postgres[1605]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:36 mybox postgres[29687]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:36 mybox postgres[1603]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:36 mybox postgres[29684]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:36 mybox postgres[1604]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:37 mybox postgres[29686]: [13-2] ^IThe Postmaster has
informed me that some other backend
Jan 8 20:25:37 mybox postgres[8601]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:37 mybox postgres[7284]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[3658]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[29685]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[1605]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[29687]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[1603]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[29684]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[1604]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[29686]: [13-3] ^Idied abnormally and
possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[8601]: [14-1] LOG: copy: line 1,
statement: COPY table1 (domain, domain_id, customer_id, email,
bytes_received, mails,
Jan 8 20:25:37 mybox postgres[7284]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:37 mybox postgres[3658]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:37 mybox postgres[29685]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:37 mybox postgres[1605]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:37 mybox postgres[29687]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:37 mybox postgres[1603]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:37 mybox postgres[29684]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:37 mybox postgres[1604]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:37 mybox postgres[29686]: [13-4] ^II have rolled back the
current transaction and am
Jan 8 20:25:37 mybox postgres[8601]: [14-2] period, created) FROM
stdin
Jan 8 20:25:37 mybox postgres[7284]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:38 mybox postgres[3658]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:38 mybox postgres[29685]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:38 mybox postgres[1605]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:38 mybox postgres[29687]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:38 mybox postgres[1603]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:38 mybox postgres[29684]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:38 mybox postgres[1604]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:38 mybox postgres[29686]: [13-5] ^Igoing to terminate your
database system connection and exit.
Jan 8 20:25:38 mybox postgres[8602]: [15] FATAL: The database system
is in recovery mode
Jan 8 20:25:38 mybox postgres[7284]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:38 mybox postgres[3658]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:38 mybox postgres[29685]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:38 mybox postgres[1605]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:38 mybox postgres[29687]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:38 mybox postgres[1603]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:38 mybox postgres[29684]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:38 mybox postgres[1604]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:38 mybox postgres[29686]: [13-6] ^IPlease reconnect to the
database system and repeat your query.
Jan 8 20:25:38 mybox postgres[8603]: [15] FATAL: The database system
is in recovery mode
Jan 8 20:25:38 mybox postgres[7284]: [14] LOG: statement: set
vacuum_mem=131072; vacuum full analyze the_35G_table
Jan 8 20:25:39 mybox postgres[8604]: [15] FATAL: The database system
is in recovery mode
All other The database system is in recovery mode...
Jan 8 20:25:41 mybox postgres[862]: [15] LOG: all server processes
terminated; reinitializing shared memory and semaphores
Jan 8 20:25:41 mybox postgres[9355]: [16] FATAL: The database system
is starting up
Jan 8 20:25:41 mybox postgres[9346]: [16] LOG: database system was
interrupted at 2005-01-08 20:25:32 MST
Jan 8 20:25:41 mybox postgres[9365]: [16] FATAL: The database system
is starting up
Jan 8 20:25:41 mybox postgres[9346]: [17] LOG: checkpoint record is at
1AB3/692F2358
Jan 8 20:25:41 mybox postgres[9378]: [16] FATAL: The database system
is starting up
Jan 8 20:25:41 mybox postgres[9346]: [18] LOG: redo record is at
1AB3/584199FC; undo record is at 0/0; shutdown FALSE
Jan 8 20:25:42 mybox postgres[9477]: [16] FATAL: The database system
is starting up
Jan 8 20:25:42 mybox postgres[9346]: [19] LOG: next transaction id:
85474993; next oid: 901857280
Jan 8 20:25:42 mybox postgres[9478]: [16] FATAL: The database system
is starting up
Jan 8 20:25:42 mybox postgres[9346]: [20] LOG: database system was not
properly shut down; automatic recovery in progress
Jan 8 20:25:42 mybox postgres[9479]: [16] FATAL: The database system
is starting up
Jan 8 20:25:42 mybox postgres[9346]: [21] LOG: redo starts at
1AB3/584199FC
Jan 8 20:25:42 mybox postgres[9481]: [16] FATAL: The database system
is starting up
All other The database system is starting up...
Jan 8 20:37:38 mybox postgres[9346]: [22] LOG: open of
/my/pg_xlog/00001AB500000030 (log file 6837, segment 48) failed: No such
file or directory
Jan 8 20:37:38 mybox postgres[9346]: [23] LOG: redo done at
1AB5/2FFFC1F8
Jan 8 20:37:45 mybox postgres[9346]: [24] LOG: recycled transaction
log file 00001AB300000021
All other recycling transaction log ...
Jan 8 20:37:48 mybox postgres[9346]: [85] LOG: removing transaction
log file 00001AB2000000FB
Jan 8 20:37:48 mybox postgres[9346]: [86] LOG: removing transaction
log file 00001AB2000000FC
Jan 8 20:37:48 mybox postgres[14114]: [16] FATAL: The database system
is starting up
Jan 8 20:37:48 mybox postgres[9346]: [87] LOG: removing transaction
log file 00001AB2000000FD
All other removing transaction log ...
Jan 8 20:43:28 mybox postgres[14419]: [484] LOG: removing transaction
log file 00001AB50000002E
Jan 8 20:48:38 mybox postgres[14547]: [16] LOG: recycled transaction
log file 00001AB500000031

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Monday, January 10, 2005 4:01 PM
To: Lee Wu
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] Vacuum full - disk space eaten by WAL logfiles

"Lee Wu" <Lwu(at)mxlogic(dot)com> writes:
> 3. It is "vacuum full" as shown:

Hmm. I'm not able to replicate any such problem here: AFAICS,
checkpoints go through to completion just fine while a VACUUM is
running. This might mean that the problem was fixed between 7.3.2
and 7.3.8 (though I see nothing relevant-looking in the changelog).
Or it might mean that you've not given enough information to let
someone else duplicate the problem. What else is going on in parallel
with the VACUUM? (We know at least some COPY operations...)

Again, it would be useful to know whether the checkpoint process is
making progress or not.

regards, tom lane

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2005-01-11 18:04:41 Re: Vacuum full - disk space eaten by WAL logfiles
Previous Message Tom Lane 2005-01-10 23:00:32 Re: Vacuum full - disk space eaten by WAL logfiles