Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pitrtools(at)lists(dot)commandprompt(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
Date: 2010-12-30 00:32:01
Message-ID: 4D1BD301.5090304@catalyst.net.nz
Lists: pgsql-hackers

We have been seeing these warnings recently whenever a standby is
brought up (typically to check it is ok). Sometimes they are coupled
with corrupted indexes which require a REINDEX to fix. Initially I
thought these uninitialized pages were due to primary crashes or
hardware issues, however I've now managed to come up with a recipe to
generate them on demand on my workstation.

Pitrtools appears to be an essential part of the recipe - at this stage
I'm not sure if it is actually doing something directly to cause this or
merely tickling some Postgres recovery bug.

The essential triggering element seems to be performing a base backup
while the system is busy. Here's the description:

1/ Patch 8.3's pgbench using the attached diff, and initialize a scale 100
dataset
2/ Get Pitrtools primary and standby configs set up (examples attached)
3/ Start pgbench with at least 4 clients and 200000 transactions
4/ After history has approx 10000 rows initiate backup from the standby
5/ After history has approx 140000 rows bring up the standby and perform
a VACUUM
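
A minimal sketch of how the row-count thresholds in steps 4 and 5 could be
timed from a script rather than by eye. The function name wait_for_rows and
the COUNT_CMD and POLL_INTERVAL variables are made up for illustration; the
default query assumes psql connects to the pgbench database without further
options (e.g. via PGDATABASE).

```shell
# Command that prints the current row count of pgbench's history table.
# Hypothetical default; override COUNT_CMD to suit the local setup.
COUNT_CMD=${COUNT_CMD:-'psql -At -c "SELECT count(*) FROM history"'}

# wait_for_rows TARGET: block until the count reaches TARGET.
# Returns non-zero if the count command fails or prints a non-number.
wait_for_rows() {
    target=$1
    while :; do
        n=$(eval "$COUNT_CMD") || return 1
        case "$n" in
            ''|*[!0-9]*) return 1 ;;   # not a plain integer
        esac
        if [ "$n" -ge "$target" ]; then
            return 0
        fi
        sleep "${POLL_INTERVAL:-5}"
    done
}
```

For example, "wait_for_rows 10000" followed by the base backup command, then
"wait_for_rows 140000" before activating the standby.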

Typically I'm seeing a large number of consecutive uninitialized pages
in the accounts table. What is also very interesting is that if I set up
the standby in a more "bare bones" manner (i.e. manually running
pg_start_backup and rsync + pg_standby) then I can *never* elicit any
uninitialized pages.
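
To compare runs, the warnings can be condensed into a per-relation count and
page range. This is only a sketch: summarize_uninit is a hypothetical helper,
and it assumes the warning text has the form
'relation "accounts" page N is uninitialized --- fixing'.

```shell
# summarize_uninit: read VACUUM output on stdin and print, for each relation,
# how many pages were reported uninitialized and the lowest/highest page number.
summarize_uninit() {
    # Reduce each matching warning to "relname pageno", drop everything else.
    sed -n 's/.*relation "\([^"]*\)" page \([0-9][0-9]*\) is uninitialized.*/\1 \2/p' |
    awk '
        {
            count[$1]++
            if (!($1 in min) || $2 + 0 < min[$1]) min[$1] = $2 + 0
            if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0
        }
        END {
            for (r in count)
                printf "%s: %d pages (%d..%d)\n", r, count[r], min[r], max[r]
        }'
}
```

Usage on the activated standby (psql reports warnings on stderr, hence the
redirect):

$ psql -c "VACUUM VERBOSE accounts" 2>&1 | summarize_uninit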

I'm frankly puzzled about what Pitrtools is doing that is different - I
only noticed it using rsync compression (-z) and doing rsync backups via
pulling from the standby rather than pushing from the primary (I'm in
the process of trying these variations out in the bare bones case). Just
as I'm writing this I see Pitrtools rsyncs pg_xlog - I wonder if there
are timing issues which mean that recovery might use some (corrupted)
logs from there before the (clean) archived ones arrive (will check).
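
One way the pg_xlog suspicion could be checked is to compare any segment
that exists both in the standby's copied pg_xlog and in the archive. A sketch
only: compare_wal is a hypothetical helper and the directory arguments are
whatever paths apply locally. A segment that was still being written when
rsync copied it would be expected to show up as different.

```shell
# compare_wal DIR_A DIR_B: for every regular file in DIR_A that also exists
# in DIR_B, report the name if the two copies differ byte-for-byte.
# Returns 1 if any difference was found, 0 otherwise.
compare_wal() {
    dir_a=$1
    dir_b=$2
    status=0
    for f in "$dir_a"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        [ -f "$dir_b/$name" ] || continue   # only compare shared segments
        if ! cmp -s "$f" "$dir_b/$name"; then
            echo "DIFFERS: $name"
            status=1
        fi
    done
    return $status
}
```

For example: compare_wal /var/lib/postgresql/8.3/main/pg_xlog
/var/lib/postgresql/archive (paths taken from the bare bones setup below;
adjust for the Pitrtools layout).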

Some more detail about the system:

Postgres 8.3.12 on Ubuntu Lucid x86_64 and Debian Lenny (lxc guests),
rsync 3, Pitrtools 1.2-1

Postgres config changes:

autovacuum = off # prevent any relation truncation
max_fsm_pages = 20000 # encourage new page creation

Pitrtools Steps:

primary:
$ grep archive_command postgresql.conf
archive_command = 'cmd_archiver -C /etc/pitrtools/cmd_archiver.ini -F %p'

standby:
$ cmd_standby -C /etc/pitrtools/cmd_standby.ini -B
$ cmd_standby -C /etc/pitrtools/cmd_standby.ini -Astop_basebackup
$ cp /etc/postgresql/8.3/main/pg_hba.conf \
/var/lib/postgresql/8.3/main/pg_hba.conf
$ cp /etc/postgresql/8.3/main/postgresql.conf \
/var/lib/postgresql/8.3/main/postgresql.conf
$ cmd_standby -C /etc/pitrtools/cmd_standby.ini -S
$ cmd_standby -C /etc/pitrtools/cmd_standby.ini -F999

Bare Bones Steps:

primary:
$ grep archive_command postgresql.conf
archive_command = 'rsync %p standby:/var/lib/postgresql/archive'

$ psql -c "SELECT pg_start_backup('backup');"
$ rsync --exclude pg_xlog/\* --exclude postmaster.pid -a * \
standby:/var/lib/postgresql/8.3/main
$ psql -c "SELECT pg_stop_backup();"

standby:
$ grep restore_command recovery.conf
restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -t /tmp/trigger.5432 /var/lib/postgresql/archive %f %p %r'
$ /etc/init.d/postgresql-8.3 start
$ touch /tmp/trigger.5432

regards

Mark

P.S.: cc'ing Pg Hackers as a variation of this topic has come up there
several times.

Attachment Content-Type Size
pgbench.diff.gz application/x-gzip 1.1 KB
cmd_archiver.ini.gz application/x-gzip 508 bytes
cmd_standby.ini.gz application/x-gzip 710 bytes