Re: "error with invalid page header" while vacuuming pgbench data

From: John Rouillard <rouilj(at)renesys(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: "error with invalid page header" while vacuuming pgbench data
Date: 2011-05-25 22:07:16
Message-ID: 20110525220716.GF31823@renesys.com
Lists: pgsql-performance

On Wed, May 25, 2011 at 03:19:59PM -0500, Kevin Grittner wrote:
> John Rouillard <rouilj(at)renesys(dot)com> wrote:
> > On Mon, May 23, 2011 at 05:21:04PM -0500, Kevin Grittner wrote:
> >> John Rouillard <rouilj(at)renesys(dot)com> wrote:
> >>
> >> > I seem to be able to provoke this error:
> >> >
> >> > vacuum...ERROR: invalid page header in
> >> > block 2128910 of relation base/16385/21476
> >>
> >> What version of PostgreSQL?
> >
> > Hmm, I thought I replied to this, but I haven't seen it come back
> > to me on list. It's postgres version: 8.4.5.
> >
> > rpm -q shows
> >
> > postgresql84-server-8.4.5-1.el5_5.1
>
> I was hoping someone else would jump in, but I see that your
> previous post didn't copy the list, which solves *that* mystery.
>
> I'm curious whether you might have enabled one of the "it's OK to
> trash my database integrity to boost performance" options. (People
> with enough replication often feel that this *is* OK.) Please run
> the query on this page and post the results:
>
> http://wiki.postgresql.org/wiki/Server_Configuration
>
> Basically, if fsync or full_page_writes is turned off and there was
> a crash, that explains it. If not, it provides more information to
> proceed.

Nope. Neither is turned off. I can't run the query at the moment since
the system is in the middle of a memtest86+ check of 96GB of
memory. The relevant parts of the config file from the Configuration
Management system are:

#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # immediate fsync at commit
#wal_sync_method = fsync # the default is the first option

#full_page_writes = on # recover from partial page writes
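
Every line in that excerpt is commented out, so the server falls back to its built-in defaults, which are "on" for fsync, synchronous_commit and full_page_writes. Below is a rough Python sketch of that reasoning, not a real PostgreSQL tool: the function name effective_settings and the DEFAULTS table are made up for illustration, and the parsing only covers simple "name = value" lines.

```python
import re

# Built-in defaults for the settings discussed here (assumed for 8.4;
# wal_sync_method is left out because its default is platform-dependent).
DEFAULTS = {"fsync": "on", "synchronous_commit": "on", "full_page_writes": "on"}

# Matches an active "name = value" line (the "=" is optional in postgresql.conf).
SETTING_RE = re.compile(r"^\s*(\w+)\s*=?\s*('[^']*'|[^#\s]+)")

def effective_settings(conf_text, defaults=DEFAULTS):
    """Return the effective value of each setting named in `defaults`."""
    effective = dict(defaults)
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented out: the default stays in effect
        match = SETTING_RE.match(line)
        if match and match.group(1) in effective:
            # Later lines override earlier ones, as in the real file.
            effective[match.group(1)] = match.group(2).strip("'")
    return effective

conf = """\
#fsync = on                 # turns forced synchronization on or off
#synchronous_commit = on    # immediate fsync at commit
#wal_sync_method = fsync    # the default is the first option
#full_page_writes = on      # recover from partial page writes
"""

settings = effective_settings(conf)
# Flag the two settings that can cause corruption after a crash when disabled.
risky = [name for name in ("fsync", "full_page_writes") if settings[name] != "on"]
print(settings, risky)
```

The check compares against the literal spelling "on", so a value written as "true" or "1" would also be flagged; that errs on the side of caution. On a running server, "SHOW fsync;" is the authoritative answer.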

This is the same setup I use on all my data warehouse systems (with
minor pgtune-type changes based on the amount of memory). Running the
query on another system (using ext3, CentOS 5.5) shows:

version | PostgreSQL 8.4.5 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
archive_command | if test ! -e /var/lib/pgsql/data/ARCHIVE_ENABLED; then exit 0; fi; test ! -f /var/bak/pgsql/%f && cp %p /var/bak/pgsql/%f
archive_mode | on
checkpoint_completion_target | 0.9
checkpoint_segments | 64
constraint_exclusion | on
custom_variable_classes | pg_stat_statements
default_statistics_target | 100
effective_cache_size | 8GB
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
listen_addresses | *
log_checkpoints | on
log_connections | on
log_destination | stderr,syslog
log_directory | pg_log
log_filename | postgresql-%a.log
log_line_prefix | %t %u@%d(%p)i:
log_lock_waits | on
log_min_duration_statement | 2s
log_min_error_statement | warning
log_min_messages | notice
log_rotation_age | 1d
log_rotation_size | 0
log_temp_files | 0
log_truncate_on_rotation | on
logging_collector | on
maintenance_work_mem | 1GB
max_connections | 300
max_locks_per_transaction | 128
max_stack_depth | 2MB
port | 5432
server_encoding | UTF8
shared_buffers | 4GB
shared_preload_libraries | pg_stat_statements
superuser_reserved_connections | 3
tcp_keepalives_count | 0
tcp_keepalives_idle | 0
tcp_keepalives_interval | 0
TimeZone | UTC
wal_buffers | 32MB
work_mem | 16MB

> You might want to re-start the thread on pgsql-general, though. Not
> everybody who might be able to help with a problem like this follows
> the performance list. Or, if you didn't set any of the dangerous
> configuration options, this sounds like a bug -- so pgsql-bugs might
> be even better.

Well, I am also managing to panic the kernel on some runs. So
my guess is this is not only a postgres bug (if it's a postgres issue
at all).

As Greg mentioned in another followup, ext4 under CentOS 5.x may be an
issue. I'll drop back to ext3 and see if I can replicate the
corruption or crashes once I rule out some potential hardware issues.

If I can replicate with ext3, then I'll follow up on -general or
-bugs.

The pgbench runs on ext4 complete faster, but if it's not reliable ....

Thanks for your help.

--
-- rouilj

John Rouillard System Administrator
Renesys Corporation 603-244-9084 (cell) 603-643-9300 x 111
