fs issues on software raid0 (PG_VERSION does not contain valid data)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: fs issues on software raid0 (PG_VERSION does not contain valid data)
Date: 2015-10-18 18:25:29
Message-ID: 5623E419.5030109@2ndquadrant.com
Lists: pgsql-hackers

Hi there,

I've been doing a lot of filesystem testing / benchmarking recently, and
today I ran into a really strange issue with ext4 on two SSD devices
in a RAID-0 configuration (Linux software raid).

I'm currently rerunning the test to see if it's reproducible, but
maybe someone has an idea of what might be the problem.

The issue manifests like this:

FATAL: "base/12140" is not a valid data directory
DETAIL: File "base/12140/PG_VERSION" does not contain valid data.
HINT: You might need to initdb.

The paths are obviously nonsense. But it gets funnier - the database
continues to run seemingly just fine (doing checkpoints, serving
queries, ...), until this happens:

ERROR: index "pg_type_oid_index" contains unexpected zero page
at block 3 at character 61

This happens after the benchmarking script runs vacuumdb:

vacuumdb: query failed: ERROR: index "pg_type_oid_index" contains
unexpected zero page at block 3
LINE 1: ...LECT datname FROM pg_database WHERE datallowconn ORDER BY 1
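
For anyone who wants to look for such pages directly in a relation file, here is a minimal sketch. It assumes the default 8kB block size, and the function name find_zero_pages is made up for illustration. It spawns one dd per block, so it is only meant for small files such as catalog indexes.

```shell
# Print the block numbers of all-zero pages in a file.
# Assumption: 8192-byte blocks unless a second argument says otherwise.
find_zero_pages() {
    file=$1
    blksz=${2:-8192}
    size=$(wc -c < "$file")
    nblocks=$((size / blksz))
    i=0
    while [ "$i" -lt "$nblocks" ]; do
        # Count the bytes of this block that are NOT zero.
        nonzero=$(dd if="$file" bs="$blksz" skip="$i" count=1 2>/dev/null \
                  | tr -d '\000' | wc -c)
        if [ "$((nonzero))" -eq 0 ]; then
            echo "$i"
        fi
        i=$((i + 1))
    done
}

# Usage (the path is a placeholder, not taken from the logs):
#   find_zero_pages "$PGDATA/base/<dboid>/<relfilenode>"
```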

Attached are a PostgreSQL log for the whole benchmark run, a log tracking
the benchmark script (useful for mapping the pg.log to steps of the
benchmark), and a log with mdadm info.

The benchmark is driven by a script, which initializes a new cluster and
then does this:

1) run on small dataset (scale=10)
- pgbench init
- vacuumdb
- warmup
- pgbench runs for various client counts (with explicit checkpoints)

2) run on large data set (scale=1100)
- ... same as for (1)

3) run on medium data set (scale=140)
- ... same as for (1)

(The data set sizes are for a machine with 8GB of RAM.)
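
The steps above can be sketched roughly as follows. This is not the actual benchmark script. The names DRY_RUN, DBNAME, CLIENT_COUNTS, RUN_SECONDS, run and bench_scale are hypothetical, and doing the warmup as a single-client pgbench run is an assumption. By default the sketch only prints the commands it would run.

```shell
# All settings below are illustrative defaults, not values from the real script.
DRY_RUN=${DRY_RUN:-1}
DBNAME=${DBNAME:-bench}
CLIENT_COUNTS=${CLIENT_COUNTS:-"1 4 8"}
RUN_SECONDS=${RUN_SECONDS:-60}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One pass over a single scale: init, vacuumdb, warmup, timed runs.
bench_scale() {
    scale=$1
    run pgbench -i -s "$scale" "$DBNAME"           # pgbench init
    run vacuumdb -a                                # vacuumdb (all databases)
    run pgbench -c 1 -T "$RUN_SECONDS" "$DBNAME"   # warmup (assumed form)
    for clients in $CLIENT_COUNTS; do
        run psql -d "$DBNAME" -c CHECKPOINT        # explicit checkpoint
        run pgbench -c "$clients" -T "$RUN_SECONDS" "$DBNAME"
    done
}

# Usage, in the order described above (small, large, medium):
#   for s in 10 1100 140; do bench_scale "$s"; done
```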

Anyway, the (1) completes without any errors, then while doing warmup
for (2) the "not a valid data directory" errors start to pop up, and
finally when (3) attempts to do the vacuumdb, it fails because of the
zero page in pg_type_oid_index.

All this happens on an ext4 filesystem, created on a sw raid0 managed by
kernel 4.0.4. The filesystem is created like this:

mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext4 -E stride=128,stripe-width=256 /dev/md0

and mounted like this

/dev/md0 on /mnt/data type ext4 (rw,noatime,nobarrier,discard)
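
As a sanity check on the mkfs.ext4 geometry: stride is the RAID chunk size divided by the filesystem block size, and stripe-width is stride times the number of data disks (both disks, for RAID-0). The sketch below assumes a 512kB chunk and 4kB blocks, neither of which is stated above, and the function name ext4_raid_geometry is made up for illustration.

```shell
# Compute the -E options for mkfs.ext4 from the array geometry.
# Arguments: chunk size in kB, filesystem block size in kB, number of data disks.
ext4_raid_geometry() {
    chunk_kb=$1
    block_kb=$2
    data_disks=$3
    stride=$((chunk_kb / block_kb))          # filesystem blocks per chunk
    stripe_width=$((stride * data_disks))    # filesystem blocks per full stripe
    echo "stride=$stride,stripe-width=$stripe_width"
}

# Usage (assumed 512kB chunk, 4kB blocks, 2 disks):
#   ext4_raid_geometry 512 4 2
```

Under those assumptions the result matches the values passed to mkfs.ext4 above.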

Neither the array nor the filesystem is corrupted in any way, and
there's no sign of kernel issues in any of the logs (/var/log/messages
or dmesg, for example).

Also, I've done a number of tests with ext4 with exactly the same mount
options, but placed directly on a single device (thus not going through
the sw raid layers), and none of those had this issue.

So it seems to me that the sw raid somehow breaks the guarantees we
expect from ext4, or something like that. Another possibility is that
using two devices introduces some sort of race condition somewhere in
the stack. Or maybe it's not safe to use nobarrier in this case, I don't
know.
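
To confirm whether a given mount really runs with nobarrier before comparing runs, something like the following sketch could be used. It reads a file in /proc/mounts format. The function name has_mount_option and its optional third argument are hypothetical.

```shell
# Succeed (exit 0) if the mount point is mounted with the given option.
# Arguments: mount point, option name, mounts file (defaults to /proc/mounts).
has_mount_option() {
    mountpoint=$1
    option=$2
    mounts=${3:-/proc/mounts}
    awk -v mp="$mountpoint" -v opt="$option" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt)
                    found = 1
        }
        END { exit !found }
    ' "$mounts"
}

# Usage:
#   if has_mount_option /mnt/data nobarrier; then
#       echo "barriers are disabled on /mnt/data"
#   fi
```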

Now, I don't really think people should use software raid in cases when
data durability matters, but I'm not sure that's where the problem is.

I've found two threads that might be somewhat related:

1)
http://www.postgresql.org/message-id/201002200230.16951.andres@anarazel.de

- Same error message, but I don't see any conclusion except for
"cannot happen" from Greg.

2) http://www.postgresql.org/message-id/48331F9F.9030508@demabg.com

- Essentially talks about failed RAID5 array, but that does not seem
to be the case here (no RAID failures here).

BTW this was done on PostgreSQL 9.4.x.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment   Content-Type       Size
pg.log.gz    application/gzip   10.0 KB
mdadm.log    text/plain         1.8 KB
bench.log    text/plain         4.2 KB
