Multicolun index creation never completes on 9.0.1/solaris

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Multicolun index creation never completes on 9.0.1/solaris
Date: 2011-01-25 20:05:36
Message-ID: 4D3F2D10.1020905@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Note: I have this running now on a test box. If someone responds in the
next couple hours, I can run whatever diagnostics you want on it.
Otherwise I'll kill it off and start over with debug logging turned on.

Version: 9.0.1
Platform: Solaris 10u8 / ZFS over DAS 7-drive array
Severity: Serious
Reproduceable? Always, on this machine
Description:

I've been running tests for a client on their old production machine.
Part of the test setup involves restoring a 70GB table (part of a 400GB
database). Restoring this table always fails to complete (we terminate
it after 3 days).

Watching it closer this time, I've noticed that it's hanging on the
creation of a multicolumn index*:

"write_log_accounttime_idx" btree (account_id, event_time)

Table def*:

Table "public.write_log"
Column | Type | Modifiers
----------------+--------------------------+------------------------
log_id | character varying | not null
account_id | character varying |
write_id | character varying |
event_time | timestamp with time zone | not null default now()
event_loc | character varying(1024) |
source | character varying(1024) |
detail | character varying(1024) |
source_ip | inet |
event_length | integer |
keyword | character varying |

account_id is usually a 32-character hex string (a UUID).

When trying to restore this index ... or, indeed, do *anything* else
with in including ANALYZE or REINDEX ... takes longer than we're willing
to monitor in order to complete (I once waited 30 hours). (I was able
to test Analyze/reindex because the first time I converted the database
using pg_upgrade).

Interestingly, on the production database (where the index was
successfully created on 8.4.4., that index has no entries in the stats
table, i.e. pg_stat_user_indexes does not have a row for that index.
However, this database *did* restore on the same platform in 8.4.4.

The PK for this table restores successfully. I haven't been able to
test other single-column indexes.

Watching the process of creating this index, it's certainly busy:

read(224, " o02\0\0E0C9 ]D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 P\f ^D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 P88E7D101\0\0\0".., 8192) = 8192
read(224, " o02\0\010 . ^D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0B0 r ^D101\0\0\0".., 8192) = 8192
read(224, " o02\0\010 q ^D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0C8F9 ^D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 hBE aD101\0\0\0".., 8192) = 8192
read(224, " o02\0\0C8D6 ^D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 `F8 ^D101\0\0\0".., 8192) = 8192
write(219, " e\0\0\0\0\0\0\08082D4 V".., 8192) = 8192
read(224, " o02\0\08085 xD101\0\0\0".., 8192) = 8192
read(224, " o02\0\0E8 ] _D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 ( _ _D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0B080 _D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0F8A2 _D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 xB5EAD101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 @E4 _D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 \b `D101\0\0\0".., 8192) = 8192
read(224, " o02\0\090 ) `D101\0\0\0".., 8192) = 8192
write(219, "\0 OA197E1 !01\0 ,\0\0\0".., 8192) = 8192
read(224, " o02\0\0B0 K `D101\0\0\0".., 8192) = 8192
read(224, " o02\0\01002 bD101\0\0\0".., 8192) = 8192
read(224, " o02\0\0188E `D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 HB0 `D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 HD2 `D101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 H U\vD201\0\0\0".., 8192) = 8192
read(224, " o02\0\0 x14 aD101\0\0\0".., 8192) = 8192
read(224, " o02\0\0C8B9 aD101\0\0\0".., 8192) = 8192
read(224, " o02\0\010BB aD101\0\0\0".., 8192) = 8192
read(224, " o02\0\0 h v aD101\0\0\0".., 8192) = 8192
write(219, " ,\0\0\0F4\0E8 *05\0 ( @".., 8192) = 8192

This is also the kind of activity I'd expect to see for an index build.
But ... it goes on forever. The current index build run has been going
for almost 100 hours.

More detail:
mainentance_work_mem = 1GB (out of 32GB RAM available)
9.0.1 was built from source, using Sun CC
for the current test, I'm running with fsync off
(but I've had the same issue with fsync on)

Ideas?

(* object names have been changed to preserve confidentiality)

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message curt 2011-01-25 20:47:53 BUG #5848: Between operator issue
Previous Message Robert Haas 2011-01-25 19:09:28 Re: BUG #5837: PQstatus() fails to report lost connection