GIN fast insert database hang

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: GIN fast insert database hang
Date: 2009-02-12 03:03:43
Message-ID: 603c8f070902111903g5cee8c83h966ca956b2a6d1b5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While fooling around with the GIN fast insert patch tonight, I managed
to hang my test database. :-(

I'm going to try to reproduce this, but here's approximately what I did.

create table foo (id serial, x int[], primary key (id));
create index foo_gin on foo using gin (x);
insert into foo (x) select array[(random() * 100)::int, (random() *
90)::int, (random()*80)::int] from generate_series(1,1000000);

The last insert command was run multiple times with various numbers in
place of the constant 1000000 through several autovacuum cycles.
Eventually, though, it froze up.

Here's pg_stat_activity, with apologies for the crappy formatting:

datid | datname | procpid | usesysid | usename |
current_query
| waiting | xact_start | query_
start | backend_start | client_addr | client_port
-------+---------+---------+----------+---------+-------------------------------
--------------------------------------------------------------------------------
---------------------+---------+-------------------------------+----------------
---------------+-------------------------------+-------------+-------------
16384 | rhaas | 15071 | 10 | rhaas | insert into foo (x) select arr
ay[(random() * 100)::int, (random() * 90)::int, (random()*80)::int] from generat
e_series(1,1000000); | f | 2009-02-11 21:42:24.984759-05 | 2009-02-11 21:4
2:24.984759-05 | 2009-02-11 21:31:45.061903-05 | | -1
16384 | rhaas | 15530 | 10 | rhaas | autovacuum: VACUUM public.foo

| f | 2009-02-11 21:42:10.046085-05 | 2009-02-11 21:4
2:10.046085-05 | 2009-02-11 21:42:09.947378-05 | |
16384 | rhaas | 15780 | 10 | rhaas | explain analyze select sum(1)
from foo where array[1] <@ x;
| f | 2009-02-11 21:43:20.518837-05 | 2009-02-11 21:4
3:20.518837-05 | 2009-02-11 21:43:18.717031-05 | | -1
16384 | rhaas | 15994 | 10 | rhaas | <IDLE>

| f | | 2009-02-11 21:4

Neither the INSERT nor the EXPLAIN ANALYZE nor the autovacuum died
when sent SIGINT or SIGTERM. They all seem to be hung on semop:

$ strace -p 15071
Process 15071 attached - interrupt to quit
semop(3866630, 0x7fff14a46be0, 1^C <unfinished ...>
Process 15071 detached
$ strace -p 15530
Process 15530 attached - interrupt to quit
semop(3866630, 0x7fff14a45740, 1^C <unfinished ...>
Process 15530 detached
$ strace -p 15780;
Process 15780 attached - interrupt to quit
semop(3866630, 0x7fff14a48c50, 1^C <unfinished ...>
Process 15780 detached
$ uname -a
Linux njpen7.snipinc.net 2.6.27.12-78.2.8.fc9.x86_64 #1 SMP Mon Jan 19
19:25:03 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/fedora-release
Fedora release 9 (Sulphur)

Backtrace of pid 15071:

#0 0x0000003b1bce6257 in semop () from /lib64/libc.so.6
#1 0x00000000005a8117 in PGSemaphoreLock ()
#2 0x00000000005c6409 in LockBufferForCleanup ()
#3 0x00000000004944d1 in ginInsertCleanup ()
#4 0x000000000049501d in ginHeapTupleFastInsert ()
#5 0x000000000048a772 in gininsert ()
#6 0x000000000068fb39 in FunctionCall6 ()
#7 0x0000000000469181 in index_insert ()
#8 0x0000000000540ee2 in ExecInsertIndexTuples ()
#9 0x0000000000536504 in standard_ExecutorRun ()
#10 0x00000000005e07e1 in ProcessQuery ()
#11 0x00000000005e0a06 in PortalRunMulti ()
#12 0x00000000005e11b2 in PortalRun ()
#13 0x00000000005dc507 in exec_simple_query ()
#14 0x00000000005ddad7 in PostgresMain ()
#15 0x00000000005b2e58 in ServerLoop ()
#16 0x00000000005b3b39 in PostmasterMain ()
#17 0x000000000055e1e8 in main ()

Backtrace of pid 15530:
#0 0x0000003b1bce6257 in semop () from /lib64/libc.so.6
#1 0x00000000005a8117 in PGSemaphoreLock ()
#2 0x00000000005d624d in LWLockAcquire ()
#3 0x0000000000494322 in ginInsertCleanup ()
#4 0x00000000004922ed in ginvacuumcleanup ()
#5 0x000000000068fddc in FunctionCall2 ()
#6 0x0000000000530e1e in lazy_scan_heap ()
#7 0x00000000005312d0 in lazy_vacuum_rel ()
#8 0x000000000052f935 in vacuum_rel ()
#9 0x000000000052fd4c in vacuum ()
#10 0x00000000005aa338 in do_autovacuum ()
#11 0x00000000005aa62d in AutoVacWorkerMain ()
#12 0x00000000005aa756 in StartAutoVacWorker ()
#13 0x00000000005b525e in sigusr1_handler ()
#14 <signal handler called>
#15 0x0000003b1bcdcbb3 in __select_nocancel () from /lib64/libc.so.6
#16 0x00000000005b281c in ServerLoop ()
#17 0x00000000005b3b39 in PostmasterMain ()
#18 0x000000000055e1e8 in main ()

Backtrace of pid 15780:
#0 0x0000003b1bce6257 in semop () from /lib64/libc.so.6
#1 0x00000000005a8117 in PGSemaphoreLock ()
#2 0x00000000005d624d in LWLockAcquire ()
#3 0x00000000004913ac in scanFastInsert ()
#4 0x000000000049201c in gingetbitmap ()
#5 0x000000000068fddc in FunctionCall2 ()
#6 0x000000000046878e in index_getbitmap ()
#7 0x00000000005459c5 in MultiExecBitmapIndexScan ()
#8 0x00000000005455f4 in BitmapHeapNext ()
#9 0x000000000053f492 in ExecScan ()
#10 0x0000000000538360 in ExecProcNode ()
#11 0x000000000054435a in ExecAgg ()
#12 0x00000000005382a0 in ExecProcNode ()
#13 0x0000000000535ed2 in standard_ExecutorRun ()
#14 0x00000000004f9b2f in ExplainOnePlan ()
#15 0x00000000004f9d4e in ExplainQuery ()
#16 0x00000000005df8f7 in PortalRunUtility ()
#17 0x00000000005e0c3c in FillPortalStore ()
#18 0x00000000005e123d in PortalRun ()
#19 0x00000000005dc507 in exec_simple_query ()
#20 0x00000000005ddad7 in PostgresMain ()
#21 0x00000000005b2e58 in ServerLoop ()
#22 0x00000000005b3b39 in PostmasterMain ()
#23 0x000000000055e1e8 in main ()

Since fast shutdown failed, I tried an immediate shutdown, which did
work, but the database won't start back up:

LOG: database system was interrupted; last known up at 2009-02-11 21:42:38 EST
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/3600E050
FATAL: bad buffer id: 0
CONTEXT: xlog redo Delete list page (16), node: 19425/19426/19427 blkno: 19429
LOG: startup process (PID 17314) exited with exit code 1
LOG: aborting startup due to startup process failure

Ugh...

...Robert

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2009-02-12 03:07:00 Re: A deprecation policy
Previous Message Robert Haas 2009-02-12 02:01:03 Re: Fixing Grittner's planner issues