Re: Large object loading stalls

From: Michael Akinde <michael(dot)akinde(at)met(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Large object loading stalls
Date: 2009-02-20 11:24:49
Message-ID: 499E9301.2030902@met.no
Lists: pgsql-general

Tom Lane wrote:
> Hmm, can you attach to the stuck backend and the vacuum worker process
> with gdb and get stack traces from them? The pg_locks view does not
> indicate any locking problem, but I'm wondering if there could be a
> deadlock at the LWLock level.
My reply seems to have been lost in the ether. Anyway, I fixed the low
fsm settings and managed to replicate the problem in two separate
instances. The problem does not appear to be autovacuum, as I was
able to observe the process hanging long after autovacuum had been
released. Perhaps the vacuuming tasks were getting stuck before because
the fsm settings were too low?

Anyway, the situation now is that only the loading process is hanging.
Its backend on the server shows <IDLE> in transaction, so it is definitely
the loading program that is hanging, not the Postgres server.

pg_stat_activity

2701646 | wdb | 26359 | 2701645 | wdb | <IDLE> in transaction | f | | 2009-02-18 23:57:59.619868+00 | 2009-02-18 23:57:58.461848+00 | | -1

Backtrace from postgres process

#0 0x00002ad9ed3fef15 in recv () from /lib/libc.so.6
#1 0x000000000053ba38 in secure_read ()
#2 0x0000000000542700 in pq_comm_reset ()
#3 0x0000000000542b47 in pq_getbyte ()
#4 0x00000000005b648d in prepare_for_client_read ()
#5 0x00000000005b6d7a in PostgresMain ()
#6 0x000000000058c34b in ClosePostmasterPorts ()
#7 0x000000000058d06e in PostmasterMain ()
#8 0x00000000005444f5 in main ()

Backtrace from gribLoad

#0 0x00002b2ab43c2c8f in poll () from /lib/libc.so.6
#1 0x00002b2ab47cc4af in PQmblen () from /usr/lib/libpq.so.4
#2 0x00002b2ab47cc590 in pqWaitTimed () from /usr/lib/libpq.so.4
#3 0x00002b2ab47cbe72 in PQgetResult () from /usr/lib/libpq.so.4
#4 0x00002b2ab47cbf4e in PQgetResult () from /usr/lib/libpq.so.4
#5 0x00002b2ab32a0556 in pqxx::connection_base::prepared_exec () from
/usr/lib/libpqxx-2.6.8.so
#6 0x00002b2ab32be6ed in pqxx::transaction_base::prepared_exec () from
/usr/lib/libpqxx-2.6.8.so
#7 0x00002b2ab32b2486 in pqxx::prepare::invocation::exec () from
/usr/lib/libpqxx-2.6.8.so
#8 0x00002b2ab2d9b4cc in wdb::database::WriteValue::operator() () from
/usr/lib/libwdbLoaderBase.so.0
#9 0x00002b2ab2da27d8 in
pqxx::connection_base::perform<wdb::database::WriteValue> ()
from /usr/lib/libwdbLoaderBase.so.0
#10 0x00002b2ab2d99ddb in
wdb::database::LoaderDatabaseConnection::loadField () from
/usr/lib/libwdbLoaderBase.so.0
#11 0x00000000004182f0 in log4cpp::CategoryStream::operator<< <char [13]> ()
#12 0x00000000004073e8 in ?? ()
#13 0x000000000040819f in ?? ()
#14 0x00002b2ab431e4ca in __libc_start_main () from /lib/libc.so.6
#15 0x000000000040665a in ?? ()
#16 0x00007ffff7e3d6c8 in ?? ()
#17 0x0000000000000000 in ?? ()

Whatever is going wrong always appears to occur at this point in the
process (previous stack traces we've taken point to the same insert
statement), but the timing is seemingly random: it can occur
right away, or the loading can run dozens of times before getting
stuck. I am rather at a loss to explain this. We've loaded literally
millions of rows with this code, so the functionality is hardly
untested. Is it something we are doing, or could we have hit upon
some concurrency issue in libpq or the pqxx transactors?
Any hints or tips to help identify the problem would be appreciated.

Strangely, if one attaches strace to the loading process (not the postgres
process), then the poll() call on which the process may have been
hanging for hours returns, and the process just carries on as if
nothing had happened. Has anyone seen this kind of thing before?
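
To make the poll() behaviour above easier to test in isolation, here is a
minimal sketch of a bounded wait on a socket. It is an illustration under
stated assumptions, not the code inside libpq or libpqxx: the names
wait_readable and WaitResult are hypothetical, and in a real client the
descriptor would have to come from the connection (libpq exposes it through
PQsocket()). The sketch returns after a timeout instead of blocking
indefinitely, and it reports separately when poll() was interrupted by a
signal (EINTR). That may be related to the wait ending once a tracer
attaches, but this is a guess rather than a diagnosis.

```cpp
#include <cerrno>
#include <poll.h>
#include <unistd.h>

// Hypothetical result type for one bounded wait on a descriptor.
enum class WaitResult { Readable, TimedOut, Interrupted, Error };

// Wait until fd has data to read, giving up after timeout_ms milliseconds.
// A finite timeout means the caller regains control even if no data ever
// arrives; an interruption by a signal is reported as Interrupted.
WaitResult wait_readable(int fd, int timeout_ms) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc > 0) {
        // POLLHUP also counts: a read will return immediately (EOF).
        return (pfd.revents & (POLLIN | POLLHUP)) ? WaitResult::Readable
                                                  : WaitResult::Error;
    }
    if (rc == 0) {
        return WaitResult::TimedOut;
    }
    return errno == EINTR ? WaitResult::Interrupted : WaitResult::Error;
}
```

Calling such a function in a loop and logging every TimedOut or Interrupted
result would show whether the loader is really waiting on a socket that
never becomes readable, or whether data arrived and the wakeup was missed.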

Regards,

Michael A.

