Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 11:29:01
Message-ID: 47D51B7D.4010800@postnewspapers.com.au
Lists: pgsql-patches pgsql-performance

Thanks for the extremely helpful response. I don't think I would've
spotted that one in a hurry.

> You must have an exception handler block in that pl/pgsql
> function, which implicitly creates a new subtransaction on each
> invocation of the exception handler block, so you end up with hundreds
> of thousands of committed subtransactions.

Aah - yes, there is. I didn't realize it'd have such an impact. I can
work around the need for it by explicitly checking the table constraints
in the function - in which case an uncaught exception will terminate the
transaction, but should only arise when I've missed a constraint check.
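
For anyone else who runs into this, the change I have in mind looks
roughly like the following - the table and function names are invented
for illustration, not my real schema. The current shape of the function
is the first version below, where the inner BEGIN ... EXCEPTION ... END
block is what starts a subtransaction on every call:

CREATE TABLE my_table (
    val integer CHECK (val >= 0)
);

CREATE OR REPLACE FUNCTION insert_row_trapped(p_val integer)
RETURNS void AS $$
BEGIN
    BEGIN
        INSERT INTO my_table (val) VALUES (p_val);
    EXCEPTION WHEN check_violation THEN
        -- Trapping the error is convenient, but any block with an
        -- EXCEPTION clause sets up a subtransaction on every call.
        RAISE NOTICE 'skipped bad value %', p_val;
    END;
END;
$$ LANGUAGE plpgsql;

The reworked version tests the condition itself and traps nothing, so
no subtransaction is created:

CREATE OR REPLACE FUNCTION insert_row_checked(p_val integer)
RETURNS void AS $$
BEGIN
    -- Duplicate the table's CHECK constraint here. If it is still
    -- violated, the uncaught error aborts the whole transaction,
    -- which should only happen if I've missed a check.
    IF p_val < 0 THEN
        RAISE EXCEPTION 'value % would violate the check on my_table',
            p_val;
    END IF;
    INSERT INTO my_table (val) VALUES (p_val);
END;
$$ LANGUAGE plpgsql;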

> For 8.4, it would be nice to improve that. I tested that on my laptop
> with a similarly-sized table, inserting each row in a pl/pgsql
> function with an exception handler, and I got very similar run times.
> According to oprofile, all the time is spent in
> TransactionIdIsInProgress. I think it would be pretty straightforward
> to store the committed subtransaction ids in a sorted array, instead
> of a linked list, and binary search it. Or to use a hash table. That
> should eliminate this problem, though there are still other places
> where a large number of subtransactions will hurt performance.

That does sound interesting - and it would be nice to be able to use
exception handlers this way without such a huge performance hit. In the
end, though, it's something that can be designed around once you're aware
of it - and I'm sure that other ways of storing that data have their own
different costs and downsides.
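
For completeness, the slowdown itself is easy to reproduce with the
insert_row_trapped sketch above - this is only my guess at a minimal
test along the lines of what Heikki describes, not his actual script:

BEGIN;
-- Every call commits one subtransaction inside the still-open
-- transaction.
SELECT insert_row_trapped(i) FROM generate_series(1, 300000) AS i;
-- Before COMMIT, this scan has to check visibility for tuples written
-- by each of those subtransactions and crawls along at a few tuples
-- per second:
SELECT count(*) FROM my_table;
COMMIT;
-- Run again after COMMIT, the same count(*) is back to normal speed.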

What might also be nice, and simpler, would be a `notice', `log', or
even `debug1' level warning telling the user they've reached an absurd
number of subtransactions that'll cripple PostgreSQL's performance - say
100,000. There's precedent for this in the checkpoint frequency warning
8.3 produces if checkpoints are becoming too frequent - and like that
warning it could be configurable for big sites. If you think that's sane
I might have a go at it - though I mostly work in C++ so the result
probably won't be too pretty initially.

--
Craig Ringer
