Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

From: Rainer Pruy <Rainer(dot)Pruy(at)Acrys(dot)COM>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 16:55:43
Message-ID: 47D5680F.6080508@acrys.com
Lists: pgsql-patches pgsql-performance

We experienced a similar degradation when making heavy use of savepoints
within a single transaction.
However, we have not yet had enough time to really investigate the issue,
and it was not directly reproducible with a (small) set of statements from a script.
As the overall scenario "bulk load with sub-transactions" is close to the one we run,
it may well come down to the same cause.
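
For reference, our load pattern looks roughly like this (heavily simplified
sketch, table and savepoint names invented):

    BEGIN;
    SAVEPOINT row_1;
    INSERT INTO target_table (id, val) VALUES (1, 'a');
    RELEASE SAVEPOINT row_1;
    SAVEPOINT row_2;
    INSERT INTO target_table (id, val) VALUES (1, 'b');  -- fails on the PK
    ROLLBACK TO SAVEPOINT row_2;
    -- ... one savepoint per row, for a few hundred thousand rows ...
    COMMIT;

Each savepoint starts its own subtransaction, so a single load quickly
accumulates a very large number of subtransaction ids within the still-open
top-level transaction.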

So, take my vote for a solution that does not boil down to "don't use (sub-)transactions".

Regards,
Rainer

Craig Ringer wrote:
> Thanks for the extremely helpful response. I don't think I would've
> spotted that one in a hurry.
>
>> You must be having an exception handler block in that pl/pgsql
>> function, which implicitly creates a new subtransaction on each
>> invocation of the exception handler block, so you end up with hundreds
>> of thousands of committed subtransactions.
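
To make that concrete, a minimal sketch (untested, names invented) of the
pattern in question; the mere presence of the EXCEPTION clause starts a
subtransaction on every call, whether or not the handler fires:

    CREATE OR REPLACE FUNCTION insert_row(p_id integer, p_val text)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO target_table (id, val) VALUES (p_id, p_val);
    EXCEPTION
        WHEN unique_violation THEN
            -- entering the block above already opened a subtransaction;
            -- called once per row, these pile up into the hundreds of thousands
            RAISE NOTICE 'duplicate id %, skipped', p_id;
    END;
    $$;
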
>
> Aah - yes, there is. I didn't realize it'd have such an impact. I can
> work around the need for it by explicitly checking the table constraints
> in the function - in which case an uncaught exception will terminate the
> transaction, but should only arise when I've missed a constraint check.
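
Presumably something along these lines (again only a sketch, names invented),
checking the constraint explicitly instead of catching the error, so no
subtransaction is needed:

    CREATE OR REPLACE FUNCTION insert_row(p_id integer, p_val text)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        -- explicit duplicate check instead of an EXCEPTION handler;
        -- no handler, so no subtransaction per call
        IF EXISTS (SELECT 1 FROM target_table WHERE id = p_id) THEN
            RAISE NOTICE 'duplicate id %, skipped', p_id;
            RETURN;
        END IF;
        INSERT INTO target_table (id, val) VALUES (p_id, p_val);
    END;
    $$;

(Not safe against concurrent writers, but for a single bulk-load session
that is usually acceptable.)
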
>
>> For 8.4, it would be nice to improve that. I tested that on my laptop
>> with a similarly-sized table, inserting each row in a pl/pgsql
>> function with an exception handler, and I got very similar run times.
>> According to oprofile, all the time is spent in
>> TransactionIdIsInProgress. I think it would be pretty straightforward
>> to store the committed subtransaction ids in a sorted array, instead
>> of a linked list, and binary search. Or to use a hash table. That
>> should eliminate this problem, though there are still other places as
>> well where a large number of subtransactions will hurt performance.
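
For anyone who wants to reproduce the effect, something like the following
should show it (the row count is a guess, adjust to taste); it uses the
EXCEPTION-handler variant of insert_row sketched above:

    CREATE TABLE target_table (id integer PRIMARY KEY, val text);

    BEGIN;
    -- one committed subtransaction per row
    SELECT insert_row(g, 'x') FROM generate_series(1, 100000) AS g;
    -- still inside the same transaction: the scan must check each tuple's
    -- inserting subtransaction via TransactionIdIsInProgress and is very slow
    EXPLAIN ANALYZE SELECT count(*) FROM target_table;
    COMMIT;
    -- after COMMIT, the same scan runs at normal speed again
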
>
> That does sound interesting - and it would be nice to be able to use
> exception handlers this way without too huge a performance hit. In the
> end though it's something that can be designed around once you're aware
> of it - and I'm sure that other ways of storing that data have their own
> different costs and downsides.
>
> What might also be nice, and simpler, would be a `notice', `log', or
> even `debug1' level warning telling the user they've reached an absurd
> number of subtransactions that'll cripple PostgreSQL's performance - say
> 100,000. There's precedent for this in the checkpoint frequency warning
> 8.3 produces if checkpoints are becoming too frequent - and like that
> warning it could be configurable for big sites. If you think that's sane
> I might have a go at it - though I mostly work in C++ so the result
> probably won't be too pretty initially.
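
For reference, the 8.3 precedent, plus what an analogous knob might look like
(the second setting is purely made up, not a real GUC):

    -- existing: warn when checkpoints happen more often than this interval
    SHOW checkpoint_warning;    -- 30s by default
    -- a hypothetical analogue, name invented:
    -- SET subtransaction_warning_threshold = 100000;
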
>
> --
> Craig Ringer
>

--
Rainer Pruy
Geschäftsführer (Managing Director)

Acrys Consult GmbH & Co. KG
Untermainkai 29-30, D-60329 Frankfurt
Tel: +49-69-244506-0 - Fax: +49-69-244506-50
Web: http://www.acrys.com - Email: office(at)acrys(dot)com
Handelsregister: Frankfurt am Main, HRA 31151
