Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 11:01:32
Message-ID: 47D5150C.8090802@enterprisedb.com
Lists: pgsql-patches, pgsql-performance
Craig Ringer wrote:
> I'm encountering an odd issue with a bulk import query using PostgreSQL
> 8.3. After a 400,000 row import into a just-truncated table `booking', a
> sequential scan run on the table in the same transaction is incredibly
> slow, taking ~ 166738.047 ms. After a:
> 	`COMMIT; BEGIN;'
> the same query runs in 712.615 ms, with almost all the time difference
> being in the sequential scan of the `booking' table [schema at end of post].
> 
> The table is populated by a complex pl/pgsql function that draws from
> several other tables to convert data from another app's format. 

You must have an exception handler block in that pl/pgsql function; each 
invocation of that block implicitly creates a new subtransaction, so you 
end up with hundreds of thousands of committed subtransactions. For each 
row in the seq scan, the list of subtransactions is scanned to see if the 
transaction that inserted the row is part of the current top-level 
transaction. That's fine for a handful of subtransactions, but it gets 
really slow with large numbers of them, as you've seen. It's an O(n^2) 
operation, where n is the number of rows inserted, so you'll be in even 
more trouble if the number of rows increases.

As a work-around, avoid using exception handlers, or process more than 1 
row per function invocation. Or COMMIT the transaction, as you did.
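
For example, if the exception handler is moved out of the per-row path, 
the block is only entered once per function call rather than once per 
inserted row (again, made-up names; adapt to the real conversion logic):

CREATE OR REPLACE FUNCTION import_all_bookings()
RETURNS void AS $$
DECLARE
    src record;
BEGIN
    FOR src IN SELECT * FROM staging_booking LOOP
        INSERT INTO booking (id, note) VALUES (src.id, src.note);
    END LOOP;
EXCEPTION WHEN unique_violation THEN
    -- One handler around the whole loop: this block is entered once per
    -- call, so it costs one subtransaction, not one per inserted row.
    RAISE NOTICE 'duplicate key, rolling back the whole batch';
END;
$$ LANGUAGE plpgsql;

The trade-off is that an error now rolls back the whole batch rather than 
just the offending row, so if you need per-row error recovery, committing 
between batches is the simpler fix.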

For 8.4, it would be nice to improve that. I tested that on my laptop 
with a similarly-sized table, inserting each row in a pl/pgsql function 
with an exception handler, and I got very similar run times. According 
to oprofile, all the time is spent in TransactionIdIsInProgress. I think 
it would be pretty straightforward to store the committed subtransaction 
ids in a sorted array, instead of a linked list, and binary search. Or 
to use a hash table. That should eliminate this problem, though there are 
still other places where a large number of subtransactions will hurt 
performance.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
