Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 17:14:04
Message-ID: 47D80F5C.2060500@enterprisedb.com
Lists: pgsql-patches pgsql-performance

Pavan Deolasee wrote:
> On Wed, Mar 12, 2008 at 9:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> I didn't like it; it seemed overly complicated (consider dealing with
>> XID wraparound),
>
> We are talking about subtransactions here. I don't think we support
> subtransaction wrap-around, do we?

Imagine that you start a transaction just before transaction
wrap-around, so that the top-level XID is 2^31-10. Then you start 20
subtransactions. What XIDs will they get? Now how would you map those to
a bitmap?

It's certainly possible; you could index the bitmap by the offset from
the top-level transaction's XID, for example. But it does get a bit
complicated.
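To make that concrete, here's a rough sketch of the offset idea (my own
illustration, not code from any patch; the type and names are invented).
Unsigned 32-bit subtraction handles the numeric wrap, but the special
low XIDs that are skipped at wraparound mean the offsets aren't
perfectly contiguous across the wrap, which is part of the complication:

#include <stdint.h>
#include <stdbool.h>

typedef uint32_t TransactionId;

/*
 * Illustrative only: a bitmap of the XIDs assigned within the current
 * top-level transaction, indexed by distance from the top-level XID.
 */
typedef struct
{
    TransactionId top_xid;   /* XID of the top-level transaction */
    uint32_t      nbits;     /* number of XIDs the bitmap can cover */
    uint8_t      *bits;      /* one bit per assigned (sub)transaction XID */
} SubXidBitmap;

static bool
subxid_bitmap_test(const SubXidBitmap *map, TransactionId xid)
{
    /*
     * Modular distance from the top-level XID.  Unsigned arithmetic makes
     * this come out right numerically even if xid has wrapped past 2^32,
     * but real XID assignment skips the special XIDs near zero, so a
     * bitmap spanning the wrap would need extra care.
     */
    uint32_t offset = xid - map->top_xid;

    if (offset >= map->nbits)
        return false;        /* outside the tracked range */
    return (map->bits[offset / 8] >> (offset % 8)) & 1;
}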

>> and it would have problems with a slow transaction
>> generating a sparse set of subtransaction XIDs.
>
> I agree that's the worst case. But is that common? That's what I
> was thinking when I proposed the alternate solution. I thought that can
> happen only if most of the subtransactions abort, which again I thought
> is not a normal case. But frankly I am not sure if my assumption is correct.

It's not that common to have hundreds of thousands of subtransactions
to begin with...

>> I think getting rid of
>> the linear search will be enough to fix the performance problem.
>
> I wonder if a skewed binary search would help more? For example,
> if we know that the range of XIDs stored in the array is 1 to 1000 and
> if we are searching for a number closer to 1000, we can break the
> array into <large, small> parts instead of equal parts and then
> search.

Possibly, but I doubt it's worth the trouble. The simple binary search
solved the performance problem well enough. In the OP's test case with
300,000 subtransactions, the patch removed any measurable difference
between running the "SELECT COUNT(*)" in the same transaction as the
INSERTs and running it after a COMMIT.
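For reference, the binary-search idea is roughly the following (a sketch
with made-up names, not the actual patch): keep the current transaction's
(sub)transaction XIDs in a sorted array and probe it in O(log n) instead
of scanning linearly:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Sketch only: is 'xid' one of the XIDs in a sorted array of the current
 * transaction's (sub)transaction XIDs?  Assumes plain numeric ordering,
 * ignoring XID wraparound for simplicity.
 */
static bool
xid_in_sorted_array(TransactionId xid, const TransactionId *xids, size_t nxids)
{
    size_t lo = 0;
    size_t hi = nxids;       /* search the half-open range [lo, hi) */

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (xids[mid] == xid)
            return true;
        if (xids[mid] < xid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}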

> Well, maybe I am making simple things complicated ;-)

I think so :-).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
