Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 12:34:07
Message-ID: 47D67C3F.1030406@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches pgsql-performance

(moved to pgsql-patches, as there's a patch attached)

Tom Lane wrote:
> Getting rid of the linked-list representation would be a win in a couple
> of ways --- we'd not need the bogus "list of XIDs" support in pg_list.h,
> and xactGetCommittedChildren would go away. OTOH AtSubCommit_childXids
> would get more expensive.

I couldn't let this case go, so I wrote a patch. I replaced the linked
list with an array that's enlarged at AtSubCommit_childXids when necessary.

I couldn't measure any performance hit from the extra work of enlarging
and memcpying the array. I tried two test cases:

1. Insert one row per subtransaction, and commit the subtransaction
after each insert. This is like the OP's case.

printf("CREATE TABLE foo (id int4);\n");
printf("BEGIN;\n");
for(i=1; i <= 100000; i++)
{
printf("SAVEPOINT sp%d;\n", i);
printf("INSERT INTO foo VALUES (1);\n");
printf("RELEASE SAVEPOINT sp%d;\n", i);
}
printf("COMMIT;\n");

2. Insert one row per subtransaction, but leave the subtransaction in
not-committed state

printf("CREATE TABLE foo (id int4, t text);\n");
printf("BEGIN;\n");
for(i=1; i <= 100000; i++)
{
printf("SAVEPOINT sp%d;\n", i);
printf("INSERT INTO foo VALUES (1, 'f');\n");
}
printf("COMMIT;\n");

Test case 1 is not bad, because we just keep appending to the parent
array one at a time. Test case 2 might become slower, as the number of
subtransactions increases, as at the commit of each subtransaction you
need to enlarge the parent array and copy all the already-committed
childxids to it. However, you hit the limit on amount of shared mem
required for holding the per-xid locks before that becomes a problem,
and the performance becomes dominated by other things anyway (notably
LockReassignCurrentOwner).

I initially thought that using a single palloc'd array to hold all the
XIDs would introduce a new limit on the number committed
subtransactions, thanks to MaxAllocSize, but that's not the case.
Without patch, we actually allocate an array like that anyway in
xactGetCommittedChildren.

Elsewhere in our codebase where we use arrays that are enlarged as
needed, we keep track of the "allocated" size and the "used" size of the
array separately, and only call repalloc when the array fills up, and
repalloc a larger than necessary array when it does. I chose to just
call repalloc every time instead, as repalloc is smart enough to fall
out quickly if the chunk the allocation was made in is already larger
than the new size. There might be some gain avoiding the repeated
repalloc calls, but I doubt it's worth the code complexity, and calling
repalloc with a larger than necessary size can actually force it to
unnecessarily allocate a new, larger chunk instead of reusing the old
one. Thoughts on that?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
childXids-binsearch-4.patch text/x-diff 11.1 KB

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Heikki Linnakangas 2008-03-11 12:57:23 Re: TransactionIdIsInProgress() cache
Previous Message Zoltan Boszormenyi 2008-03-11 10:39:06 Fix HAVE_LONG[_LONG]_INT_64 to really define to 1

Browse pgsql-performance by date

  From Date Subject
Next Message Pavan Deolasee 2008-03-11 13:06:33 Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Previous Message petchimuthu lingam 2008-03-11 12:28:30 how many index can have????