Skip site navigation (1) Skip section navigation (2)

Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>,"pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 12:34:07
Message-ID: 47D67C3F.1030406@enterprisedb.com (view raw or flat)
Thread:
Lists: pgsql-patchespgsql-performance
(moved to pgsql-patches, as there's a patch attached)

Tom Lane wrote:
> Getting rid of the linked-list representation would be a win in a couple
> of ways --- we'd not need the bogus "list of XIDs" support in pg_list.h,
> and xactGetCommittedChildren would go away.  OTOH AtSubCommit_childXids
> would get more expensive.

I couldn't let this case go, so I wrote a patch. I replaced the linked 
list with an array that's enlarged at AtSubCommit_childXids when necessary.

I couldn't measure any performance hit from the extra work of enlarging 
and memcpying the array. I tried two test cases:

1. Insert one row per subtransaction, and commit the subtransaction 
after each insert. This is like the OP's case.

         printf("CREATE TABLE foo (id int4);\n");
         printf("BEGIN;\n");
         for(i=1; i <= 100000; i++)
         {
                 printf("SAVEPOINT sp%d;\n", i);
                 printf("INSERT INTO foo VALUES (1);\n");
                 printf("RELEASE SAVEPOINT sp%d;\n", i);
         }
         printf("COMMIT;\n");

2. Insert one row per subtransaction, but leave the subtransaction in 
not-committed state

         printf("CREATE TABLE foo (id int4, t text);\n");
         printf("BEGIN;\n");
         for(i=1; i <= 100000; i++)
         {
                 printf("SAVEPOINT sp%d;\n", i);
                 printf("INSERT INTO foo VALUES (1, 'f');\n");
         }
         printf("COMMIT;\n");

Test case 1 is not bad, because we just keep appending to the parent 
array one at a time. Test case 2 might become slower, as the number of 
subtransactions increases, as at the commit of each subtransaction you 
need to enlarge the parent array and copy all the already-committed 
childxids to it. However, you hit the limit on amount of shared mem 
required for holding the per-xid locks before that becomes a problem, 
and the performance becomes dominated by other things anyway (notably 
LockReassignCurrentOwner).

I initially thought that using a single palloc'd array to hold all the 
XIDs would introduce a new limit on the number committed 
subtransactions, thanks to MaxAllocSize, but that's not the case. 
Without patch, we actually allocate an array like that anyway in 
xactGetCommittedChildren.

Elsewhere in our codebase where we use arrays that are enlarged as 
needed, we keep track of the "allocated" size and the "used" size of the 
array separately, and only call repalloc when the array fills up, and 
repalloc a larger than necessary array when it does. I chose to just 
call repalloc every time instead, as repalloc is smart enough to fall 
out quickly if the chunk the allocation was made in is already larger 
than the new size. There might be some gain avoiding the repeated 
repalloc calls, but I doubt it's worth the code complexity, and calling 
repalloc with a larger than necessary size can actually force it to 
unnecessarily allocate a new, larger chunk instead of reusing the old 
one. Thoughts on that?

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment: childXids-binsearch-4.patch
Description: text/x-diff (11.1 KB)

In response to

Responses

pgsql-performance by date

Next:From: Pavan DeolaseeDate: 2008-03-11 13:06:33
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Previous:From: petchimuthu lingamDate: 2008-03-11 12:28:30
Subject: how many index can have????

pgsql-patches by date

Next:From: Heikki LinnakangasDate: 2008-03-11 12:57:23
Subject: Re: TransactionIdIsInProgress() cache
Previous:From: Zoltan BoszormenyiDate: 2008-03-11 10:39:06
Subject: Fix HAVE_LONG[_LONG]_INT_64 to really define to 1

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group