Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 21:06:37
Message-ID: 5827.1205269597@sss.pgh.pa.us
Lists: pgsql-patches pgsql-performance

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> I initially thought that using a single palloc'd array to hold all the
> XIDs would introduce a new limit on the number of committed
> subtransactions, thanks to MaxAllocSize, but that's not the case.
> Without the patch, we actually allocate an array like that anyway in
> xactGetCommittedChildren.

Right.

> Elsewhere in our codebase where we use arrays that are enlarged as
> needed, we keep track of the "allocated" size and the "used" size of the
> array separately, and only call repalloc when the array fills up, and
> repalloc a larger-than-necessary array when it does. I chose to just
> call repalloc every time instead, as repalloc is smart enough to fall
> out quickly if the chunk the allocation was made in is already larger
> than the new size. There might be some gain avoiding the repeated
> repalloc calls, but I doubt it's worth the code complexity, and calling
> repalloc with a larger-than-necessary size can actually force it to
> unnecessarily allocate a new, larger chunk instead of reusing the old
> one. Thoughts on that?

Seems like a pretty bad idea to me, as the behavior you're counting on
only applies to chunks up to 8K or thereabouts. In a situation where
you are subcommitting lots of XIDs one at a time, this is likely to have
quite awful behavior (or at least, you're at the mercy of the local
malloc library as to how bad it is). I'd go with the same
double-it-each-time-needed approach we use elsewhere.

regards, tom lane
