Skip site navigation (1) Skip section navigation (2)

Re: Performance on inserts

From: Jules Bean <jules(at)jellybean(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alfred Perlstein <bright(at)wintelcom(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance on inserts
Date: 2000-08-26 10:48:58
Message-ID: 20000826114855.A1524@pear.presence.net.uk (view raw or flat)
Thread:
Lists: pgsql-hackers
On Fri, Aug 25, 2000 at 07:00:22PM -0400, Tom Lane wrote:
> The algorithm does seem to work quite nicely just the way I described
> it, although it turns out I was off base about a good probability
> setting.  I find that something up around 0.99 seems to be good.
> Using the same (perhaps overly simplistic) test case:
> 
> # tuples inserted		6.5		current+random hack @ 0.99
> 			Time	index size	Time	index size
> 1536			<1sec	90112		<1sec	106496
> 3072			1.56	163840		<1sec	188416
> 6144			3.70	286720		1.40	376832
> 12288			9.73	532480		2.65	688128
> 24576			93.26	1024000		5.22	1368064
> 49152			363.23	2007040		10.34	2727936
> 98304						22.07	5545984
> 196608						45.60	11141120
> 393216						92.53	22290432
> 
> I tried probabilities from 0.67 to 0.999 and found that runtimes didn't
> vary a whole lot (though this is near the minimum), while index size
> consistently got larger as the probability of moving right decreased.
> The runtime is nicely linear throughout the range.

That looks brilliant!! (Bearing in mind that I have over 10 million
tuples in my table, you can imagine what performance was like for me!)
Is there any chance you could generate a patch against released 7.0.2
to add just this functionality... It would be the kiss of life for my
code!

(Not in a hurry, I'm not back in work until Wednesday, as it happens)

And, of course, what would /really/ get my code going speedily would
be the partial indices mentioned elsewhere in this thread.  If the
backend could automagically drop keys containing > 10% (tunable) of
the rows from the index, then my index would be (a) about 70% smaller!
and (b) only used when it's faster. [This means it would have to
update some simple histogram data.  However, I can't see that being
much of an overhead]

For the short term, if I can get a working version of the above
randomisation patch, I think I shall 'fake' a partial index by
manually setting 'enable_seqscan=off' for all but the 4 or 5 most
common categories. Those two factors combined will speed up my bulk
inserts a lot.

One other idea, though:

Is there any simple way for Pg to combine inserts into one bulk?
Specifically, their effect on the index files.  It has always seemed
to me to be one of the (many) glaring flaws in SQL that the INSERT
statement only takes one row at a time.  But, using INSERT ... SELECT,
I can imagine that it might be possible to do 'bulk' index
updating. so that scanning process is done once per 'batch'.

If I can make an analogy with sorted files (which indices are rather
like), if I wanted to add another 100 lines to a 1000 line sorted
file, I'd sort the 100 first, and then 'merge' them in.  Whilst I
realise that indices aren't stored sorted (no need), I think it ought
to be possible to construct an efficient algorithm for merging two
btrees?

Jules

-- 
Jules Bean                          |        Any sufficiently advanced 
jules(at)debian(dot)org                    |  technology is indistinguishable
jules(at)jellybean(dot)co(dot)uk               |               from a perl script

In response to

Responses

pgsql-hackers by date

Next:From: Matthew KirkwoodDate: 2000-08-26 11:14:06
Subject: Re: Performance on inserts
Previous:From: Mario WeilguniDate: 2000-08-26 09:08:10
Subject: TNS Services like Oracle?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group