Skip site navigation (1) Skip section navigation (2)

Re: Performance on inserts

From: Jules Bean <jules(at)jellybean(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alfred Perlstein <bright(at)wintelcom(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance on inserts
Date: 2000-08-26 16:57:38
Message-ID: 20000826175736.A3611@pear.presence.net.uk (view raw or flat)
Thread:
Lists: pgsql-hackers
On Sat, Aug 26, 2000 at 12:32:20PM -0400, Tom Lane wrote:
> Jules Bean <jules(at)jellybean(dot)co(dot)uk> writes:
> > Is there any chance you could generate a patch against released 7.0.2
> > to add just this functionality... It would be the kiss of life for my
> > code!
> 
> Will look at it.  Are you brave enough to want to try the rest of the
> 7.1 rewrite of the btree code, or do you just want this one hack?

Hmm.  I don't know :-) because I don't really know the scope of the
changes... I do need this database to be stable, but OTOH, changes to
the index code are unlikely to corrupt my data.

> 
> I don't think anyone was envisioning "automagic" drop of most common
> values.  The partial-index support that's partially there ;-) is
> designed around manual specification of a predicate, ie, you'd say
> 
> 	CREATE INDEX mypartialindex ON table (column)
> 		WHERE column != 42 AND column != 1066

Fair enough.

> 
> > For the short term, if I can get a working version of the above
> > randomisation patch, I think I shall 'fake' a partial index by
> > manually setting 'enable_seqscan=off' for all but the 4 or 5 most
> > common categories. Those two factors combined will speed up my bulk
> > inserts a lot.
> 
> Uh, enable_seqscan has nothing to do with how inserts are handled...

Um, that was a thinko!  What I was trying to say is: the randomisation
will speed up my bulk inserts, whereas enable_seqscan=off will speed
up the other slow queries in my application, namely "SELECT * where
category='foo'" type queries.

> > Specifically, their effect on the index files.
> 
> This particular problem couldn't be cured by batching inserts anyway.
> The performance problem was coming from the actual act of inserting
> a key (or more specifically, making room for the key) and that's just
> got to be done for each key AFAICS.

In principle, it could be helpful to know you're inserting 10000
tuples with category='xyz'.  Then you only make the left-to-right
scan once, and you fill up every hole you see, before finally
splitting the last page and inserting approx (10000)/(num_per_page)
new pages all at once.  This is surely quicker that doing the
left-to-right scan for each tuple (even though the difference would be
far less noticeable in the presence of your probablistic patch).

Jules

-- 
Jules Bean                          |        Any sufficiently advanced 
jules(at)debian(dot)org                    |  technology is indistinguishable
jules(at)jellybean(dot)co(dot)uk               |               from a perl script

In response to

pgsql-hackers by date

Next:From: Tom LaneDate: 2000-08-26 18:51:10
Subject: AC_FUNC_ACCEPT_ARGTYPES is falling down on the job
Previous:From: Thomas LockhartDate: 2000-08-26 16:33:02
Subject: Re: Proposal for supporting outer joins in 7.1

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group