Re: Performance on inserts

From: Alfred Perlstein <bright(at)wintelcom(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jules Bean <jules(at)jellybean(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance on inserts
Date: 2000-08-25 15:52:19
Message-ID: 20000825085219.V1209@fw.wintelcom.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [000825 08:25] wrote:
>
> In other words, we do a linear scan over all the pages containing equal
> key values, in the hope of finding one where we can shoehorn in the new
> entry. If the keys are roughly equal-sized, that hope is usually vain,
> and we end up splitting the rightmost page that contains any keys
> equal to the target key. Subsequent inserts of the same key value fill
> first the left-half split page, then the right-half, then split the
> right-half page again and repeat --- but for each insert we first had
> to scan over all the preceding pages containing equal keys. That means
> inserting N equal keys takes O(N^2) time, for large enough N.
>
> After thinking about this for a while, I propose a probabilistic
> solution. Suppose that we add to the above loop a random-number check
> that causes us to fall out of the loop with probability 1/3 or so;
> that is, if the current page is full of the target key, then we move
> right to check the next page with probability 2/3, but with probability
> 1/3 we give up and split the current page to insert the new key.
>
> With this approach we'd hardly ever move more than, say, 10 pages before
> splitting, so the time to insert a new key is bounded no matter how many
> duplicates it has.
>
> The reason for using a random decision is that if we always stopped
> after scanning a fixed number of pages, then the space freed up by the
> previous decision to split would always be just out of reach, and we'd
> end up with lots of half-full pages of the same key. With a random
> decision we will visit and fill both the left and right halves of a
> previously split page.
>
> Comments? Is there a better way? What's the best probability to use?

I'm unsure if it's possible, but somehow storing the last place one
'gave up' and decided to split the page could offer a useful next-start
for the next insert. Sort of attempting the split work amongst multiple
requests. For some reason it looks like your algorithm might cause
problems because it plain gives up after 10 pages?

hope this helps,
-Alfred

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Brook Milligan 2000-08-25 17:30:53 Re: advice on extensions needed
Previous Message Tom Lane 2000-08-25 15:25:45 Re: advice on extensions needed