## Re: Performance on inserts

### In response to

### Responses

### pgsql-hackers by date

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [000825 08:25] wrote:

> In other words, we do a linear scan over all the pages containing equal
> key values, in the hope of finding one where we can shoehorn in the new
> entry. If the keys are roughly equal-sized, that hope is usually vain,
> and we end up splitting the rightmost page that contains any keys
> equal to the target key. Subsequent inserts of the same key value fill
> first the left-half split page, then the right-half, then split the
> right-half page again and repeat --- but for each insert we first had
> to scan over all the preceding pages containing equal keys. That means
> inserting N equal keys takes O(N^2) time, for large enough N.
>
> After thinking about this for a while, I propose a probabilistic
> solution. Suppose that we add to the above loop a random-number check
> that causes us to fall out of the loop with probability 1/3 or so;
> that is, if the current page is full of the target key, then we move
> right to check the next page with probability 2/3, but with probability
> 1/3 we give up and split the current page to insert the new key.
>
> With this approach we'd hardly ever move more than, say, 10 pages before
> splitting, so the time to insert a new key is bounded no matter how many
> duplicates it has.
>
> The reason for using a random decision is that if we always stopped
> after scanning a fixed number of pages, then the space freed up by the
> previous decision to split would always be just out of reach, and we'd
> end up with lots of half-full pages of the same key. With a random
> decision we will visit and fill both the left and right halves of a
> previously split page.
>
> Comments? Is there a better way? What's the best probability to use?

I'm unsure if it's possible, but somehow storing the last place one
'gave up' and decided to split the page could offer a useful starting
point for the next insert, in effect spreading the split work across
multiple requests.

It also looks like your algorithm might cause problems if it simply
gives up after about 10 pages?

hope this helps,

-Alfred
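The stopping rule Tom describes can be sketched as a short simulation. This is only an illustration of the probabilistic idea, not the actual btree code; the function name and page-numbering scheme are invented for the example. The number of pages walked before splitting is geometric with stop probability 1/3, so the expected scan length is (2/3)/(1/3) = 2 pages regardless of how many duplicate-filled pages exist:

```python
import random

def choose_insert_page(first_page, npages_of_dups, stop_prob=1 / 3, rng=random):
    """Walk right over pages full of the duplicate key, but at each full
    page give up and split with probability stop_prob (illustrative sketch,
    not the real nbtree insertion code)."""
    page = first_page
    while page < first_page + npages_of_dups - 1:
        if rng.random() < stop_prob:
            break          # give up: split this page
        page += 1          # move right with probability 2/3
    return page

# With stop probability p = 1/3, the number of pages moved is geometric
# with mean (1 - p)/p = 2, bounded no matter how long the duplicate run is.
rng = random.Random(42)
trials = 200_000
avg = sum(choose_insert_page(0, 1000, rng=rng) for _ in range(trials)) / trials
print(f"average pages scanned before splitting: {avg:.2f}")
```

Note how the bound comes from the expectation being constant, not from any hard page limit; that is what distinguishes it from a fixed "stop after k pages" rule, which Tom argues would leave freed space perpetually just out of reach.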

- Re: Performance on inserts at 2000-08-25 15:13:08 from Tom Lane

- Re: Performance on inserts at 2000-08-25 18:25:37 from Tom Lane

Next: From: Brook Milligan, Date: 2000-08-25 17:30:53, Subject: Re: advice on extensions needed

Previous: From: Tom Lane, Date: 2000-08-25 15:25:45, Subject: Re: advice on extensions needed