Skip site navigation (1) Skip section navigation (2)

Re: rtree/gist index taking enormous amount of space in 8.2.3

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dolafi, Tom" <dolafit(at)janelia(dot)hhmi(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: rtree/gist index taking enormous amount of space in 8.2.3
Date: 2007-06-29 17:57:44
Message-ID: 6410.1183139864@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-performance
"Dolafi, Tom" <dolafit(at)janelia(dot)hhmi(dot)org> writes:
> min(fmin) |   max(fmin)    |    avg(fmin)      
>    1      |   55296469     |    11423945     
> min(fmax) |   max(fmax)    |    avg(fmax)
>   18      |   55553288     |    11424491

OK, I was able to reproduce a problem after making the further guess
that fmax is usually a little bit greater than fmin.  The attached test
script generates an rtree index of around 800 pages on 8.1.9, and the
index build time is about 6 seconds on my machine.  On CVS HEAD, the
script generates a gist index of over 30000 pages and the build time is
over 60 seconds.  Since I'm using random() the numbers move around a
bit, but they're consistently awful.  I experimented with a few other
distributions, such as fmin and fmax chosen independently in the same
range, and saw gist build time usually better than rtree and index size
only somewhat larger, so this particular distribution apparently fools
gist_box_picksplit rather badly.  The problem seems nonlinear too ---
I had originally tried it with 1 million test rows instead of 100000,
and gave up waiting for the index build after more than an hour.

Oleg, Teodor, can this be improved?

			regards, tom lane

drop table featureloc;

CREATE TABLE featureloc
(
  fmin integer,
  fmax integer
);

insert into featureloc
  select r1, r1 + 1 + random() * 1000 from
  (select 1 + random() * 55000000 as r1, 1 + random() * 55000000 as r2
   from generate_series(1,100000) offset 0) as ss;

CREATE OR REPLACE FUNCTION boxrange(integer, integer)
  RETURNS box AS
    'SELECT box (point(0, $1), point($2, 500000000))'
  LANGUAGE 'sql' STRICT IMMUTABLE;

CREATE INDEX binloc_boxrange
  ON featureloc
  USING rtree
  (boxrange(fmin, fmax));

vacuum verbose featureloc;

In response to

Responses

pgsql-performance by date

Next:From: Dolafi, TomDate: 2007-06-29 18:05:30
Subject: Re: rtree/gist index taking enormous amount of space in 8.2.3
Previous:From: Craig JamesDate: 2007-06-29 16:14:08
Subject: Re: rtree/gist index taking enormous amount of space in 8.2.3

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group