Re: rtree/gist index taking enormous amount of space in 8.2.3

From: "Dolafi, Tom" <dolafit(at)janelia(dot)hhmi(dot)org>
To: "Craig James" <craig_james(at)emolecules(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: rtree/gist index taking enormous amount of space in 8.2.3
Date: 2007-06-29 18:05:30
Message-ID: AE9860225100F14D87B26D0D4D6766DB46F42B@EXCHANGE03.janelia.priv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

The data is not distributed well...

Top 20 occurrences of fmin and fmax:
fmin | count
----------+--------
0 | 214476
19281576 | 2870
2490005 | 2290
1266332 | 2261
15539680 | 2086
11022233 | 2022
25559658 | 1923
3054411 | 1906
10237885 | 1890
13827272 | 1876
19187021 | 1847
18101335 | 1845
1518230 | 1843
21199488 | 1842
1922518 | 1826
1216144 | 1798
25802126 | 1762
8307335 | 1745
21271866 | 1736
8361667 | 1721

fmax | count
----------+--------
25 | 197551
21272002 | 547
21271988 | 335
21271969 | 321
6045781 | 247
1339301 | 243
21669151 | 235
7779506 | 232
2571422 | 229
7715946 | 228
27421323 | 222
7048089 | 221
87364 | 219
13656535 | 217
26034147 | 214
19184612 | 213
7048451 | 213
21668877 | 213
6587492 | 212
9484598 | 212

Also, out of 5.7 million rows there are 1.6 million unique fmin and 1.6
million unique fmax values.

Thanks,
Tom

-----Original Message-----
From: Craig James [mailto:craig_james(at)emolecules(dot)com]
Sent: Friday, June 29, 2007 12:14 PM
To: Dolafi, Tom
Cc: Tom Lane; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] rtree/gist index taking enormous amount of space
in 8.2.3

Dolafi, Tom wrote:
> min(fmin) | max(fmin) | avg(fmin)
> 1 | 55296469 | 11423945
>
> min(fmax) | max(fmax) | avg(fmax)
> 18 | 55553288 | 11424491
>
> There are 5,704,211 rows in the table.

When you're looking for weird index problems, it's more interesting to
know if there are certain numbers that occur a LOT. From your
statistics above, each number occurs about 10 times in the table. But
do some particular numbers occur thousands, or even millions, of times?

Here is a query that will print a list of the highest-occuring values.
You might expect a few occurances of 20, and maybe 30, but if you have
thousands or millions of occurances of certain numbers, then that can
screw up an index.

select fmax, c from
(select fmax, count(fmax) as c from your_table group by fmax) as foo
where c > 3 order by c desc;

Craig

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dolafi, Tom 2007-06-29 18:13:23 Re: rtree/gist index taking enormous amount of space in 8.2.3
Previous Message Tom Lane 2007-06-29 17:57:44 Re: rtree/gist index taking enormous amount of space in 8.2.3