Re: GiST index performance

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Matthew Wakeling <matthew(at)flymine(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: GiST index performance
Date: 2010-03-20 16:16:16
Message-ID: e4961fb71003200916r6c9913dag28c04aef290f877d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Mar 19, 2010 at 10:16 PM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:

> Hi Yeb,
>
> I have not looked at the gist code, but would it be possible to
> make virtual pages that have a size that is 1/power-of-2 * blocksize.
> Then the leaf node could be 1/8 or even 1/16 the size of the full
> pagesize.
>

Hello Ken,

The gist virtual pages would then match more the original blocksizes that
were used in Guttman's R-tree paper (first google result, then figure 4.5).
Since the nature/characteristics of the underlying datatypes and keys is not
changed, it might be that with the disk pages getting larger, gist indexing
has therefore become unexpectedly inefficient.

But I am also not really into the core-gist code, but do have a motivation
to dive into it (more than 200% performance increase in Mathew's test case).
However I'd like to verify for community support before working on it.

Maybe Theodor or Oleg could say something about how easy or hard it is to
do?

regards,
Yeb Havinga

>
> Regards,
> Ken
>
> On Fri, Mar 19, 2010 at 09:49:30PM +0100, Yeb Havinga wrote:
> > Yeb Havinga wrote:
> >>
> >> Since the gistpagesize is derived from the database blocksize, it might
> be
> >> wise to set the blocksize low for this case, I'm going to play with this
> a
> >> bit more.
> > Ok, one last mail before it turns into spam: with a 1KB database
> blocksize,
> > the query now runs in 30 seconds (original 70 on my machine, shared
> buffers
> > 240MB).
> > The per inner loop access time now 24 microsecs compared to on my machine
> > original 74 microsecs with 8KB size and 8 for the btree scan. Not a bad
> > speedup with such a simple parameter :-)
> >
> > postgres=# EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND
> b.b
> > + 2;
> > QUERY PLAN
> >
> >
> ---------------------------------------------------------------------------------------------------------------------------
> > Nested Loop (cost=0.00..4169159462.20 rows=111109777668 width=8) (actual
> > time=0.184..29540.355 rows=2999997 loops=1)
> > -> Seq Scan on b (cost=0.00..47037.62 rows=999962 width=4) (actual
> > time=0.024..1783.484 rows=1000000 loops=1)
> > -> Index Scan using a_a on a (cost=0.00..2224.78 rows=111114 width=4)
> > (actual time=0.021..0.024 rows=3 loops=1000000)
> > Index Cond: ((a.a >= b.b) AND (a.a <= (b.b + 2)))
> > Total runtime: 30483.303 ms
> > (5 rows)
> >
> >
> > postgres=# select gist_stat('a_a');
> > gist_stat
> > -------------------------------------------
> > Number of levels: 5 +
> > Number of pages: 47618 +
> > Number of leaf pages: 45454 +
> > Number of tuples: 1047617 +
> > Number of invalid tuples: 0 +
> > Number of leaf tuples: 1000000 +
> > Total size of tuples: 21523756 bytes+
> > Total size of leaf tuples: 20545448 bytes+
> > Total size of index: 48760832 bytes+
> > (1 row)
> >
> >
> > --
> > Sent via pgsql-performance mailing list (
> pgsql-performance(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-performance
> >
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andy Colson 2010-03-21 03:47:30 Re: mysql to postgresql, performance questions
Previous Message Reydan Cankur 2010-03-20 04:50:38 pgbench installation