Re: B-Heaps

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Matthew Wakeling <matthew(at)flymine(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Eliot Gable <egable+pgsql-performance(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: B-Heaps
Date: 2010-06-18 18:30:17
Message-ID: 4C1BBB39.10308@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greg Smith wrote:
> Matthew Wakeling wrote:
>> This sort of thing has been fairly well researched at an academic
>> level, but has not been implemented in that many real world
>> situations. I would encourage its use in Postgres.
>
> I guess, but don't forget that work on PostgreSQL is driven by what
> problems people are actually running into. There's a long list of
> performance improvements sitting in the TODO list waiting for people
> to find time to work on them, ones that we're quite certain are
> useful. That anyone is going to chase after any of these speculative
> ideas from academic research instead of one of those is unlikely.
> Your characterization of the potential speed up here is "Using a
> proper tree inside the index page would improve the CPU usage of the
> index lookups", which seems quite reasonable. Regardless, when I
> consider "is that something I have any reason to suspect is a
> bottleneck on common workloads?", I don't think of any, and return to
> working on one of things I already know is instead.
>
There are two different things concerning gist indexes:

1) with larger block sizes and hence, larger # entries per gist page,
results in more generic keys of those pages. This in turn results in a
greater number of hits, when the index is queried, so a larger part of
the index is scanned. NB this has nothing to do with caching / cache
sizes; it holds for every IO model. Tests performed by me showed
performance improvements of over 200%. Since then implementing a speedup
has been on my 'want to do list'.

2) there are several approaches to get the # entries per page down. Two
have been suggested in the thread referred to by Matthew (virtual pages
(but how to order these?) and tree within a page). It is interesting to
see if ideas from Prokop's cache oblivous algorithms match with this
problem to find a suitable virtual page format.

regards,
Yeb Havinga

In response to

Responses

  • Re: B-Heaps at 2010-06-18 18:44:22 from Kevin Grittner

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2010-06-18 18:44:22 Re: B-Heaps
Previous Message Greg Smith 2010-06-18 18:11:53 Re: requested shared memory size overflows size_t