Re: combined indexes with Gist - planner issues?

From: Hans-Juergen Schoenig -- PostgreSQL <postgres(at)cybertec(dot)at>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Zoltan Boszormenyi <zb(at)cybertec(dot)at>
Subject: Re: combined indexes with Gist - planner issues?
Date: 2009-08-31 14:40:21
Message-ID: 4A9BE0D5.7060106@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Martijn van Oosterhout wrote:
> On Mon, Aug 31, 2009 at 04:06:22PM +0200, Hans-Juergen Schoenig -- PostgreSQL wrote:
>
>> ok, i thought it would be something gist specific i was not aware of.
>> the golden question now is: i am looking for the cheapest products given
>> a certain text in an insane amount of data.
>> how to do it? other quals which could narrow down the amount of data
>> would not help.
>>
>> i cannot see an option with regular "weapons" ...
>> maybe you can an idea how to fix core to make it work? maybe there is a
>> mechanism we could need.
>> we really have to make this work - no matter what it takes.
>> we are willing to put effort into that.
>>
>
> The way I usually attack such a problem is to think of a data
> structure+algorithm that could produce the output you want. Once you've
> got that it's usually clear how you can make postgres do it and what
> changes would need to be made.
>
> At first glance I don't see any nice data structure specific for your
> problem. But it occurs to me that maybe you could just have a (btree)
> index on the price and just scan in asceding order until you have
> enough records. Expensive if the first record is expensive.
>
> Another possibility is to change your query to use the price in the
> GiST index: execute multiple queries of the form:
>
> ... AND display_price >= 0.01 and display_price < 1;
> ... AND display_price >= 1 and display_price < 10;
>
>

hello ...

i had a similar idea here but the problem is: prices will pretty much
depends on products.
to get to some critical example: "book" is a horribly frequent word and
you will find just too many in a too narrow price range.
using a price index is alone is not a good idea. how many products which
cost USD 9.95 do you know and how many of them are books? :(
i did some experiments which PL/proxy to scale out a little and i wrote
some C code to explicitly cache data from the start and so on.
this is all shit, however - it is too much data and I have too many request.
i don't want to fallback to some java-based stuff such as solr. it would
totally ruin my credibility and the stand postgres has at this customer.
whatever it takes - a PG based solution has to be found and implemented.

my knowledge of how gist works internally is not too extensive. any
"kickstart" idea would be appreciated.

many thanks,

hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2009-08-31 14:42:26 Re: Feature request : add REMAP_SCHEMA-like option to pg_restore
Previous Message Martijn van Oosterhout 2009-08-31 14:24:41 Re: combined indexes with Gist - planner issues?