Skip site navigation (1) Skip section navigation (2)

Re: Full text search with ORDER BY performance issue

From: Krade <krade(at)krade(dot)com>
To: valgog <valgog(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Full text search with ORDER BY performance issue
Date: 2009-07-21 17:10:29
Message-ID: 4A65F685.5080601@krade.com (view raw or flat)
Thread:
Lists: pgsql-performance
On 7/21/2009 11:32, valgog wrote:
> Hi,
>
> There is a problem with GIN and GIST indexes, that they cannot be used
> by the ORDER BY. Maybe it will be a nice idea to ask Oleg to make it
> possible to use the b-tree columns in GIST or GIN to make the sort
> easier, but I have no idea how difficult it will be to implement it in
> current GIN or GIST structures. I think Oleg or even Tom will be the
> right people to ask it :) But even if it is possible it will not be
> implemented at least until 8.5 that will need a year to come, so until
> then...
>    
Unfortunately, it's not even just the lack of ORDER BY support, 
btree_gin indexes seem to be broken under some circumstances. So I can't 
even use my idea to limit searches to the last 10 days.

See this:
http://pgsql.privatepaste.com/5219TutUMk

The first query gives bogus results. It's not using the index correctly.

timestamp_comment_gin is a GIN index on timestamp, comment_tsv. The 
timestamp column is an integer. The queries work right if I drop the 
index. Is this a bug in btree_gin?
> It is possible to strip your table in several smaller ones putting
> them on different machines and then splitting your query with DBLINK.
> This will distribute the burden of sorting to several machines that
> will have to sort smaller parts as well. After you have your 25 ids
> from each of the machines, you can merge them, sort again and limit as
> you wish. Doing large offsets will be still problematic but faster
> anyway in most reasonable offset ranges. (Load balancing tools like
> pg_pool can automate this task, but I do not have practical experience
> using them for that purposes)
>
> Yet another very interesting technology -- sphinx search (http://
> www.sphinxsearch.com/). It can distribute data on several machines
> automatically, but it will be probably too expensive to start using
> (if your task is not your main one :)) as they do not have standard
> automation scripts, it does not support live updates (so you will
> always have some minutes delay), and this is a standalone service,
> that needs to be maintained and configured and synchronized with our
> main database separately (though you can use pg/python to access it
> from postgres).
>
> Good luck with your task :)
Yeah, I don't really have that sort of resources. This is a small hobby 
project (ie: no budget) that is growing a bit too large. I might just 
have to do text searches without time ordering.

On 7/21/2009 5:06, Scott Marlowe wrote:
> Couldn't you do tge second query as a with query then run another
> query to limit that result to everything greater than now()-xdays ?
>    
I suppose I could, but I have no way to do a fast query that does both a 
full text match and a < or > in the same WHERE due to the issue I 
described above, so my original plan won't work. A separate BTREE 
timestamp index obviously does nothing.

And again, thank you for all the help.

In response to

Responses

pgsql-performance by date

Next:From: Scott CareyDate: 2009-07-21 17:21:35
Subject: Re: hyperthreaded cpu still an issue in 8.4?
Previous:From: Scott MarloweDate: 2009-07-21 16:22:29
Subject: Re: hyperthreaded cpu still an issue in 8.4?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group