Skip site navigation (1) Skip section navigation (2)

Re: Full text search with ORDER BY performance issue

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Krade <krade(at)krade(dot)com>
Cc: valgog <valgog(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Full text search with ORDER BY performance issue
Date: 2009-07-21 19:27:28
Message-ID: Pine.LNX.4.64.0907212326060.8065@sn.sai.msu.ru (view raw or flat)
Thread:
Lists: pgsql-performance
On Tue, 21 Jul 2009, Krade wrote:

> On 7/21/2009 11:32, valgog wrote:
>> Hi,
>> 
>> There is a problem with GIN and GIST indexes, that they cannot be used
>> by the ORDER BY. Maybe it will be a nice idea to ask Oleg to make it
>> possible to use the b-tree columns in GIST or GIN to make the sort
>> easier, but I have no idea how difficult it will be to implement it in
>> current GIN or GIST structures. I think Oleg or even Tom will be the
>> right people to ask it :) But even if it is possible it will not be
>> implemented at least until 8.5 that will need a year to come, so until
>> then...
>> 
> Unfortunately, it's not even just the lack of ORDER BY support, btree_gin 
> indexes seem to be broken under some circumstances. So I can't even use my 
> idea to limit searches to the last 10 days.
>
> See this:
> http://pgsql.privatepaste.com/5219TutUMk
>
> The first query gives bogus results. It's not using the index correctly.
>
> timestamp_comment_gin is a GIN index on timestamp, comment_tsv. The timestamp 
> column is an integer. The queries work right if I drop the index. Is this a 
> bug in btree_gin?

it'd be nice if you provide us data,so we can reproduce your problem

>> It is possible to strip your table in several smaller ones putting
>> them on different machines and then splitting your query with DBLINK.
>> This will distribute the burden of sorting to several machines that
>> will have to sort smaller parts as well. After you have your 25 ids
>> from each of the machines, you can merge them, sort again and limit as
>> you wish. Doing large offsets will be still problematic but faster
>> anyway in most reasonable offset ranges. (Load balancing tools like
>> pg_pool can automate this task, but I do not have practical experience
>> using them for that purposes)
>> 
>> Yet another very interesting technology -- sphinx search (http://
>> www.sphinxsearch.com/). It can distribute data on several machines
>> automatically, but it will be probably too expensive to start using
>> (if your task is not your main one :)) as they do not have standard
>> automation scripts, it does not support live updates (so you will
>> always have some minutes delay), and this is a standalone service,
>> that needs to be maintained and configured and synchronized with our
>> main database separately (though you can use pg/python to access it
>> from postgres).
>> 
>> Good luck with your task :)
> Yeah, I don't really have that sort of resources. This is a small hobby 
> project (ie: no budget) that is growing a bit too large. I might just have to 
> do text searches without time ordering.
>
> On 7/21/2009 5:06, Scott Marlowe wrote:
>> Couldn't you do tge second query as a with query then run another
>> query to limit that result to everything greater than now()-xdays ?
>> 
> I suppose I could, but I have no way to do a fast query that does both a full 
> text match and a < or > in the same WHERE due to the issue I described above, 
> so my original plan won't work. A separate BTREE timestamp index obviously 
> does nothing.
>
> And again, thank you for all the help.
>
>

 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

pgsql-performance by date

Next:From: Victor de Buen (Bayes)Date: 2009-07-21 23:43:35
Subject: Atomic access to large arrays
Previous:From: Merlin MoncureDate: 2009-07-21 18:34:00
Subject: Re: Used computers?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group