Re: Question about RUM-index

From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Question about RUM-index
Date: 2016-06-15 16:15:48
Message-ID: VisenaEmail.d.c1604888ec47393b.1555441dabe@tc7-visena
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

På onsdag 15. juni 2016 kl. 15:27:32, skrev David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com <mailto:david(dot)g(dot)johnston(at)gmail(dot)com>>:
On Wed, Jun 15, 2016 at 6:56 AM, Andreas Joseph Krogh <andreas(at)visena(dot)com
<mailto:andreas(at)visena(dot)com>> wrote:
Hi.
 
First; Is this the correct forum to ask questions about the Postgres Pro's new
RUM-index?
 
​-general is fairly broad in scope and the authors do monitor here as far as I
am aware.  I'm not sure this is strictly related to that extension though,
anyway.​

 
 
If not, please point me to the right forum.
 
I'm trying to do this (done with GIN):
 
create extension if not exists btree_gin; drop table if EXISTS delivery;
create tabledelivery( id BIGSERIAL primary key, fts_all TSVECTOR not null,
folder_idBIGINT NOT NULL, sent TIMESTAMP not null, message varchar not null );
create index gin_idx on delivery using GIN(fts_all, folder_id);
​In general I'm not sure what you are so focused on multi-column indexes
through this email.​
[snip]
 
​What index?  You don't have "sent" in the one CREATE INDEX statement you
specified above.​

 
 
So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so
that I can search inan array of folders using the index, AND have the whole
result sorted by "sent"-timestamp also using the RUM-index.
 
 
​A bit out of my experience here...but see comment on multi-column indexes
above.​

 
In the (limited) documentation sorting using timestamp is done like this:
 
 
 
 
ORDER BY sent <-> '2000-01-01'::TIMESTAMP
 
which I don't understand; Why must one specify a value here, and how does that
value affect the result?
 
 
​You don't (I think...); <ORDER BY sent> is perfectly valid, it will use the
"<=, and >=" operators in the btree operator family to perform the
ordering...see comment regarding multi-column indexes.  Even if you stick sent
into such an index it would not (I guess it could...but your where clause
doesn't filter on it so doing so seem counter-productive) be a leading column
so it is unlikely that the index would be helpful in supplying tuples to the
sort node in order - the sort would still have work to perform.

 

Maybe its simple ignorance but AFAIK the sorting node never consults an index
to perform its work.  Where indexes come into to play is the node under the
sort is able to supply its tuples in sorted order then the sort node will have
nothing to do.

 

https://www.postgresql.org/docs/9.6/static/indexes-ordering.html
<https://www.postgresql.org/docs/9.6/static/indexes-ordering.html>

 

"""

In addition to simply finding the rows to be returned by a query, an index may
be able to deliver them in a specific sorted order. This allows a query's ORDER
BY specification to be honored without a separate sorting step. Of the index
types currently supported by PostgreSQL, only B-tree can produce sorted output
— the other index types return matching rows in an unspecified,
implementation-dependent order.

​"""​

​HTH
 
David J.​

 
I'm focused on using *one* index for this query for this to be as efficient as
possible. I'm searching thru millions of records and the user expects
"millisecond-response", we're building an web-based email-system and people
expect search to be fast, as GMail and other services are. No index-type can do
this yet but RUM is on its way and seems promising.
 
The reason I don't have "sent" in the (GIN) index is that I'm only using it
for sorting and GIN cannot be used for sorting, which is why I'm looking into
using the RUM-index; To hopefully get the best of GIN (able to index
BIGINT-arrays, JSONB and tsvector) and be able to sort on timestamp (which RUM
is able to do).
 
Based on your answer it doesn't seem you are very familiar with the
RUM-extention, so much of your reply doesn't really seem relevant.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas(at)visena(dot)com <mailto:andreas(at)visena(dot)com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

 

In response to

Browse pgsql-general by date

  From Date Subject
Next Message John R Pierce 2016-06-15 16:17:58 Re: Looking for a good programming reference
Previous Message Melvin Davidson 2016-06-15 16:13:08 Re: Looking for a good programming reference