Re: Question about RUM-index

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andreas Joseph Krogh <andreas(at)visena(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question about RUM-index
Date: 2016-06-15 13:27:32
Message-ID: CAKFQuwZMd-K1Dt4owady6=j9ynNOnU3wZ87sgRq4Zu9t9SpDOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Jun 15, 2016 at 6:56 AM, Andreas Joseph Krogh <andreas(at)visena(dot)com>
wrote:

> Hi.
>
> First; Is this the correct forum to ask questions about the Postgres Pro's
> new RUM-index?
>

​-general is fairly broad in scope and the authors do monitor here as far
as I am aware. I'm not sure this is strictly related to that extension
though, anyway.​

>
> If not, please point me to the right forum.
>
> I'm trying to do this (done with GIN):
>
>
> create extension if not exists btree_gin;drop table if EXISTS delivery;
> create table delivery(
> id BIGSERIAL primary key,
> fts_all TSVECTOR not null,
> folder_id BIGINT NOT NULL,
> sent TIMESTAMP not null,
> message varchar not null);
>
> create index gin_idx on delivery using GIN(fts_all, folder_id);
>
> ​In general I'm not sure what you are so focused on multi-column indexes
through this email.​

>
> CREATE OR REPLACE FUNCTION update_delivery_tsvector_tf() RETURNS TRIGGER AS $$BEGIN NEW.fts_all = to_tsvector('simple', NEW.message); return NEW;END;$$ LANGUAGE PLPGSQL;
> CREATE TRIGGER update_delivery_tsvector_t BEFORE INSERT OR UPDATE ON deliveryFOR EACH ROW EXECUTE PROCEDURE update_delivery_tsvector_tf();
> insert into delivery(folder_id, sent, message)values (1, '2015-01-01', 'Yes hit four')
> , (1, '2014-01-01', 'Hi man')
> , (2, '2013-01-01', 'Hi man')
> , (2, '2013-01-01', 'fish')
> ;
> analyze delivery;
> set ENABLE_SEQSCAN to off;
> explain analyze SELECT del.id
> , del.sent
> FROM delivery del
> WHERE 1 = 1 AND del.fts_all @@ to_tsquery('simple', 'hi:*')
> AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[]) ORDER BY del.sent DESC LIMIT 101 OFFSET 0;
>
>
> QUERY PLAN
>
>
> ----------------------------------------------------------------------------------------------------------------------------
> Limit (cost=3.63..3.63 rows=1 width=16) (actual time=0.025..0.025 rows=1
> loops=1)
> -> Sort (cost=3.63..3.63 rows=1 width=16) (actual time=0.024..0.024
> rows=1 loops=1)
> Sort Key: sent DESC
> Sort Method: quicksort Memory: 25kB
> -> Bitmap Heap Scan on delivery del (cost=2.40..3.62 rows=1
> width=16) (actual time=0.019..0.019 rows=1 loops=1)
> Recheck Cond: ((fts_all @@ '''hi'':*'::tsquery) AND
> (folder_id = ANY ('{2,3}'::bigint[])))
> Heap Blocks: exact=1
> -> Bitmap Index Scan on gin_idx (cost=0.00..2.40 rows=1
> width=0) (actual time=0.015..0.015 rows=1 loops=1)
> Index Cond: ((fts_all @@ '''hi'':*'::tsquery) AND
> (folder_id = ANY ('{2,3}'::bigint[])))
> Planning time: 0.153 ms
> Execution time: 0.047 ms
> (11 rows)
>
>
> Note that GIN does almost what I want, except use the index when sorting
> by "sent"-timestamp.
>

​What index? You don't have "sent" in the one CREATE INDEX statement you
specified above.​

>
> So I wonder if RUM can do any better?
> What I don't understand is how to have "folder_id" as part of the
> RUM-index so that I can search in *an array* of folders using the index,
> *AND* have the whole result sorted by "sent"-timestamp also using the
> RUM-index.
>
>

​A bit out of my experience here...but see comment on multi-column indexes
above.​

> In the (limited) documentation sorting using timestamp is done like this:
>

>
> ORDER BY sent <-> '2000-01-01'::TIMESTAMP
>
> which I don't understand; Why must one specify a value here, and how does
> that value affect the result?
>
>
​You don't (I think...); <ORDER BY sent> is perfectly valid, it will use
the "<=, and >=" operators in the btree operator family to perform the
ordering...see comment regarding multi-column indexes. Even if you stick
sent into such an index it would not (I guess it could...but your where
clause doesn't filter on it so doing so seem counter-productive) be a
leading column so it is unlikely that the index would be helpful in
supplying tuples to the sort node in order - the sort would still have work
to perform.

Maybe its simple ignorance but AFAIK the sorting node never consults an
index to perform its work. Where indexes come into to play is the node
under the sort is able to supply its tuples in sorted order then the sort
node will have nothing to do.

https://www.postgresql.org/docs/9.6/static/indexes-ordering.html

"""
In addition to simply finding the rows to be returned by a query, an index
may be able to deliver them in a specific sorted order. This allows a
query's ORDER BY specification to be honored without a separate sorting
step. Of the index types currently supported by PostgreSQL, only B-tree can
produce sorted output — the other index types return matching rows in an
unspecified, implementation-dependent order.
​"""​

​HTH

David J.​

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2016-06-15 13:47:39 Re: Changelog version from 8.1.2 to 9.3.6
Previous Message David G. Johnston 2016-06-15 13:00:46 Re: regarding schema only migration from sqlserver to postgres with runmtk.sh