border case ::tsvector vs. to_tsvector was turning a tsvector without position in a weighted tsvector

From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To: postgresql Forums <pgsql-general(at)postgresql(dot)org>
Subject: border case ::tsvector vs. to_tsvector was turning a tsvector without position in a weighted tsvector
Date: 2010-02-09 10:18:42
Message-ID: 20100209111842.6f71ffc6@dawn.webthatworks.it
Lists: pgsql-general

This was what I was after:

test=# select version();
version
------------------------------------------------------------------------------------------------
PostgreSQL 8.3.9 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2

test=# select to_tsvector('pino gino');
to_tsvector
-------------------
'gino':2 'pino':1
(1 row)

test=# select 'pino gino'::tsvector;
tsvector
---------------
'gino' 'pino'
(1 row)

test=# select to_tsvector('pino gino') @@ 'gino:B'::tsquery;
?column?
----------
f
(1 row)

test=# select to_tsvector('pino gino') @@ 'gino:D'::tsquery;
?column?
----------
t
(1 row)

test=# select ('pino gino'::tsvector) @@ 'gino:B'::tsquery;
?column?
----------
t
(1 row)

test=# select to_tsvector('pino:1B gino') @@ 'pino'::tsquery;
?column?
----------
t
(1 row)

test=# select 'pino gino'::tsvector || to_tsvector('pino gino');
?column?
-------------------
'gino':2 'pino':1
(1 row)

test=# select 'pino gino'::tsvector || 'pino gino'::tsvector;
?column?
---------------
'gino' 'pino'
(1 row)

test=# select to_tsvector('pino gino') || to_tsvector('pino gino');
?column?
-----------------------
'gino':2,4 'pino':1,3
(1 row)

test=# select 'pino gino'::tsvector || to_tsvector('gino tano');
?column?
--------------------------
'gino':1 'pino' 'tano':2
(1 row)

test=# select
setweight('pino gino'::tsvector || to_tsvector('gino tano'), 'A');
setweight
----------------------------
'gino':1A 'pino' 'tano':2A
(1 row)
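The concatenation and setweight results above can be reproduced with a small toy model. This is a Python sketch, not PostgreSQL's implementation. It assumes a tsvector can be represented as a dict from lexeme to a {position: weight} dict (an empty dict meaning "no positions"), and that `||` shifts the right operand's positions by the largest position in the left operand, which is how the PostgreSQL documentation describes the operator. The helper names `tsv_concat`, `set_weight` and `render` are hypothetical.

```python
from typing import Dict

# Toy model (assumption): lexeme -> {position: weight}; an empty dict means
# the lexeme carries no positions (as produced by a ::tsvector cast or strip()).
TsVector = Dict[str, Dict[int, str]]


def tsv_concat(left: TsVector, right: TsVector) -> TsVector:
    """Model of tsvector || tsvector: right-hand positions are shifted by the
    largest position found in the left-hand vector (0 if it has none)."""
    offset = max((p for pos in left.values() for p in pos), default=0)
    result: TsVector = {lex: dict(pos) for lex, pos in left.items()}
    for lex, pos in right.items():
        target = result.setdefault(lex, {})
        for p, w in pos.items():
            target[p + offset] = w
    return result


def set_weight(vector: TsVector, weight: str) -> TsVector:
    """Model of setweight(): relabel every position; position-less lexemes
    have nothing to relabel and stay weightless."""
    return {lex: {p: weight for p in pos} for lex, pos in vector.items()}


def render(vector: TsVector) -> str:
    """Print roughly the way psql does; weight D is the default and is not
    shown. Escaping of quotes and backslashes is omitted."""
    parts = []
    for lex in sorted(vector):
        text = f"'{lex}'"
        if vector[lex]:
            text += ":" + ",".join(
                f"{p}{'' if w == 'D' else w}" for p, w in sorted(vector[lex].items())
            )
        parts.append(text)
    return " ".join(parts)


cast = {"pino": {}, "gino": {}}                    # 'pino gino'::tsvector
parsed = {"pino": {1: "D"}, "gino": {2: "D"}}      # to_tsvector('pino gino')
gino_tano = {"gino": {1: "D"}, "tano": {2: "D"}}   # to_tsvector('gino tano')

print(render(tsv_concat(parsed, parsed)))                      # 'gino':2,4 'pino':1,3
print(render(set_weight(tsv_concat(cast, gino_tano), "A")))    # 'gino':1A 'pino' 'tano':2A
```

Because a position-less left operand contributes an offset of 0, the right operand's positions pass through unchanged, which is why `'gino'` simply picks up position 1 in the mixed example.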

So (even if it may sound obvious to many):
- tsvectors may be a mix of lexemes with and without positions (and
hence weights)
- a lexeme without a weight (which is not the same as having the
default weight D) matches a query term restricted to ANY weight, i.e.
it behaves like a lexeme with ALL weights
- you can't assign a weight to a lexeme without a position, and it
would be hard to assign positions after a document has been parsed
into a tsvector; so while in theory it could be reasonable to have
lexemes with a weight and no position, in practice you'll have to
assign meaningless positions if you'd like to assign weights to a
tsvector without positions.
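The first two points can be checked against the @@ results in the transcript with a similar toy model. It is a sketch under an assumed representation (lexeme -> {position: weight}), and `matches` is a hypothetical helper that covers only a single query lexeme with an optional weight restriction such as `gino:B`, not full tsquery evaluation. It encodes the behavior observed above: the weight restriction is only checked when the vector's lexeme actually has positions.

```python
from typing import Dict

# Assumed representation: lexeme -> {position: weight}; {} means no positions.
TsVector = Dict[str, Dict[int, str]]


def matches(vector: TsVector, lexeme: str, weights: str = "") -> bool:
    """Model of `vector @@ 'lexeme:weights'` for one query lexeme.

    `weights` is a string such as "B" or "AB"; an empty string means the
    query term has no weight restriction.
    """
    if lexeme not in vector:
        return False
    positions = vector[lexeme]
    if not weights or not positions:
        # No restriction, or no positions to carry a weight: it matches,
        # so a position-less lexeme behaves as if it had every weight.
        return True
    return any(w in weights for w in positions.values())


parsed = {"pino": {1: "D"}, "gino": {2: "D"}}  # to_tsvector('pino gino')
cast = {"pino": {}, "gino": {}}                # 'pino gino'::tsvector

print(matches(parsed, "gino", "B"))  # False
print(matches(parsed, "gino", "D"))  # True
print(matches(cast, "gino", "B"))    # True
```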

I still wonder whether it would be reasonable to write a function that
forcefully assigns positions and a weight to vectors, to be used with
ts_rank.
I have some ideas about possible use cases, but I'm still unsure
whether they are reasonable.
E.g. someone might want to save storage and CPU cycles by storing
parts of documents in precomputed tsvectors with no weights, and then
build a search on merged, weighted tsvectors using ts_rank.
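A function of the kind described above could, for instance, give every position-less lexeme a made-up position after the vector's current maximum and then apply the weight to everything. The sketch below uses the same assumed toy representation; the name `force_weight` and the choice of numbering position-less lexemes in sorted order are hypothetical, and the positions it invents carry no information about the original document, so they are meaningless, as noted above.

```python
from typing import Dict

# Assumed representation: lexeme -> {position: weight}; {} means no positions.
TsVector = Dict[str, Dict[int, str]]


def force_weight(vector: TsVector, weight: str) -> TsVector:
    """Hypothetical helper: weight every lexeme, inventing positions as needed.

    Lexemes that already have positions keep them and get the new weight.
    Position-less lexemes receive fresh positions after the current maximum,
    numbered in sorted lexeme order (an arbitrary choice).
    """
    if len(weight) != 1 or weight not in "ABCD":
        raise ValueError("weight must be one of A, B, C, D")
    next_pos = max((p for pos in vector.values() for p in pos), default=0) + 1
    result: TsVector = {}
    for lex in sorted(vector):
        positions = vector[lex]
        if positions:
            result[lex] = {p: weight for p in positions}
        else:
            result[lex] = {next_pos: weight}
            next_pos += 1
    return result


cast = {"pino": {}, "gino": {}}  # 'pino gino'::tsvector
print(force_weight(cast, "A"))   # {'gino': {1: 'A'}, 'pino': {2: 'A'}}
```

Appending invented positions after the current maximum leaves any existing positions untouched, so proximity information already present in a partly positioned vector is not disturbed.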

OK.. trying to finish up my tsvector_to_tsquery function in a
reasonable way first.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
