
Re: Term positions in GIN fulltext index

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Yoann Moreau <yoann(dot)moreau(at)univ-avignon(dot)fr>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Term positions in GIN fulltext index
Date:	2011-11-04 11:15:56
Message-ID:	25F8CB23-35D6-481A-8AC6-F8396838D7C8@phlo.org
Lists:	pgsql-hackers

On Nov 4, 2011, at 11:15, Yoann Moreau wrote:
> On 03/11/11 19:19, Florian Pflug wrote:
>> Postgres doesn't seem to contain such a function currently (don't take my word
>> for it, though - go and recheck the documentation. I don't know all the thousands
>> of built-in functions by heart). But it's easy to add one. You could either use PL/pgSQL
>> to parse the tsvector's textual representation, or write a C function. If you
>> go the PL/pgSQL route, regexp_split_to_table() might come in handy.
>
> This seems easier to program than what I was thinking about; I'm going to do that.
> But I'm wondering about the size of the database with the GIN index plus the tsvector
> column, and about the performance of parsing whole tsvectors for each document I need
> positions from (as I need them for only a very few terms).

AFAICS, the internal storage layout of tsvector should allow you to extract an
individual lexeme's positions quite efficiently (in O(log N) time, where N is the
number of lexemes in the tsvector), because a tsvector keeps its lexemes sorted,
so a lookup can binary-search them. Doing so will require you to implement your
function in C, though: any solution that works from a tsvector's textual
representation will obviously take O(N) time.
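To make the O(log N) vs. O(N) difference concrete, here is a minimal standalone
C sketch. It does NOT use PostgreSQL's real tsvector structures or the fmgr
calling convention; `LexemeEntry`, `find_lexeme()` and `positions_from_text()`
are hypothetical names chosen for illustration. The sorted-array model assumes
lexemes are ordered byte-wise (the order strcmp() gives), and the textual
scanner assumes lexemes contain no quotes or backslashes and that positions
carry no weight suffixes (such as `3A`).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of a tsvector: entries sorted by
 * lexeme (byte-wise, the order strcmp() uses), each carrying its
 * position list. This is NOT PostgreSQL's real on-disk layout. */
typedef struct {
    const char *lexeme;
    const unsigned short *positions;
    size_t npositions;
} LexemeEntry;

/* bsearch() calls this with the key first and an array element second. */
static int compare_key_to_entry(const void *key, const void *elem)
{
    const char *lexeme = key;
    const LexemeEntry *entry = elem;
    return strcmp(lexeme, entry->lexeme);
}

/* O(log N): binary search over the sorted entries; NULL if absent. */
static const LexemeEntry *find_lexeme(const LexemeEntry *entries, size_t n,
                                      const char *lexeme)
{
    assert(entries != NULL || n == 0);
    if (n == 0)
        return NULL;    /* bsearch() needs a valid base pointer */
    return bsearch(lexeme, entries, n, sizeof *entries, compare_key_to_entry);
}

/* O(N): scan a textual tsvector such as "'cat':3 'fat':2,4" from the
 * start until the lexeme is found. Simplified: assumes no quotes or
 * backslashes inside lexemes and no weight suffixes like "3A".
 * Stores up to max positions in out and returns how many positions
 * the lexeme has, 0 if it has none, or -1 if it is absent. */
static int positions_from_text(const char *text, const char *lexeme,
                               unsigned *out, int max)
{
    size_t len = strlen(lexeme);
    const char *p = text;

    while ((p = strchr(p, '\'')) != NULL) {
        const char *start = p + 1;
        const char *end = strchr(start, '\'');
        if (end == NULL)
            return -1;              /* malformed input: unterminated quote */
        p = end + 1;
        if ((size_t)(end - start) != len || strncmp(start, lexeme, len) != 0)
            continue;               /* not our lexeme; keep scanning */
        if (*p != ':')
            return 0;               /* lexeme stored without positions */
        int n = 0;
        do {                        /* parse "2,4": p sits on ':' or ',' */
            char *next;
            unsigned long pos = strtoul(p + 1, &next, 10);
            if (n < max)
                out[n] = (unsigned) pos;
            n++;
            p = next;
        } while (*p == ',');
        return n;
    }
    return -1;
}
```

With entries for "a", "cat" and "fat", find_lexeme() touches only O(log N)
entries, while positions_from_text() has to step past every lexeme that
precedes the match. A real implementation would instead work directly on the
tsvector datum via PostgreSQL's own accessors (declared in
src/include/tsearch/ts_type.h); this sketch only models the idea.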

best regards,
Florian Pflug

In response to

Re: Term positions in GIN fulltext index at 2011-11-04 10:15:15 from Yoann Moreau

Responses

Re: Term positions in GIN fulltext index at 2011-11-04 14:26:01 from Yoann Moreau
