Skip site navigation (1) Skip section navigation (2)

Re: gsoc, oprrest function for text search take 2

From: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gsoc, oprrest function for text search take 2
Date: 2008-08-14 19:45:18
Message-ID: 48A48B4E.10803@students.mimuw.edu.pl (view raw or flat)
Thread:
Lists: pgsql-hackers
Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> So right now the idea is to:
>>  (1) pre-sort STATISTIC_KIND_MCELEM values
>>  (2) build an array of pointers to detoasted values in tssel()
>>  (3) use binary search when looking for MCELEMs during tsquery analysis
> 
> Sounds like a plan. In (2), it's even better to detoast the values 
> lazily. For a typical one-word tsquery, the binary search will only look 
> at a small portion of the elements.

Here's another version.

Most common lexemes get sorted before storing in pg_statistic. The 
ordering is on length first and value second. That way we can avoid 
strncmp() calls when the lexemes have different lengths (and lexemes 
know their lengths, so the data is readily available). Also, in the 
binary search routine during selectivity estimation we can sometimes 
avoid detoasting (I think) Datums from the pg_statistic MCELEM array. 
See comments in code.

Pre-sorting introduced one problem (see XXX in code): it's not easy 
anymore to get the minimal frequency of MCELEM values. I was using it to 
assert that the selectivity of a tsquery node containing a lexeme not in 
MCELEM is no more that min(MCELEM freqs) / 2. That's only significant 
when the minimum frequency is less than DEFAULT_TS_SEL * 2, so I'm kind 
of inclined to ignore it and maybe drop a comment in the code that this 
may be a potential problem.

If nothing is fundamentally broken with this, I'll repeat my profiling 
tests to see if anything has been gained.

Cheers,
Jan

-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

Attachment: tssel-gsoc08-tss-v3.patch
Description: text/plain (16.2 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Jan UrbańskiDate: 2008-08-14 20:27:55
Subject: Re: gsoc, oprrest function for text search take 2
Previous:From: Tom LaneDate: 2008-08-14 19:04:35
Subject: Re: WIP: patch to create explicit support for semi and anti joins

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group