Quick Links

Re: gsoc, oprrest function for text search take 2

From:	Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: gsoc, oprrest function for text search take 2
Date:	2008-08-14 19:45:18
Message-ID:	48A48B4E.10803@students.mimuw.edu.pl
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> So right now the idea is to:
>> (1) pre-sort STATISTIC_KIND_MCELEM values
>> (2) build an array of pointers to detoasted values in tssel()
>> (3) use binary search when looking for MCELEMs during tsquery analysis
>
> Sounds like a plan. In (2), it's even better to detoast the values
> lazily. For a typical one-word tsquery, the binary search will only look
> at a small portion of the elements.

Here's another version.

Most common lexemes get sorted before storing in pg_statistic. The
ordering is on length first and value second. That way we can avoid
strncmp() calls when the lexemes have different lengths (and lexemes
know their lengths, so the data is readily available). Also, in the
binary search routine during selectivity estimation we can sometimes
avoid detoasting (I think) Datums from the pg_statistic MCELEM array.
See comments in code.

Pre-sorting introduced one problem (see XXX in code): it's not easy
anymore to get the minimal frequency of MCELEM values. I was using it to
assert that the selectivity of a tsquery node containing a lexeme not in
MCELEM is no more that min(MCELEM freqs) / 2. That's only significant
when the minimum frequency is less than DEFAULT_TS_SEL * 2, so I'm kind
of inclined to ignore it and maybe drop a comment in the code that this
may be a potential problem.

If nothing is fundamentally broken with this, I'll repeat my profiling
tests to see if anything has been gained.

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

Attachment	Content-Type	Size
tssel-gsoc08-tss-v3.patch	text/plain	16.2 KB

In response to

Re: gsoc, oprrest function for text search take 2 at 2008-08-14 10:03:54 from Heikki Linnakangas

Responses

Re: gsoc, oprrest function for text search take 2 at 2008-08-14 20:27:55 from Jan Urbański
Re: gsoc, oprrest function for text search take 2 at 2008-09-02 22:09:03 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jan Urbański	2008-08-14 20:27:55	Re: gsoc, oprrest function for text search take 2
Previous Message	Tom Lane	2008-08-14 19:04:35	Re: WIP: patch to create explicit support for semi and anti joins