|From:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|To:||Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>|
|Cc:||Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: gsoc, oprrest function for text search take 2|
Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl> writes:
> Pre-sorting introduced one problem (see XXX in code): it's not easy
> anymore to get the minimal frequency of MCELEM values. I was using it to
> assert that the selectivity of a tsquery node containing a lexeme not in
> MCELEM is no more than min(MCELEM freqs) / 2. That's only significant
> when the minimum frequency is less than DEFAULT_TS_SEL * 2, so I'm kind
> of inclined to ignore it and maybe drop a comment in the code that this
> may be a potential problem.
This is easily fixed: there is nothing saying that a pg_statistic slot's
contents must contain the same numbers of Values and Numbers. Make the
numbers array have one extra element and store the min frequency there.
Maybe it'd be worth having 2 extra elements and dropping the max in,
as well. I don't immediately have a use for it, but it'll be a lot
harder to add it later if we don't put it in now.
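
A minimal sketch of that layout, in plain C rather than against the real
pg_statistic accessors: the numbers array holds one frequency per value,
followed by the minimum and then the maximum. The names McelemSlot,
pack_numbers, mcelem_min_freq, mcelem_max_freq and absent_lexeme_cap are
hypothetical, and the default selectivity is passed in as a parameter
because its actual value is not given here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one statistics slot (not the real catalog struct). */
typedef struct McelemSlot
{
    const char **values;    /* lexemes, assumed pre-sorted */
    size_t       nvalues;
    const float *numbers;   /* nvalues frequencies, then min, then max */
    size_t       nnumbers;  /* expected to be nvalues + 2 */
} McelemSlot;

/*
 * Copy n frequencies into out and append their min and max.
 * out must have room for n + 2 floats. Returns the number of floats written.
 */
static size_t
pack_numbers(const float *freqs, size_t n, float *out)
{
    float  minfreq;
    float  maxfreq;
    size_t i;

    assert(n > 0);
    minfreq = maxfreq = freqs[0];
    for (i = 0; i < n; i++)
    {
        out[i] = freqs[i];
        if (freqs[i] < minfreq)
            minfreq = freqs[i];
        if (freqs[i] > maxfreq)
            maxfreq = freqs[i];
    }
    out[n] = minfreq;       /* first extra element: minimum frequency */
    out[n + 1] = maxfreq;   /* second extra element: maximum frequency */
    return n + 2;
}

/* The minimum sits right after the per-value frequencies. */
static float
mcelem_min_freq(const McelemSlot *slot)
{
    assert(slot->nnumbers == slot->nvalues + 2);
    return slot->numbers[slot->nvalues];
}

/* The maximum is the last element of the numbers array. */
static float
mcelem_max_freq(const McelemSlot *slot)
{
    assert(slot->nnumbers == slot->nvalues + 2);
    return slot->numbers[slot->nvalues + 1];
}

/*
 * Selectivity bound for a lexeme that is not in MCELEM: no more than half
 * the minimum stored frequency, and no more than the caller's default.
 */
static float
absent_lexeme_cap(const McelemSlot *slot, float default_sel)
{
    float cap = mcelem_min_freq(slot) / 2.0f;

    return (default_sel < cap) ? default_sel : cap;
}
```

With this arrangement the values can stay sorted for lookup, since the
minimum no longer has to be found by scanning the frequencies at
estimation time.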
> If nothing is fundamentally broken with this, I'll repeat my profiling
> tests to see if anything has been gained.
I don't have much except minor stylistic gripes (like the ordering of
the functions in ts_selfuncs.c seeming a bit random). One possibly
performance-relevant point is to use DatumGetTextPP for detoasting;
you've already paid the costs by using VARDATA_ANY etc, so you might
as well get the benefit.
Please fix the above and do the performance testing ...
regards, tom lane