From: | Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl> |
---|---|
To: | Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
Subject: | gsoc08, text search selectivity, pg_statistics holding an array of a different type |
Date: | 2008-05-09 18:17:22 |
Message-ID: | 48249532.4050705@students.mimuw.edu.pl |
Lists: | pgsql-hackers |
Hi, hackers.
I've been fooling around with my GSoC project, and here's the first version
I'm not actually ashamed of showing.
There's one fundamental problem I came across while writing a typanalyze
function for tsvectors.
update_attstats() constructs an array that's later inserted into the
appropriate stavaluesN for a given relation attribute. However, it
assumes that the elements of that array will be of the same type as
their corresponding attribute.
That is no longer true with the design I planned to use. The
typanalyze function for the tsvector type returns an array of the
most frequent lexemes (cstrings, actually) from the tsvectors, not an
array of tsvectors. The question is: is this approach OK? Should
typanalyze functions be able to communicate the type of their result to
analyze_rel()? I'm thinking of extending the VacAttrStats structure so
that a typanalyze function could set fields describing the type of its
result.
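To make the idea concrete, here is a minimal, standalone sketch of what such
an extension could look like. It is not taken from the attached patch:
SketchAttrStats, the statypid field, the SKETCH_* constants and both
functions are hypothetical stand-ins (for VacAttrStats, real type OIDs, a
tsvector typanalyze function and the lookup update_attstats() would need),
and the assumption that 0 means "same type as the column" is mine.

```c
#include <stddef.h>

/* Simplified stand-in for PostgreSQL's Oid; assumption for this sketch. */
typedef unsigned int Oid;

/* Placeholder type identifiers, NOT the real pg_type OIDs. */
#define SKETCH_INVALID_TYPE   0u
#define SKETCH_TSVECTOR_TYPE  1u
#define SKETCH_CSTRING_TYPE   2u

/* Number of statistics slots in this sketch (arbitrary). */
#define SKETCH_NUM_SLOTS 4

/* Hypothetical, heavily reduced version of VacAttrStats. */
typedef struct SketchAttrStats
{
    Oid attrtypid;                    /* type of the analyzed column */
    Oid statypid[SKETCH_NUM_SLOTS];   /* proposed: element type of each
                                       * slot's value array; 0 means
                                       * "same as the column type" */
    int numvalues[SKETCH_NUM_SLOTS];  /* number of values in each slot */
} SketchAttrStats;

/*
 * What the array-building code could consult instead of assuming
 * that every slot holds values of the column's own type.
 */
Oid
slot_element_type(const SketchAttrStats *stats, int slot)
{
    if (stats->statypid[slot] != SKETCH_INVALID_TYPE)
        return stats->statypid[slot];
    return stats->attrtypid;
}

/*
 * What a tsvector typanalyze function might do: declare that slot 0
 * holds lexemes (cstrings here), not tsvectors.
 */
void
sketch_ts_typanalyze(SketchAttrStats *stats, int num_lexemes)
{
    stats->statypid[0] = SKETCH_CSTRING_TYPE;
    stats->numvalues[0] = num_lexemes;
}
```

With something along these lines, slots that a typanalyze function leaves
untouched keep the current behaviour, and only the slots it overrides are
built with a different element type.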
The problem is currently worked around by brute force; I just wanted to
get it working.
The patch as-is makes ANALYZE store the most frequent lexemes from
tsvectors in pg_statistic and passes all regression tests. It's of
course WIP (yes, throwing NOTICEs all over the place isn't my ultimate
goal), but the XXXs mark things I'm really not sure how to implement. Any
comments on them would be appreciated.
You can also browse to
http://git.postgresql.org/?p=~wulczer/gsoc08-tss.git;a=summary or clone
git://git.postgresql.org/git/~wulczer/gsoc08-tss.git, if you're
interested in the progress.
Cheers,
Jan
PS: should I be posting this to -patches, as it has a patch? I figured
no, because it's not something meant to be applied, just a convenient
way of showing what it's all about.
--
Jan Urbanski
GPG key ID: E583D7D2
ouden estin
Attachment | Content-Type | Size |
---|---|---|
gsoc08-tss-typanalyze.diff | text/plain | 14.5 KB |