Re: gsoc, text search selectivity and dllist enhancments

From: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gsoc, text search selectivity and dllist enhancments
Date: 2008-07-10 21:26:35
Message-ID: 48767E8B.9080903@students.mimuw.edu.pl
Lists: pgsql-hackers

Tom Lane wrote:
> The way I think it ought to work is that the number of lexemes stored in
> the final pg_statistic entry is statistics_target times a constant
> (perhaps 100).  I don't like having it vary depending on tsvector width

I think the existing code puts at most statistics_target elements in a 
pg_statistic tuple. In compute_minimal_stats() num_mcv starts with 
stats->attr->attstattarget and is adjusted only downwards.
My original thought was to keep that property for tsvectors (i.e. store 
at most statistics_target lexemes) and advise people to set it high for 
their tsvector columns (e.g. 100x their default).
Also, the existing code decides which elements are worth storing as most 
common ones by discarding those that are not frequent enough (that's 
where num_mcv can get adjusted downwards). I mimicked that for lexemes 
but maybe it just doesn't make sense?
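
To make the "starts at the target, adjusted only downwards" behaviour concrete, here is a rough Python sketch. It is not the C code from compute_minimal_stats(): the function name pick_mcvs and the min_count cutoff are made up for illustration, and the real frequency threshold may well be computed differently.

```python
from collections import Counter

def pick_mcvs(sample, stats_target, min_count=2):
    """Return up to stats_target (value, count) pairs, most frequent first.

    min_count is a hypothetical cutoff standing in for the "not frequent
    enough" test; the actual rule in the backend is not reproduced here.
    """
    counts = Counter(sample)
    # Start from the target; the cap never grows past it.
    num_mcv = min(stats_target, len(counts))
    ranked = counts.most_common(num_mcv)
    # Adjust only downwards: cut the list at the first entry that is too rare.
    for i, (_, count) in enumerate(ranked):
        if count < min_count:
            num_mcv = i
            break
    return ranked[:num_mcv]
```

With a sample of five "a", three "b" and one "c", a target of 10 keeps only "a" and "b" (the single "c" is discarded), while a target of 1 keeps just "a".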

> But in any case, given a target number of lexemes to accumulate,
> I'd suggest pruning with that number as the bucket width (pruning
> distance).   Or perhaps use some multiple of the target number, but
> the number itself seems about right. 

Fine with me, I'm too tired to do the math now, so I'll take your word 
for it :)
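
For reference, pruning with a fixed bucket width could look roughly like the Python sketch below. It follows the textbook Lossy Counting scheme, on the assumption that this is what "bucket width (pruning distance)" refers to; the name lossy_count and the dictionary layout are illustrative only, and the lexemes of all sampled tsvectors are treated as one flat stream.

```python
def lossy_count(stream, bucket_width):
    """Lossy Counting sketch: prune after every bucket_width elements.

    Returns a dict mapping each surviving element to its tracked count.
    """
    entries = {}  # element -> [count, delta]; delta = max possible undercount
    bucket = 1    # id of the bucket currently being filled
    for n, item in enumerate(stream, start=1):
        if item in entries:
            entries[item][0] += 1
        else:
            # A new element may have been seen and pruned in earlier buckets.
            entries[item] = [1, bucket - 1]
        if n % bucket_width == 0:
            # Bucket boundary: drop entries whose count + delta <= bucket id.
            entries = {k: v for k, v in entries.items() if v[0] + v[1] > bucket}
            bucket += 1
    return {k: v[0] for k, v in entries.items()}
```

For example, with bucket_width set to 4 and the stream x, a, x, b, x, c, x, d, only "x" survives the two pruning passes, with a count of 4. Setting bucket_width to the target number of lexemes (or a multiple of it) is the choice suggested in the quoted text.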

Cheers,
Jan

-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
