Re: tsvector pg_stats seems quite a bit off.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsvector pg_stats seems quite a bit off.
Date: 2010-05-30 14:24:47
Message-ID: 5447.1275229487@sss.pgh.pa.us
Lists: pgsql-hackers
Jesper Krogh <jesper(at)krogh(dot)cc> writes:
> On 2010-05-29 15:56, Jan Urbański wrote:
>> AFAIK statistics for everything other than tsvectors are built based on
>> the values of whole rows.

> Wouldn't it make sense to treat array types like the tsvectors?

Yeah, I have a personal TODO item to look into that in the future.

>> The results are attached in a text (CSV) file, to preserve formatting.
>> Based on them I'd like to propose top_stopwords and error_factor to be 100.

> I know it is not perceived as the correct way to do things, but I would
> really like to keep the "stop words" in the dataset and have
> something that is robust to that.

Any stop words would already have been eliminated in the transformation
to tsvector (or not, if none were configured in the dictionary setup).
We should not assume that there are any in what ts_typanalyze is seeing.

I think the only relevance of stopwords to the current problem is that
*if* stopwords have been removed, we would see a Zipfian distribution
with the first few entries removed, and I'm not sure if it's still
really Zipfian afterwards.  However, we only need the assumption of
Zipfianness to compute a target frequency cutoff, so it's not like
things will be completely broken if the distribution isn't quite
Zipfian.
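
To make that concrete, here is a rough Python sketch of a Zipfian
distribution with its first few entries removed. It is only an
illustration, not what ts_typanalyze does: the function names, the
vocabulary size of 100000, the 10 removed entries, and the choice of
the 100th surviving entry as the cutoff are all assumptions picked
for the example.

```python
def truncated_zipf(vocab_size, n_removed):
    """Relative frequencies of a Zipfian vocabulary after dropping the
    n_removed most common entries and renormalizing the rest.

    Hypothetical helper for illustration only.
    """
    # Under Zipf's law the entry of rank r has weight proportional to 1/r.
    weights = [1.0 / r for r in range(n_removed + 1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cutoff_frequency(freqs, k):
    """Frequency of the k-th most common surviving entry (k is 1-based)."""
    return freqs[k - 1]


if __name__ == "__main__":
    pure = truncated_zipf(100_000, 0)     # nothing removed
    cut = truncated_zipf(100_000, 10)     # assumed: top 10 entries removed

    # In a pure Zipfian distribution the top entry is twice as frequent
    # as the second one; after truncation the ratio is about 12/11.
    print("pure ratio f1/f2:", pure[0] / pure[1])
    print("cut  ratio f1/f2:", cut[0] / cut[1])

    # The frequency cutoff for the 100th entry shifts, but stays in the
    # same order of magnitude.
    print("pure cutoff:", cutoff_frequency(pure, 100))
    print("cut  cutoff:", cutoff_frequency(cut, 100))
```

The head of the truncated distribution is much flatter than a true
Zipfian one, which is the reason for the doubt above, but the cutoff
computed from it moves only modestly.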

			regards, tom lane
