Skip site navigation (1) Skip section navigation (2)

Re: tsvector pg_stats seems quite a bit off.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsvector pg_stats seems quite a bit off.
Date: 2010-05-30 22:07:28
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <wulczer(at)wulczer(dot)org> writes:
> Here's a patch against recent git, but should apply to 8.4 sources as
> well. It would be interesting to measure the memory and time needed to
> analyse the table after applying it, because we will be now using a lot
> bigger bucket size and I haven't done any performance impact testing on
> it.

I did a little bit of testing using a dataset I had handy (a couple
hundred thousand publication titles) and found that ANALYZE seems to be
noticeably but far from intolerably slower --- it's almost the same
speed at statistics targets up to 100, and even at the max setting of
10000 it's only maybe 25% slower.  However I'm not sure if this result
will scale to very large document sets, so more testing would be a good

I committed the attached revised version of the patch.  Revisions are
mostly minor but I did make two substantive changes:

* The patch changed the target number of mcelems from 10 *
statistics_target to just statistics_target.  I reverted that since
I don't think it was intended; at least we hadn't discussed it.

* I modified the final processing to avoid one qsort step if there are
fewer than num_mcelems hashtable entries that pass the cutoff frequency
filter, and in any case to sort only those entries that pass it rather
than all of them.  With the significantly larger number of hashtable
entries that will now be used, it seemed like a good thing to try to
cut the qsort overhead.

			regards, tom lane

Attachment: ts-typanalyze-fix-2.patch
Description: text/x-patch (11.0 KB)

In response to


pgsql-hackers by date

Next:From: Jan UrbańskiDate: 2010-05-30 22:24:59
Subject: Re: tsvector pg_stats seems quite a bit off.
Previous:From: Andres FreundDate: 2010-05-30 20:48:37
Subject: Re: Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group