Re: BUG #15600: ts_stat's nentry maxes out at 255

From: Christoph Gößmann <mail(at)goessmann(dot)io>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15600: ts_stat's nentry maxes out at 255
Date: 2019-01-21 18:38:49
Message-ID: B77BE8F3-7F28-44E9-8571-A0F4979CAED5@goessmann.io
Lists: pgsql-bugs

That information is very helpful, thanks. I verified this directly with the to_tsvector() function and could see that you are right: no more than 255 locations are saved in the vector for a given word. So it makes sense to add that to the documentation; otherwise people with large documents will get misleading or wrong counts.
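
For illustration, the effect can be reproduced with a toy model. This is only a sketch in Python, not PostgreSQL's code: the cap of 255 is taken from this thread rather than read from the source, and build_toy_vector and toy_nentry are hypothetical names made up for the example. It shows why a count derived from stored positions stops at the cap even when the word occurs more often.

```python
from collections import defaultdict

# Assumed cap, taken from the thread (255 stored locations per word);
# it is not read from the PostgreSQL source.
MAX_POSITIONS_PER_WORD = 255


def build_toy_vector(text, cap=MAX_POSITIONS_PER_WORD):
    """Map each word to its stored positions, keeping at most `cap` per word."""
    positions = defaultdict(list)
    for pos, word in enumerate(text.lower().split(), start=1):
        # Positions beyond the cap are silently dropped.
        if len(positions[word]) < cap:
            positions[word].append(pos)
    return dict(positions)


def toy_nentry(vector, word):
    """Count the stored positions for `word`; dropped positions are invisible here."""
    return len(vector.get(word, []))


doc = "hello " * 539 + "world"
vector = build_toy_vector(doc)
print(toy_nentry(vector, "hello"))  # 255, although the word occurs 539 times
print(toy_nentry(vector, "world"))  # 1
```

Any statistic computed from the stored vector alone can only see what survived the cap, which matches the behaviour reported for nentry.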

> On 21. Jan 2019, at 18:31, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
>> Unexpected behaviour:
>> nentry for 'hello' results in 255 despite 'hello' occurring 539 times in the
>> attached test.
>
> I think this is a consequence of the MAXNUMPOS limitation in the source
> code, i.e. an individual tsvector won't store more than 255 locations for
> the same word. That's intentional to keep common words from bloating
> tsvectors too much. But if it's documented anywhere, I didn't see it.
>
> regards, tom lane
