Re: BUG #7793: tsearch_data thesaurus size limit

From: David Boutin <davios(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7793: tsearch_data thesaurus size limit
Date: 2013-01-11 22:54:11
Message-ID: CAAhHHEtUC8WJGw+vjCvU1Rc2uN1O9SESAM1gCeGZJeqOkQE8uw@mail.gmail.com
Lists: pgsql-bugs

OK, thanks for your reply, Tom.

On our side, we have made some changes to
src/backend/tsearch/dict_thesaurus.c.
We then recompiled PG 9.2.2 with this patch, and the thesaurus now works fine
with more than 64k entries; query runtime stays as low as expected.

Here are our changes to the file:

typedef struct LexemeInfo
{
- uint16 idsubst; /* entry's number in DictThesaurus->subst */
+ uint32 idsubst; /* entry's number in DictThesaurus->subst */
uint16 posinsubst; /* pos info in entry */

...

static void
-newLexeme(DictThesaurus *d, char *b, char *e, uint16 idsubst, uint16 posinsubst)
+newLexeme(DictThesaurus *d, char *b, char *e, uint32 idsubst, uint16 posinsubst)
{
TheLexeme *ptr;

...

static void
-addWrd(DictThesaurus *d, char *b, char *e, uint16 idsubst, uint16 nwrd, uint16 posinsubst, bool useasis)
+addWrd(DictThesaurus *d, char *b, char *e, uint32 idsubst, uint16 nwrd, uint16 posinsubst, bool useasis)
{

...

thesaurusRead(char *filename, DictThesaurus *d) {
tsearch_readline_state trst;
- uint16 idsubst = 0;
+ uint32 idsubst = 0;
bool useasis = false;

...

static bool
-matchIdSubst(LexemeInfo *stored, uint16 idsubst)
+matchIdSubst(LexemeInfo *stored, uint32 idsubst)
{
bool res = true;

Kind regards.
David

On Mon, Jan 7, 2013 at 1:41 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> davios(at)gmail(dot)com writes:
> > [ thesaurus dictionary fails for more than 64K entries ]
>
> I see a whole bunch of uses of "uint16" in
> src/backend/tsearch/dict_thesaurus.c. It's not immediately clear which
> of these would need to be widened to support more entries, or what the
> storage cost of doing that would be. We probably should at least put in
> a range check so that you get a clean failure instead of a crash though.
>
> regards, tom lane
>
