Re: Memory bug in dsnowball_lexize

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory bug in dsnowball_lexize
Date: 2019-05-23 16:02:01
Message-ID: CAE-h2Tq8XpyxPfUhkh=uv3Q8S3Z9VZz=E4m4rhTQGRyEzXhqkg@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 23, 2019 at 8:46 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Mark Dilger <hornschnorter(at)gmail(dot)com> writes:
> > In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
> > malloc (not palloc) to allocate memory, and on memory exhaustion
> > returns NULL rather than throwing an exception.
>
> Actually not, see macros in src/include/snowball/header.h.

You are correct. Thanks for the pointer.

> > In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
> > calls 'SN_set_current' and ignores the return value, thereby
> > failing to notice the error, if any.
>
> Hm. This seems like possibly a bug, in that even if we cover the
> malloc issue, there's no API guarantee that OOM is the only possible
> reason for reporting failure.

Ok, that sounds fair. Since the memory is being palloc'd, I suppose
it would be safe to just ereport when the return value is -1?
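
To make the suggestion concrete, here is a minimal standalone sketch of the
"check the return value" pattern. Everything in it is a hypothetical stand-in:
fake_env and fake_set_current are not the real SN_env or SN_set_current, the
-1 failure convention is assumed from the discussion above, and the fprintf
call marks the spot where backend code would presumably ereport/elog at ERROR
level instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the stemmer's environment struct. */
typedef struct
{
    char buf[16];
    int  len;
} fake_env;

/*
 * Hypothetical stand-in for SN_set_current: copies the input into the
 * environment and returns 0 on success, -1 on failure (assumed convention).
 * Here "failure" simply means the input does not fit in the fixed buffer.
 */
int
fake_set_current(fake_env *z, int size, const char *s)
{
    assert(z != NULL && s != NULL);

    if (size < 0 || size > (int) sizeof(z->buf))
        return -1;
    memcpy(z->buf, s, (size_t) size);
    z->len = size;
    return 0;
}

/*
 * Caller that does not ignore the return value.  In the backend, the
 * fprintf below is where an error would be raised rather than printed.
 */
int
lexize_checked(fake_env *z, const char *txt)
{
    if (fake_set_current(z, (int) strlen(txt), txt) != 0)
    {
        fprintf(stderr, "stemmer rejected input of length %zu\n", strlen(txt));
        return -1;
    }
    return 0;
}
```

The point of the sketch is only that the caller reports any failure, whatever
its cause, instead of assuming that out-of-memory is the sole reason the call
can fail.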

> > There is a comment higher up in dict_snowball.c that seems to
> > use some handwaving about all this, or perhaps it is documenting
> > something else entirely. In any event, I find the documentation
> > about dictCtx insufficient to explain why this memory handling
> > is correct.
>
> Fair complaint --- do you want to propose some new wording that
> references what header.h does?

Perhaps something along these lines?

/*
- * snowball saves alloced memory between calls, so we should run it in our
- * private memory context. Note, init function is executed in long lived
- * context, so we just remember CurrentMemoryContext
+ * snowball saves alloced memory between calls, which we force to be
+ * allocated using palloc and friends via preprocessing macros (see
+ * snowball/header.h), so we should run snowball in our private memory
+ * context. Note, init function is executed in long lived context, so we
+ * just remember CurrentMemoryContext.
*/
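
For readers unfamiliar with the trick the proposed comment describes, here is
a small standalone sketch of redirecting a library's malloc/free calls through
preprocessor macros. It is an illustration only, not the contents of
snowball/header.h: ctx_alloc, ctx_free, and the library_* functions are
hypothetical names, and the allocation counter merely stands in for the
bookkeeping a memory context would do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tracked allocator standing in for palloc/pfree. */
static size_t live_allocations = 0;

void *
ctx_alloc(size_t n)
{
    void *p = malloc(n);        /* the real malloc; macro not defined yet */

    if (p == NULL)
    {
        /* Like palloc, never hand NULL back to the caller. */
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    live_allocations++;
    return p;
}

void
ctx_free(void *p)
{
    if (p == NULL)
        return;
    assert(live_allocations > 0);
    live_allocations--;
    free(p);                    /* the real free; macro not defined yet */
}

size_t
ctx_live_allocations(void)
{
    return live_allocations;
}

/* From here on, code that "calls malloc" is rewritten by the preprocessor. */
#define malloc(n) ctx_alloc(n)
#define free(p)   ctx_free(p)

/* Hypothetical library routine written against plain malloc. */
char *
library_create_buffer(size_t n)
{
    char *p = (char *) malloc(n);   /* expands to ctx_alloc(n) */

    if (p == NULL)                  /* unreachable once redirected */
        return NULL;
    return p;
}

void
library_release_buffer(char *p)
{
    free(p);                        /* expands to ctx_free(p) */
}
```

With the macros in place the library source needs no edits, and its NULL
checks become dead code because the replacement allocator never returns NULL.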
