Memory bug in dsnowball_lexize

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Memory bug in dsnowball_lexize
Date: 2019-05-23 15:14:24
Message-ID: CAE-h2TrW-5ocMg8ma_0iUcqnD6n8qN9JJ+sAqp=dN2oYjaKdDw@mail.gmail.com
Lists: pgsql-hackers

Hackers,

In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
malloc (not palloc) to allocate memory, and on memory exhaustion
returns NULL rather than throwing an exception. In this same
file, 'replace_s' calls 'create_s' and if it gets back NULL, returns
the error code -1. Otherwise, it sets z->p to the allocated
memory.

In src/backend/snowball/libstemmer/api.c, 'SN_set_current' calls
'replace_s' and returns whatever 'replace_s' returned, which in
the case of memory exhaustion will be -1.
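
The failure path described above can be modelled in a few lines. This is a toy sketch, not the Snowball source: the names toy_env, toy_create_s, toy_replace_s, toy_set_current and the toy_fail_alloc switch are all hypothetical, and the real functions take more arguments. It only illustrates how a NULL from the allocator becomes a -1 two calls up, with z->p left unset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the stemmer environment. */
struct toy_env {
    unsigned char *p;   /* working buffer; NULL until an allocation succeeds */
    size_t len;
};

/* Hypothetical test switch: nonzero simulates memory exhaustion. */
int toy_fail_alloc = 0;

/* Models create_s: plain malloc, returns NULL on exhaustion. */
unsigned char *toy_create_s(size_t size)
{
    if (toy_fail_alloc)
        return NULL;
    return malloc(size ? size : 1);
}

/* Models replace_s: -1 if allocation failed, otherwise installs the buffer. */
int toy_replace_s(struct toy_env *z, const char *s, size_t len)
{
    unsigned char *buf;

    assert(z != NULL && s != NULL);
    buf = toy_create_s(len);
    if (buf == NULL)
        return -1;              /* z->p is left as it was, possibly NULL */
    memcpy(buf, s, len);
    free(z->p);
    z->p = buf;
    z->len = len;
    return 0;
}

/* Models SN_set_current: passes the result of replace_s straight through. */
int toy_set_current(struct toy_env *z, const char *s, size_t len)
{
    return toy_replace_s(z, s, len);
}
```

A caller that drops the return value of toy_set_current cannot tell the two outcomes apart, which is the situation the next paragraph describes.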

In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
calls 'SN_set_current' and ignores the return value, thereby
failing to notice the error, if any.

I checked one of the stemmers, stem_ISO_8859_1_english.c,
and it treats z->p as an array without checking whether it is
NULL. This will crash the backend in the above error case.
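
One way to close the gap, sketched under the assumption that the allocation really can fail as described: check the result before the stemmer runs. Everything here is a hypothetical stand-in (toy_env, toy_set_current, toy_stem, toy_lexize, and the simulate_oom flag); inside the backend the error branch would presumably raise an out-of-memory error through the usual error reporting machinery rather than return -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the stemmer environment. */
struct toy_env {
    const unsigned char *p;   /* current word; NULL if never set */
    size_t len;
};

/* Stand-in for SN_set_current: returns -1 when simulate_oom is nonzero. */
int toy_set_current(struct toy_env *z, const char *word, int simulate_oom)
{
    if (simulate_oom)
        return -1;            /* z->p is left untouched */
    z->p = (const unsigned char *) word;
    z->len = strlen(word);
    return 0;
}

/* Stand-in for a stemmer: treats z->p as an array, as the real ones do. */
void toy_stem(struct toy_env *z)
{
    assert(z->p != NULL);     /* the precondition the caller must guarantee */
    if (z->len > 1 && z->p[z->len - 1] == 's')
        z->len--;             /* toy rule: strip one trailing 's' */
}

/* Stand-in for the caller: 0 on success, -1 if the word could not be set. */
int toy_lexize(struct toy_env *z, const char *word, int simulate_oom)
{
    if (toy_set_current(z, word, simulate_oom) < 0)
        return -1;            /* report the failure instead of stemming */
    toy_stem(z);
    return 0;
}
```

With the check in place the stemmer is only reached when z->p is known to be valid.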

There is something else weird here, though. The call to
'SN_set_current' is wrapped in a memory context switch, along
with a call to the stemmer, as if the caller expects any allocated
memory to be palloc'd, which it is not, given the underlying code's
use of malloc and calloc.
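
To make the mismatch concrete, here is a miniature model of a context switch, assuming the library code really does reach the C allocator directly. The names toy_context, toy_switch_to, toy_palloc and toy_demo are hypothetical; the "context" only counts the allocations charged to it, which is enough to show that a direct malloc call is unaffected by which context is current.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a memory context: counts allocations charged to it. */
struct toy_context {
    size_t nallocs;
};

struct toy_context *toy_current = NULL;

/* Makes `next` current and returns the previously current context. */
struct toy_context *toy_switch_to(struct toy_context *next)
{
    struct toy_context *prev = toy_current;

    toy_current = next;
    return prev;
}

/* Context-aware allocation: charged to whichever context is current. */
void *toy_palloc(size_t size)
{
    assert(toy_current != NULL);
    toy_current->nallocs++;
    return malloc(size);
}

/* Returns how many allocations the dictionary context saw. */
size_t toy_demo(void)
{
    struct toy_context top = { 0 };
    struct toy_context dict = { 0 };
    struct toy_context *saved;
    void *a;
    void *b;

    toy_current = &top;
    saved = toy_switch_to(&dict);
    a = toy_palloc(16);       /* charged to dict */
    b = malloc(16);           /* bypasses the context entirely */
    toy_switch_to(saved);

    free(a);
    free(b);
    toy_current = NULL;
    return dict.nallocs;
}
```

In this model toy_demo reports a single allocation for the dictionary context even though two buffers were obtained while it was current.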

There is a comment higher up in dict_snowball.c that seems to
use some handwaving about all this, or perhaps it is documenting
something else entirely. In any event, I find the documentation
about dictCtx insufficient to explain why this memory handling
is correct.

mark
