
Re: tsvector concatenation - backend crash

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jesper(at)krogh(dot)cc
Cc: pgsql-hackers(at)postgresql(dot)org, "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
Subject: Re: tsvector concatenation - backend crash
Date: 2011-08-26 21:02:35
Message-ID: 5397.1314392555@sss.pgh.pa.us
Lists: pgsql-hackers
jesper(at)krogh(dot)cc writes:
>> The attached SQL files give (at least in my hands) a reliable backend crash
>> with this stack trace, reproduced on both 9.0.4 and HEAD. I'm sorry
>> I cannot provide a more trimmed-down set of vectors that reproduces the
>> bug, thus the "obfuscated" dataset. But even deleting single terms in the
>> vectors makes the bug go away.

I found it.  tsvector_concat does this to compute the worst-case output
size needed:

	/* conservative estimate of space needed */
	out = (TSVector) palloc0(VARSIZE(in1) + VARSIZE(in2));

Unfortunately, that's not really the worst case: it could be that the output
will require more alignment padding bytes than the inputs did, if there
is a mix of lexemes with and without position data.  For example, if in1
contains one lexeme of odd length without position data, and in2
contains one lexeme of even length with position data (and no pad byte),
and in1's lexeme sorts before in2's, then we will need a pad byte in the
second lexeme where there was none before.

The core of the fix is to suppose that we might need a newly-added pad
byte for each lexeme:

	out = (TSVector) palloc0(VARSIZE(in1) + VARSIZE(in2) + i1 + i2);

which really is an overestimate, but I don't feel a need to be tighter
about it.  What I actually committed is a bit longer because I added
some comments and some Asserts ...

> OK, I found 8.3.0 to be "good", so I ran a git bisect on it. It gave
> me this commit:
> 
> e6dbcb72fafa4031c73cc914e829a6dec96ab6b6 is the first bad commit
> commit e6dbcb72fafa4031c73cc914e829a6dec96ab6b6
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date:   Fri May 16 16:31:02 2008 +0000
> 
>     Extend GIN to support partial-match searches, and extend tsquery to
>     support prefix matching using this facility.

AFAICT this is a red herring: the bug exists all the way back to where
tsvector_concat was added, in 8.3.  I think the reason that your test
case happens to not crash before this commit is that it changed the sort
ordering rules for lexemes.  As you can see from my minimal example
above, we might need different numbers of pad bytes depending on how the
lexemes sort relative to each other.

Anyway, patch is committed; thanks for the report!

			regards, tom lane
