Re: Brain fade in gin_extract_jsonb_path()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, sdiz(at)sdiz(dot)net
Subject: Re: Brain fade in gin_extract_jsonb_path()
Date: 2015-11-05 21:35:12
Message-ID: 3346.1446759312@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Thu, Nov 5, 2015 at 12:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The attached one-liner fixes it, but of course this means that on-disk
>> jsonb_path_ops indexes are possibly broken and will need to be reindexed.
>> I see no way around that ... does anybody else?

> I think it's impossible to avoid having to recommend a reindex here.

Good, because that's not the only bug here :-(

Extending the example with

insert into t (j) values ('[[14,2,3]]');
insert into t (j) values ('[1,[14,2,3]]');

we get this with seqscan on:

regression=# select * from t where j @> '[[14]]';
j
-----------------
[[14, 2, 3]]
[1, [14, 2, 3]]
(2 rows)

which is the right answer, and this with seqscan off:

regression=# select * from t where j @> '[[14]]';
j
--------------
[[14, 2, 3]]
(1 row)

which is not so much.

The reason for that is that when processing an outer-level WJB_BEGIN_XXX,
we initialize stack->hash to something that is different from
stack->parent->hash. This is a waste of time in unnested cases because
when we hit the first WJB_KEY or WJB_ELEM, we'll reset stack->hash to
stack->parent->hash. However, if we have a nested array, then processing
of elements inside that array treats stack->parent->hash as the reference
value, so that we compute different hash values than we would have at
outer level. What's worse, it matters whether the nested array is the
first subobject or not, as shown by the above example; that's because once
we reset stack->hash to equal stack->parent->hash, hashes for later
elements will be inconsistent with the special-case initialization.
(Note: that explains why this is broken with my previous patch. It fails
in much the same way with unmodified HEAD, but I think for slightly
different reasons, which I've not bothered to understand in detail.)

I think the easiest way to fix this is to forget about the special
initialization at outer level and just always initialize a new stack level
to have the same hash as its parent. That leads to the first patch below
--- but once you look at that, you realize that we've got unnecessarily
many copies of stack->parent->hash into stack->hash, so what I actually
propose now is the second patch below.

I think that this only changes the hashes calculated for nested-array
cases, but it's still a larger change footprint than the previous patch.
Not that that affects much; a reindex is a reindex, and a bug is a bug,
so we don't have much wiggle room.

regards, tom lane

Attachment Content-Type Size
bug-13756-fix-2.patch text/x-diff 2.5 KB
bug-13756-fix-3.patch text/x-diff 3.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2015-11-05 21:40:14 Re: [PATCH] Skip ALTER x SET SCHEMA if the schema didn't change
Previous Message Robert Haas 2015-11-05 21:23:29 Re: proposal: multiple psql option -c