More fun with GIN lossy-page pointers

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: More fun with GIN lossy-page pointers
Date: 2010-07-31 13:57:13
Message-ID: 28767.1280584633@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I fixed the problem I was on about earlier with ginget.c doing the wrong
thing for keystreams like this:

... ...
42/6 42/65535
42/7 ...
...

(To recap, after reporting the potential lossy match for 42/6, the
previous code would advance both keystreams and thus fail to see
the potential lossy match for 42/7.) But in the shower this morning
I realized there was another case I had not considered:

... ...
42/6 42/7
50/1 42/65535
... ...

After deciding there is no possible match at 42/6, both the original
code and my patch will choose to advance the first keystream. This
means they will never compare 42/6 to the lossy pointer and recognize
there's a potential lossy match.

So far as I can see, it's impossible to handle this situation when
examining only one TID per stream with no lookahead. Choosing to
advance the second stream would obviously fail in many other cases,
so there is no correct action. The only reasonable way out is to
forbid the case --- that is, decree that a keystream may *not*
contain both lossy and nonlossy pointers to the same page.

Now it looks to me like there's no actual bug so far as keyGetItem is
concerned, because in fact that's how entryGetItem will behave: it can
only return a lossy pointer when a partial match bitmap has gone lossy,
and then the bitmap won't return any real pointers to that page.
However it *is* possible for the revised version of keyGetItem to return
such a keystream, thus breaking scanGetItem. The problematic case is
where the consistentFn has OR logic and we have a situation like

... ...
42/6 42/65535
50/1 ...
...

We'll correctly report 42/6 as a lossy result, but then the next
call will advance only the first keystream, and decide that it needs
to test nothing-and-42/65535 as a possible match. Given an OR query,
that'll succeed, and we'll return 42/65535 next, thus breaking the
rule that needs to hold for scanGetItem.

So this looks like a bit of a mess to resolve. I think what we
have to do is first see if the consistentFn will succeed when only
the lossy stream is TRUE, and if so report the whole-page pointer
as a lossy match (and advance over it as well as all the non-lossy
pointers for the same page on the next call). Otherwise continue
as now.

In any case the code comments are all wet because they state that
a keystream with both lossy and nonlossy pointers on the same page
is OK.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-07-31 15:00:19 Re: ANALYZE versus expression indexes with nondefault opckeytype
Previous Message Robert Haas 2010-07-31 12:42:44 Re: rbtree code breaks GIN's adherence to maintenance_work_mem