Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-30 17:08:02
Message-ID: 9144.1317402482@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
>> Isn't it possible to cache signature of newitem in gtrgm_penalty
>> like gtrgm_consistent do this for query?

> [ studies that code for awhile ... ] Ick, what a kluge.

> The main problem with that code is that the cache data gets leaked at
> the conclusion of a scan. Having just seen the consequences of leaking
> the "giststate", I think this is something we need to fix not emulate.

> I wonder whether it's worth having the GIST code create a special
> scan-lifespan (or insert-lifespan) memory context that could be used
> for cached data such as this? It's already creating a couple of
> contexts for its own purposes, so one more might not be a big problem.
> We'd have to figure out a way to make that context available to GIST
> support functions, though, as well as something cleaner than fn_extra
> for them to keep pointers in.

I've been chewing on this for awhile and am now thinking that maybe what
gtrgm_consistent is doing isn't that unreasonable. It's certainly
legitimate for it to use fn_extra to maintain state between calls.
The problem fundamentally is that when a function uses fn_extra to
reference data it keeps in the fn_mcxt context, there's an implicit
contract that fn_extra will survive for the same length of time that the
fn_mcxt context does. (Otherwise there's no way for the function to
avoid leaking that data after it's been called for the last time using
that FmgrInfo.) And GIST is violating that assumption: it resets
fn_extra when it does initGISTstate, but fn_mcxt gets set to
CurrentMemoryContext, which in the problematic cases is a query-lifespan
context that will still be around after the GIST indexscan is concluded.

So what I'm thinking we ought to do is redefine things so that
initGISTstate sets fn_mcxt to a context that has the same lifespan as
the GISTSTATE itself does. We could possibly eliminate a few retail
pfree's in the process, eg by keeping the GISTSTATE itself in that same
context.

Having done that, what gtrgm_consistent is doing would be an officially
supported (dare I suggest even documented?) thing instead of a kluge,
and we could then adopt the same methodology in gtrgm_penalty.

Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jamie Fox 2011-09-30 18:47:37 Re: Mismatch of relation names: pg_toast.pg_toast_nnn during pg_upgrade from 8.4 to 9.1
Previous Message Noah Misch 2011-09-30 11:59:18 Re: [v9.2] Fix Leaky View Problem