Re: [HACKERS] Surjective functional indexes

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Christoph Berg <myon(at)debian(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Surjective functional indexes
Date: 2019-01-14 22:34:09
Message-ID: 20190114223409.3tcvejfhlvbucrv5@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018-11-07 14:25:54 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Fri, May 11, 2018 at 11:01 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >>>> I have no problem if you want to replace this with an even better
> >>>> design in a later release.
>
> >>> Meh. The author / committer should get a patch into the right shape
>
> >> They have done, at length. Claiming otherwise is just trash talk.
>
> > Some people might call it "honest disagreement".
>
> So where we are today is that this patch was lobotomized by commits
> 77366d90f/d06fe6ce2 as a result of this bug report:
>
> https://postgr.es/m/20181106185255.776mstcyehnc63ty@alvherre.pgsql
>
> We need to move forward, either by undertaking a more extensive
> clean-out, or by finding a path to a version of the code that is
> satisfactory. I wanted to enumerate my concerns while yesterday's
> events are still fresh in mind. (Andres or Robert might have more.)
>
> * I do not understand why this feature is on-by-default in the first
> place. It can only be a win for expression indexes that are many-to-one
> mappings; for indexes that are one-to-one or few-to-one, it's a pretty
> big loss. I see no reason to assume that most expression indexes fall
> into the first category. I suggest that the design ought to be to use
> this optimization only for indexes for which the user has explicitly
> enabled recheck_on_update. That would allow getting rid of the cost check
> in IsProjectionFunctionalIndex, about which we'd otherwise have to have
> an additional fight: I do not like its ad-hoc-ness, nor the modularity
> violation (and potential circularity) involved in having the relcache call
> cost_qual_eval.
>
> * Having heapam.c calling the executor also seems like a
> modularity/layering violation that will bite us in the future.
>
> * The implementation seems incredibly inefficient. ProjIndexIsUnchanged
> is doing creation/destruction of an EState, index_open/index_close
> (including acquisition of a lock we should already have), BuildIndexInfo,
> and expression compilation, all of which are completely redundant with
> work done elsewhere in the executor. And it's doing that *over again for
> every tuple*, which totally aside from wasted cycles probably means there
> are large query-lifespan memory leaks in an UPDATE affecting many rows.
> I think a minimum expectation should be that one-time work is done only
> one time; but ideally none of those things would happen at all because
> we could share work with the regular code path. Perhaps it's too much
> to hope that we could also avoid duplicate computation of the new index
> expression values, but as long as I'm listing complaints ...
>
> * As noted in the bug thread, the implementation of the new reloption
> is broken because (a) it fails to work for some built-in index AMs
> and (b) by design, it can't work for add-on AMs. I agree with Andres
> that that's not acceptable.
>
> * This seems like bad data structure design:
>
> + Bitmapset *rd_projidx; /* Oids of projection indexes */
>
> That comment is a lie, although IMO it'd be better if it were true;
> a list of target index OIDs would be a far better design here. The use
> of rd_projidx as a set of indexes into the relation's indexlist is
> inefficient and overcomplicated, plus it creates an unnecessary and not
> very safe (even if it were documented) coupling between rd_indexlist and
> rd_projidx.
>
> * Having ProjIndexIsUnchanged examine relation->rd_projidx directly is
> broken by design anyway, both from a modularity standpoint and because
> its inner loop involves catalog accesses that could result in relcache
> flushes. I'm somewhat amazed that the regression tests passed on
> CLOBBER_CACHE_ALWAYS buildfarm animals, although possibly this is
> explained by the fact that they're only testing cases with a single
> expression index, so that the bitmap isn't checked again after the cache
> flush happens. Again, this could be fixed if what was returned was just
> a list of relevant index OIDs.
>
> * This bit of coding is unsafe, for the reason explained in the existing
> comment:
>
> /*
> * Now save copies of the bitmaps in the relcache entry. We intentionally
> * set rd_indexattr last, because that's the one that signals validity of
> * the values; if we run out of memory before making that copy, we won't
> * leave the relcache entry looking like the other ones are valid but
> * empty.
> */
> oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
> relation->rd_keyattr = bms_copy(uindexattrs);
> relation->rd_pkattr = bms_copy(pkindexattrs);
> relation->rd_idattr = bms_copy(idindexattrs);
> relation->rd_indexattr = bms_copy(indexattrs);
> + relation->rd_projindexattr = bms_copy(projindexattrs);
> + relation->rd_projidx = bms_copy(projindexes);
> MemoryContextSwitchTo(oldcxt);
>
> * There's a lot of other inattention to comments. For example, I noticed
> that this patch added new responsibilities to RelationGetIndexList without
> updating its header comment to mention them.
>
> * There's a lack of attention to terminology, too. I do not think that
> "projection index" is an appropriate term at all, nor do I think that
> "recheck_on_update" is a particularly helpful option name, though we
> may be stuck with the latter at this point.
>
> * I also got annoyed by minor sloppiness like not adding the new
> regression test in the same place in serial_schedule and
> parallel_schedule. The whole thing needed more careful review
> than it got.
>
> In short, it seems likely to me that large parts of this patch need to
> be pulled out, rewritten, and then put back in different places than
> they are today. I'm not sure if a complete revert is the best next
> step, or if we can make progress without that.

We've not really made progress on this. I continue to think that we
ought to revert this feature, and then work to re-merge it an
architecturally correct way afterwards. Other opinions?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2019-01-14 22:37:23 Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives
Previous Message Alvaro Herrera 2019-01-14 22:31:07 Re: unique, partitioned index fails to distinguish index key from INCLUDEd columns