Re: PostgreSQL12 crash bug report

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: yi huang <yi(dot)codeplayer(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL12 crash bug report
Date: 2019-08-27 17:06:37
Message-ID: 17374.1566925597@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Andres Freund <andres(at)anarazel(dot)de> writes:
>> One thing I'm struggling with is understanding and testing nested EPQ
>> states:
>> * Each EState must have its own es_epqScanDone state, but if we have
>> * nested EPQ checks they should share es_epqTupleSlot arrays. This
>> * allows sub-rechecks to inherit the values being examined by an outer
>> * recheck.
>> At the moment I don't understand what the point here is. When/why do we
>> want to inherit values from the parent epqstate? I know how to construct
>> queries with nested EPQ, but I don't see why we'd ever want to inherit
>> the values from the outer check? Nor how we're doing so -
>> EvalPlanQualFetchRowMarks() unconditionally clears out the old tuple,
>> and nodeLockRows.c unconditionally fetches the source tuples too?

It's been a really long time, but I think that the idea might have been
to avoid useless repeat tuple fetches in nested-EPQ cases. However,
those are such extreme corner cases that I agree it's not worth contorting
the logic to make them a shade faster (if indeed it works at all at
present, which it might not).

> This also fixes Tom's complain elswhere that nodeLockRows.c does
> EvalPlanQualBegin() earlier, by allowing EvalPlanQualSlot() to be called
> before that. That seems considerably nicer to me.

Yay.

> Tom, everyone, what do you think of the attached approach? It
> implements what I described upthread.

Seems reasonable to store a pointer in the EPQState rather than pass
that as a separate argument, but I think you need a bit more documentation
work to keep the two EState pointers from being impossibly confusing.
It might help to write something like "estate is an executor state node
instantiated for the same rangetable as parentestate (though it might
be running only a sub-tree of the main plan). While parentestate
represents the "real" plan execution, estate runs EPQ rechecks in
which the table scan nodes are changed to return only the current tuple
of interest for each table. Those current tuples are found using the
rowmark mechanisms." Maybe we should change the "estate" field name
to something else, too --- "childestate" is the first thing that comes
to mind but it's rather generic. "recheckestate", perhaps?

Also I dislike the field name "es_active_epq", as what that suggests
to me is exactly backwards from the way you apparently are using it.
Maybe "es_parent_epq" instead? The comment for it is pretty opaque
as well, not to mention that it misspells EState.

I don't agree with the FIXME comment in execnodes.h. The rowmarks
state is used for things that are outside of EPQ, particularly the
actual row locking ;-), so I wouldn't want to move that even if it
didn't create notational issues.

I'm confused by the removal of the EvalPlanQualBegin call at
nodeModifyTable.c:1373 ... why is that OK?

The original intent of EvalPlanQualInit was to do an absolutely
minimal amount of work, since we don't know yet whether EPQ will
ever be needed (and, generally, the way to bet is that it won't be).
You've added some pallocs and list-to-array conversion, which
while not terribly expensive still seem like they don't belong
there. Can't we move most of that to EvalPlanQualBegin? I see that
we want the substitute_slot array to exist so that EvalPlanQualSlot
can work without EvalPlanQualBegin, but if we're postponing
non-locking-rowmark work then we don't need the rest.

You seem to have removed EvalPlanQualFetchRowMarks, but the
extern for it is still there?

The bool result of EvalPlanQualFetchRowMark needs documented.

In ExecIndexOnlyMarkPos and ExecIndexOnlyRestrPos, you seem to have
incorrectly changed the function names given in comments and elog messages
(copy&pasto?)

> Questions:
> - Is there any potential downside to not fetching non-locking rowmarks
> eagerly? It's observable for FDWs, but it shouldn't matter? As far as
> I can tell the new behaviour should always be at least as efficient as
> before?

It's possible that in some cases we'd not re-fetch a tuple that the
existing code would re-fetch, but I have a hard time seeing how that's
bad.

> - Should the RTI indexed version of arowmarks be computed as an array
> directly in nodeModifyTable.c etc? Right now we build it first as a
> list, and then convert it as an array? We could also just use an rti
> indexed List, and store NULLs where there are no RTIs? But there's no
> good API for that.

Do that as a follow-on patch if at all. I'm not sure it's worth changing.

> - We currently - and haven't in recent history - free epq test tuples
> until EPQ is needed the next time. Perhaps we should?

Can't get excited about it.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-08-27 19:57:55 Re: BUG #15884: json_object_agg errors on null in field name
Previous Message Tom Lane 2019-08-27 15:33:21 Re: BUG #15981: Alter table add column if not exists with constraint fails on constraint