|From:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|To:||Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>|
|Cc:||Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, David Steele <david(at)pgmasters(dot)net>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: [HACKERS] [PATCH] Generic type subscripting|
Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I tested the last patch on my FC33 Lenovo T520 (i7) and I don't see a 15%
> slowdown either. On my machine there is a slowdown of about 1.5-3%. I used your
> function arraytest.
After repeating the experiment a few times, I think I was misled
by ASLR variance (i.e., hot code falling within or across cache
lines depending on where the backend executable gets loaded).
I'd tried a couple of postmaster restarts, but seemingly not
enough to expose the full variance in runtime that's due to that.
I do still see a 2% or so penalty when comparing best-case runtimes,
which is consistent with other people's reports.
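
To illustrate the comparison described above: a minimal sketch, assuming each build is timed several times across separate postmaster restarts and only the fastest sample per build is compared. The function names best_runtime and relative_penalty are hypothetical helpers for illustration, not anything in the patchset or in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest sample in the set, i.e. the best-case runtime.  Taking the
 * minimum discounts runs that were slowed by an unlucky load address. */
static double
best_runtime(const double *samples, size_t n)
{
    double best;

    assert(n > 0);
    best = samples[0];
    for (size_t i = 1; i < n; i++)
    {
        if (samples[i] < best)
            best = samples[i];
    }
    return best;
}

/* Relative slowdown of the patched build versus the baseline,
 * comparing best-case runtimes only (0.02 means 2% slower). */
static double
relative_penalty(const double *base, size_t nbase,
                 const double *patched, size_t npatched)
{
    double b = best_runtime(base, nbase);
    double p = best_runtime(patched, npatched);

    return (p - b) / b;
}
```

With too few restarts the minimum may not yet reflect the true best case, which is the trap described above.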
However, 2% is still more than I want to pay for this feature,
and after studying the patchset awhile I don't think it's tried
hard at all on execution efficiency. We should eliminate the
ExecEvalSubscriptingRef* interface layer altogether, and have
ExecInterpExpr dispatch directly to container-type-specific code,
so that we end up with approximately the same call depth as before.
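
The idea of dispatching directly to container-type-specific code can be sketched roughly as follows. This is a toy model, not the executor's actual data structures: ToyStep, toy_array_fetch and toy_interp are hypothetical names, and the only point is that the interpreter loop makes one indirect call per step, with no generic wrapper layer in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an expression step. */
typedef struct ToyStep ToyStep;
typedef int (*ToySubscriptFn) (const ToyStep *step);

struct ToyStep
{
    ToySubscriptFn fetch;       /* container-specific routine, chosen when
                                 * the expression is set up */
    const int  *data;           /* the container being subscripted */
    size_t      len;
    size_t      index;          /* already-evaluated subscript */
};

/* Container-specific code: bounds checks and error reporting would
 * live here rather than in a generic layer. */
static int
toy_array_fetch(const ToyStep *step)
{
    assert(step->index < step->len);
    return step->data[step->index];
}

/* Toy interpreter loop: calls the step's routine directly, so the call
 * depth is interpreter -> container code and nothing more. */
static int
toy_interp(const ToyStep *steps, size_t nsteps)
{
    int         result = 0;

    for (size_t i = 0; i < nsteps; i++)
        result += steps[i].fetch(&steps[i]);
    return result;
}
```

Because the function pointer is stored in the step itself, the interpreter needs no knowledge of what kind of container it is dealing with.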
With the patches shown below, we are (as best I can tell) right
about on par with the existing code's runtime. This patch also gets
rid of a few more undesirable assumptions in the core executor ---
for instance, AFAICS there is no need for *any* hard-wired limit
on the number of subscripts within the core executor. (What a
particular container type chooses to support is its business,
of course.) We also need not back off on the specificity of error
messages, since the complaints that were in ExecEvalSubscriptingRef*
are now in container-type-specific code.
There were other things not to like about the way v35 chose to
refactor the executor support. In particular, I don't think it
understood the point of having the EEOP_SBSREF_SUBSCRIPT steps,
which is to only transform the subscript Datums to internal form
once, even when we have to use them twice in OLD and ASSIGN steps.
Admittedly DatumGetInt32 is pretty cheap, but this cannot be said
of reading text datums as the 0003 patch wishes to do. (BTW, 0003 is
seriously buggy in that regard, as it's failing to cope with toasted
or even short-header inputs. We really don't want to detoast twice,
so that has to be dealt with in the SUBSCRIPT step.) I also felt that
processing the subscripts one-at-a-time wasn't necessarily a great
solution, as one can imagine container semantics where they need to be
handled more holistically. So I replaced EEOP_SBSREF_SUBSCRIPT with
EEOP_SBSREF_SUBSCRIPTS, which is executed just once after all the
subscript Datums have been collected. (This does mean that we lose
the optimization of short-circuiting as soon as we've found a NULL
subscript, but I'm not troubled by that. I note in particular that
the core code shouldn't be forcing a particular view of what to do
with null subscripts onto all container types.)
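
A rough sketch of the "convert once, use twice" point, under stated assumptions: ToySubsState and the toy_* functions are hypothetical, subscripts are modeled as C strings (standing in for expensive-to-read text datums), and the NULL policy shown is just this toy container's choice. The single check step runs after all subscripts are collected, and both the OLD fetch and the ASSIGN reuse its output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToySubsState
{
    size_t      nsubs;          /* no fixed upper limit imposed here */
    const char **raw;           /* collected subscripts; NULL models SQL NULL */
    long       *cooked;         /* internal form, filled exactly once */
    int         conversions;    /* counts conversion work, for illustration */
} ToySubsState;

/* Analogue of a single "check all subscripts" step.  Returns false if
 * this container decides the whole result should be NULL. */
static bool
toy_check_subscripts(ToySubsState *s)
{
    for (size_t i = 0; i < s->nsubs; i++)
    {
        if (s->raw[i] == NULL)
            return false;       /* policy belongs to the container type */
        s->cooked[i] = strtol(s->raw[i], NULL, 10);
        s->conversions++;
    }
    return true;
}

/* Both routines read only the already-converted subscripts. */
static int
toy_fetch_old(const ToySubsState *s, const int *arr)
{
    assert(s->nsubs >= 1);
    return arr[s->cooked[0]];
}

static void
toy_assign(const ToySubsState *s, int *arr, int value)
{
    assert(s->nsubs >= 1);
    arr[s->cooked[0]] = value;
}
```

An update such as incrementing an element would call toy_check_subscripts once, then toy_fetch_old and toy_assign, paying the conversion cost a single time.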
The two patches attached cover the same territory as v35's 0001 and
0002, but I split it up differently because I didn't see much point
in a division that has a nonfunctional code state in the middle.
0001 below is just concerned with revising things enough so that the
core executor doesn't have any assumption about a maximum number of
subscripts. Then 0002 incorporates what was in v35 0001+0002, revised
with what seems to me a better set of execution APIs.
There are a bunch of loose ends yet, the first three introduced
by me and the rest being pre-existing problems:
* I don't have a lot of confidence in the LLVM changes --- they seem
to work, but I don't really understand that code, and in particular
I don't understand the difference between TypeParamBool and
TypeStorageBool. So there might be something subtly wrong with the
code generation for EEOP_SBSREF_SUBSCRIPTS.
* As things stand here, there's no difference among the expression
step types EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN, and EEOP_SBSREF_FETCH;
they dispatch to different support routines but the core executor's
behavior is identical. So we could fold them all into one step type,
and lose nothing except perhaps debugging visibility. Should we do
that, or keep them separate?
* I've not rebased v35-0003 and later onto this design, and don't
intend to do so myself.
* The patchset adds a CREATE TYPE option, but fails to provide any
pg_dump support for that option. (There's no test coverage either.
Maybe further on, we should extend hstore or another contrib type
to have subscripting support, if only to have testing of that?)
* CREATE TYPE fails to create a dependency from a type to its
subscripting function. (Related to which, the changes to the
GenerateTypeDependencies call in TypeShellMake are surely wrong.)
* findTypeSubscriptingFunction contains dead code (not to mention sadly
* What is refnestedfunc? That sure seems to be dead code.
* I'm not on board with including refindexprslice in the transformed
expression, either. AFAICS that is the untransformed subscript list,
which has *no* business being included in the finished parsetree.
Probably that needs to be passed to the type-specific
transform/validate code separately.
* I've not really reviewed the parse analysis changes, but what is
the motivation for separating the prepare and validate callbacks?
It looks like those could be merged.
* exprType (and exprTypeMod, perhaps) seem to be assuming more than
they should about subscripting semantics. I think it should be
possible for the type-specific code to define what the result type
of a subscripting transformation is, without hard-wired rules like
* The new code added to arrayfuncs.c seems like it doesn't really
belong there (the fact that it forces adding a ton of new #include's
is a good sign that it doesn't fit with the existing code there).
I'm inclined to propose that we should break that out into a new .c
file, maybe "arraysubs.c".
* The proposed documentation in 0004 is pretty poor. You might
as well drop all of xsubscripting.sgml and just say "look at
the existing code for examples". (Splitting the array interface
code out into a new file would help with that, too, as there'd be
a well-defined set of code to point to.)
regards, tom lane