Re: ALTER TYPE 3: add facility to identify further no-work cases

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 14:35:24
Message-ID: AANLkTikuqc1V+pRfs-AN_cp6O0XZjEt4gkchc9AAaNCV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 24, 2011 at 11:13 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > This helps on conversions like varchar(4)->varchar(8) and text->xml.
>>
>> I've read through this patch somewhat.  As I believe Tom also
>> commented previously, exemptor is a pretty bad choice of name.
>
> I spent awhile with a thesaurus before coining that.  Since Tom's comment, I've
> been hoping the next person to complain would at least suggest a better name.
> Shooting in the dark on a nomenclature issue isn't fun, and I'm not picky.

I can't speak for Tom, but I guess my gripe about that name is that we
seem to view a cast as either WITH FUNCTION or WITHOUT FUNCTION. A
cast-without-function is trivial from an implementation point of view,
but it's still a cast, whereas the word "exempt" seems to imply that
you're skipping the cast altogether. A side issue is that I really
want to avoid adding a new parser keyword for this. It doesn't bother
me too much to add keywords for really important and user-visible
features, but when we're adding stuff that's only going to be used by
0.1% of our users it's really better if we can avoid it, because it
slows down the parser. Maybe we could do something like:

CREATE CAST (source_type AS target_type)
WITH FUNCTION function_name (argument_type, [, ...])
[ ANALYZE USING function_name ]
[ AS ASSIGNMENT | AS IMPLICIT ]

Presumably, we wouldn't need the ANALYZE USING clause for the WITHOUT
FUNCTION case, since the answer is a foregone conclusion in that case.
We could possibly support it also for WITH INOUT, but I'm not sure
whether the real-world utility is more than zero.

Internally, we could refer to a function of this type as a "cast analyzer".

Don't take the above as Gospel truth, it's just an idea from one
person on one day.

>> More
>> generally, I think this patch is a case of coding something
>> complicated without first achieving consensus on what the behavior
>> should be.  I think we need to address that problem first, and then
>> decide what code needs to be written, and then write the code.
>
> Well, that's why I started the design thread "Avoiding rewrite in ALTER TABLE
> ALTER TYPE" before writing any code.  That thread petered out rather than reach
> any consensus.  What would you have done then?

Let's defer this question until we're more on the same page about the
rest of this. It's obvious I'm not completely understanding this
patch...

>> I think what this patch is trying to do is tag the
>> call to the cast function with the information that we might not need
>> to call it, but ISTM it would be better to just notice that the
>> function call is unnecessary and insert a RelabelType node instead.
>
> This patch does exactly that for varchar(4) -> varchar(8) and similar cases.

Oh, uh, err... OK, I obviously haven't understood it well enough.
I'll look at it some more, but if there's anything you can write up to
explain what you changed and why, that would be helpful. I think I'm
also having trouble with the fact that these patches don't apply
cleanly against the master branch, because they're stacked up on each
other. Maybe this will be easier once we get more of the ALTER TYPE 2
stuff in.

>
>> *reads archives*
>>
>> In fact, Tom already made pretty much exactly this suggestion, on a
>> thread entitled "Avoiding rewrite in ALTER TABLE ALTER TYPE".  The
>> only possible objection I can see to that line of attack is that it's
>> not clear what the API should be, or whether there might be too much
>> runtime overhead for the amount of benefit that can be obtained.
>
> I believe the patch does implement Tom's suggestion, at least in spirit.  He
> mentioned expression_planner() as the place to do it.  In principle, We could do
> it *right there* and avoid any changes to coerce_to_target_type(), but the
> overhead would increase.  Specifically, we would be walking an already-built
> expression tree looking for casts to remove, probably doing a syscache lookup
> for every cast to grab the exemptor function OID.  Doing it there would prevent
> us from directly helping or harming plain old queries.  Since ExecRelCheck(),
> FormIndexDatum() and other notable paths call expression_planner(), we wouldn't
> want to casually add an expensive algorithm there.  To avoid the extra syscache
> lookups, we'd piggyback off those already done in coerce_to_target_type().  That
> brings us to the current implementation.  Better to make the entire decision in
> coerce_to_target_type() and never add the superfluous node to the expression
> tree in the first place.
>
> coerce_to_target_type() and callees seemed like the natural, low cost place to
> make the changes.  By doing it that way, did I miss some important detail or
> implication of Tom's suggestion?

I'm not sure. Tom?

> As for performance, coerce_to_target_type() is certainly in wide use, but it's
> not exactly a hot path.  I prepared and ran the attached bench-coerce-worst.sql
> and bench-coerce-best.sql.  Both are quite artificial.  The first one basically
> asks "Can we measure the new overhead at all?"  The second one asks "Would any
> ordinary query benefit from removing a superfluous cast from an expression
> tree?"  The worst case had an 4% _speedup_, and the best case had a 27% speedup,
> suggesting answers of "no" and "yes (but not dramatically)".  The "worst-case"
> speedup doesn't make sense, so it's probably an artifact of some recent change
> on master creating a larger performance dip in this pathological test case.  I
> could rebase the patch and rerun the benchmark, but I doubt an exact figure
> would be much more meaningful.  Unfortunately, I can't think of any more-natural
> tests (help welcome) that would still illustrate any performance difference.

That certainly seems promising, but I don't understand how the new
code can be faster on your constructed worst case. Thoughts?

>> I think it's a mistake to merge the handling of the rewrite-vs-noop
>> case with the check-vs-noop case.  They're fundamentally dissimilar.
>> As noted above, being able to notice the noop case seems like it could
>> save run-time work during ordinary query execution.  But detecting the
>> "check" case is useless work except in the specific case of a table
>> rewrite.  If we're going to do that at all, it seems to me that it
>> needs to be done via a code-path that's only going to get invoked in
>> the case of ALTER TABLE, not something that's going to happen every
>> time we parse an expression tree.
>
> I disagree that they're fundamentally dissimilar.  They're _fundamentally_
> similar, in that they are both fences, standing at different distances, around
> the necessity of a cast function.  Your argument only mentions performance, and
> it boils down to a suggestion that we proceed to optimize things now by
> separating these questions.  Perhaps we should, but that's a different question.
> Based on my benchmarks, I'm not seeing a need to micro-optimize.

I'm more concerned about modularity than performance. Sticking a
field into every FuncExpr so that if it happens to get passed to an
ALTER TABLE statement we can pull the flag out seems really ugly to
me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Felix Schmidt @ Oracle 2011-01-26 14:46:49 Query Optimizer + Parallel Operators
Previous Message Simon Riggs 2011-01-26 13:40:28 Re: SSI, simplified