Re: Suggestions for implementing IS DISTINCT FROM?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Lockhart <lockhart(at)fourpalms(dot)org>
Cc: PostgreSQL Hackers List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suggestions for implementing IS DISTINCT FROM?
Date: 2002-06-25 16:36:04
Message-ID: 8698.1025022964@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Lockhart <lockhart(at)fourpalms(dot)org> writes:
> Let's talk about the preferred technique for doing so, especially with
> row-style argument lists. I want to implement IS NULL for rows also
> (actually, already did so using a transformation in gram.y -- no need to
> jump on that one Tom ;), and found a comment from Joe on the NullTest
> code saying that he wanted to do just that "someday". Should we honk
> around the existing NullTest node to handle both lists of arguments and
> single arguments, or should we have a completely different set of nodes
> for handling row arguments? I'll guess the latter, but we should talk
> whether that scales properly, about the preferred style for this new
> kind of node, and about how to minimize performance hits which we might
> see if we have a large number of new node types being handled by the
> executor.

I have a mild preference for minimizing the number of expression node
types. If we can represent two similar kinds of expressions with one
node type instead of two, we roughly halve the amount of overhead code
needed.

My inclination would be to divvy things up on the basis of the kinds
of things to be compared, so we'd have these classes of nodes:

1. Test a single <row value expression>; this would cover

<null predicate> ::=
<row value expression> IS [ NOT ] NULL

2. Compare two <row value expressions>; this would cover

<comparison predicate> ::=
<row value expression> <comp op> <row value expression>

<distinct predicate> ::=
<row value expression> IS DISTINCT FROM <row value expression>

(OVERLAPS is a special case that should remain separate, IMHO)

3. Compare three <row value expressions>:

<between predicate> ::=
<row value expression> [ NOT ] BETWEEN
[ ASYMMETRIC | SYMMETRIC ]
<row value expression> AND <row value expression>

4. Compare a <row value expression> to a list of <row value expressions>:

<in predicate> ::=
<row value expression>
[ NOT ] IN <left paren> <in value list> <right paren>

<in value list> ::=
<row value expression> { <comma> <row value expression> }...

5. Compare a <row value expression> to the outputs of a <subquery>:

<in predicate> ::=
<row value expression>
[ NOT ] IN <table subquery>

<quantified comparison predicate> ::=
<row value expression> <comp op> <quantifier>
<table subquery>

<match predicate> ::=
<row value expression> MATCH [ UNIQUE ]
[ SIMPLE | PARTIAL | FULL ]
<table subquery>

Case 5 corresponds exactly to the existing SubLink node type (although
it's missing the MATCH options at the moment, and doesn't implement all
the <comp op> cases it should).

The spec intends all of these constructs to include the case where the
<row value expression> is a single scalar expression. I'm feeling
ambivalent about whether we should have the same node type or different
ones for the single-value and row-value cases. For SubLink we currently
handle single-value and row-value left hand sides with the same code,
and that seems fine. For <comparison predicate> I definitely *don't*
want to burden scalar comparisons with row-value overhead, so we need
two separate representations for that. The other cases seem to be on the
borderline. We don't currently have scalar-case node types for these,
except for NullTest, so new node-type coding is needed anyway --- it
could be either a separate scalar node type or merged with the row-value
case. No strong preference here, but slight leaning towards merging.

> Using the SubLink node does not seem quite right because IS DISTINCT
> FROM does not seem to make sense with an embedded select as one of the
> arguments, but maybe it does??

I agree it doesn't make sense. However, we should look at SubLink and
see if we can't clean it up and maybe share some code. For all of the
cases that compare rows, you need to have a list of the appropriate
scalar comparison operators to use for each column. The way SubLink
does that is pretty ugly (especially its use of a modifiable Const node).
Let's see if we can't improve that before we copy it ;-). I'm thinking
that the expression tree itself ought to contain just a list of Oper
nodes dangling from the SubLink or row comparison node. We'd need an
alternate entry point similar to ExecEvalOper/ExecMakeFunctionResult
that would accept two Datum values rather than a list of subexpressions
to evaluate, but that seems very doable.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2002-06-25 16:41:15 Re: Democracy and organisation : let's make a revolution in
Previous Message Bruce Momjian 2002-06-25 16:33:07 Re: Democracy and organisation : let's make a