Re: Collatability of type "name"

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Collatability of type "name"
Date: 2018-12-09 18:26:14
Message-ID: CAFj8pRBtjdW2fYkqaji_pX0euJDHkiA_LSOpHtVsW5FXH8oaVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

ne 9. 12. 2018 v 18:50 odesílatel Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> napsal:

> I've been experimenting with the task proposed in [1] of expanding
> the text_ops operator family to include type "name" as well as
> cross-type text vs. name operators. These operators would need to
> offer collation-aware sorting, since that's exactly the difference
> between text_ops and the non-collation-aware name_ops opfamily.
> I ran into a nasty stumbling block almost immediately: the proposed
> name vs. name comparison operators fail, because the parser sees
> that both inputs are of noncollatable types so it doesn't assign
> any collation to the operator node.
>
> I experimented with leaving out the name vs. name operators and
> just adding cross-type text vs. name and name vs. text operators.
> That turns out not to work well at all. Aside from the fact that
> opr_sanity whines about an incomplete operator family, I found
> various situations where the planner fails, complaining about
> things like "missing operator 1(19,19) in opfamily 1994". The
> root of that mess seems to be that we've supposed that if an
> equality operator is marked mergejoinable then it is mergejoinable
> in every opfamily that it's a member of. But that isn't true in
> an opfamily structure like this. For instance "text = name" should
> be mergejoinable in the name_ops opclass, since we know how to sort
> both text and name in non-collation-aware ways. But it's not
> mergejoinable in the text_ops opclass if text_ops doesn't provide
> collation-aware name vs. name operators to sort the name input with.
>
> We could probably fix that, at the cost of about tripling the work
> needed to detect whether an operator is really mergejoinable, but
> I have little confidence that there aren't more problems lurking
> behind it. There are a lot of aspects of EquivalenceClass processing
> that look pretty questionable if we're trying to support operators
> that act this way. For instance, if we derive "a = c" given "a = b"
> and "b = c", the equality operator in "a = c" might be mergejoinable
> in a different set of opclasses than the other two operators are,
> making it debatable whether it can be thought to belong to the same
> EquivalenceClass at all.
>
> So the other approach I'm contemplating is to mark type name as
> collatable (with "C" as its typcollation, probably). There are
> two plausible sub-approaches:
>
> 1. The regular name comparison operators remain non-collation-aware.
> This would be the least invasive way but it'd have the odd side-effect
> that expressions like "namecoll1 < namecoll2 COLLATE something"
> would be accepted but the collation would be ignored. Also, we'd
> have to invent some new names for the collation-aware name-vs-name
> operators, and I don't see any obvious candidate for that.
>
> 2. Upgrade the name comparison operators to be collation-aware,
> with (probably) all the same optimizations for C collation as we
> have for text. This'd be a cleaner end result but it seems like
> there are a lot of potential side-effects, e.g. syscache lookups
> would have to be prepared to pass the right collation argument
> to name comparisons.
>
> I feel like #2 is probably really the Right Thing, but it's also
> sounding like significantly more work than I thought this was going
> to involve. Not sure if it's worth the effort right now.
>
> Also, I think that either solution would lead to some subtle changes
> in semantics. For example, right now if you compare a name column
> to a text value, you get a text (collation-aware) comparison using
> the database's default collation. It looks like if name columns
> are marked with attcollation = 'C', that would win and the comparison
> would now have 'C' collation unless you explicitly override it with
> a COLLATE clause. I'm not sure this is a bad thing --- it'd be more
> likely to match the sort order of the index on the column --- but it
> could surprise people.
>

The sort of table's names is not too common operation. I don't see a C
collate for names as any risk.

Regards

Pavel

>
> Thoughts?
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/5978.1544030694@sss.pgh.pa.us
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-12-09 18:36:23 Re: automatically assigning catalog toast oids
Previous Message Tom Lane 2018-12-09 18:05:43 Re: pg_partition_tree crashes for a non-defined relation