Re: Modest proposal for making bpchar less inconsistent

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: bruce(at)momjian(dot)us, pavel(dot)stehule(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Modest proposal for making bpchar less inconsistent
Date: 2019-10-01 22:09:47
Message-ID: 13689.1569967787@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> At Sat, 28 Sep 2019 08:22:22 -0400, Bruce Momjian <bruce(at)momjian(dot)us> wrote in <20190928122222(dot)GA26853(at)momjian(dot)us>
>> On Fri, Sep 13, 2019 at 09:50:10PM +0200, Pavel Stehule wrote:
>>> Dne pá 13. 9. 2019 16:43 uživatel Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> napsal:
>>>> It struck me that the real reason that we keep getting gripes about
>>>> the weird behavior of CHAR(n) is that these functions (and, hence,
>>>> their corresponding operators) fail to obey the "trailing blanks
>>>> aren't significant" rule:
>>>> ...
>>>> We could fix this, and save some catalog space too, if we simply
>>>> deleted these functions/operators and let such calls devolve
>>>> into implicit casts to text.

>> Yes, I think this is a great idea!

> I totally agree.

I experimented with this, as per the attached simple patch. If you just
apply the catalog deletions, no regression test results change, which
says more about our lack of test coverage in this area than anything else.
So I added a few simple test cases too.

However, playing with this more, I'm not sure it's the direction we want
to go. I realized that the BPCHAR-related code paths in like_support.c
are dead code with this patch, because it's no longer possible to match
a LIKE/regex operator to a bpchar column. For example, in existing
releases you can do

regression=# create table t(f1 char(20) unique);
CREATE TABLE
regression=# explain select * from t where f1 like 'abcdef';
QUERY PLAN
------------------------------------------------------------------------
Index Only Scan using t_f1_key on t (cost=0.15..8.17 rows=1 width=24)
Index Cond: (f1 = 'abcdef'::bpchar)
Filter: (f1 ~~ 'abcdef'::text)
(3 rows)

regression=# explain select * from t where f1 like 'abcdef%';
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=4.23..14.39 rows=8 width=24)
Filter: (f1 ~~ 'abcdef%'::text)
-> Bitmap Index Scan on t_f1_key (cost=0.00..4.23 rows=8 width=0)
Index Cond: ((f1 >= 'abcdef'::bpchar) AND (f1 < 'abcdeg'::bpchar))
(4 rows)

But with this patch, you just get dumb seqscan plans because the
expression trees now look like "f1::text ~~ constant" which doesn't
match to an index on the bare column f1.

If we wanted to preserve these index optimizations while still
redefining the pattern match operators as ignoring trailing whitespace,
we could keep the operators/functions but change them to point at new
C functions that strip trailing blanks before invoking the pattern
match machinery. Some thought would need to be given as well to whether
like_fixed_prefix et al need to behave differently to agree with this
behavior. (Offhand it seems like they might need to strip trailing
blanks from what would otherwise be the fixed prefix, but I'm not
quite sure.)

That would be much more work than this patch of course (though still
not an enormous amount), and I'm not quite sure if it's worth the
trouble. Is this a case that anyone is using in practice?

regards, tom lane

Attachment Content-Type Size
remove-bpchar-pattern-match-ops-1.patch text/x-diff 10.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-10-01 22:57:30 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message Thomas Munro 2019-10-01 20:36:56 Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays