Re: [PATCH] Avoid collation lookup for "char" statistics

From: Feng Wu <wufengwufengwufeng(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Avoid collation lookup for "char" statistics
Date: 2026-06-28 13:59:49
Message-ID: CACK3muq-pzJsWYV_N3OuP3tiJVHMob+dAs-NbFi7KCLUWVPLLA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tom,

Thanks for looking at it. I should have included the exact reproducer in
the first email.

The failure needs histogram statistics on a "char" column, so simple
catalog joins such as the pg_type example do not necessarily reach the
problematic path. For example, I just rechecked that your query plans as
a hash join for me:

explain select * from pg_type a join pg_type b using (typtype);

To reach the failing path, this reproduces it for me on current master:

create temp table char_stats_1 (c "char");
create temp table char_stats_2 (c "char");

insert into char_stats_1
select v::"char"
from unnest(array['I','S','c','i','m','p','r','t','v']) as v,
generate_series(1,
case when v in ('i','v','r','t') then 50 else 1 end);

insert into char_stats_2
select v::"char"
from unnest(array['a','e','i']) as v,
generate_series(1, case when v = 'i' then 50 else 5 end);

analyze char_stats_1;
analyze char_stats_2;

set enable_hashjoin = off;
set enable_nestloop = off;

explain (costs off)
select count(*)
from char_stats_1 s1
join char_stats_2 s2 on s1.c = s2.c;

After ANALYZE, the first column has histogram statistics, e.g.:

tablename | attname | n_distinct | most_common_vals | histogram_bounds
-------------+---------+------------+------------------+------------------
char_stats_1 | c | 9 | {i,r,t,v} | {I,S,c,m,p}
char_stats_2 | c | 3 | {i,a,e} |

Without the patch, planning fails with:

ERROR: XX000: cache lookup failed for collation 0
LOCATION: pg_newlocale_from_collation, pg_locale.c:1211

The reason is that mergejoinscansel() calls scalarineqsel() while costing
the merge join. That can reach convert_to_scalar() for histogram bounds.
For CHAROID, convert_string_datum() builds a one-byte string, but then
still calls pg_newlocale_from_collation() with the clause input collation,
which is InvalidOid for the non-collatable "char" type.

The patch only skips that collation lookup for CHAROID. For text,
varchar, bpchar, and name, the existing locale handling is unchanged.

Regards,
Feng

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2026-06-28 15:39:00 Re: [PATCH] Avoid collation lookup for "char" statistics
Previous Message Jeevan Chalke 2026-06-28 12:56:02 Re: ON EMPTY clause for aggregate and window functions