Re: implement CAST(expr AS type FORMAT 'template')

From: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: implement CAST(expr AS type FORMAT 'template')
Date: 2026-06-29 05:19:53
Message-ID: CABXr29FyPC7terFF7E+r462BEHhYgv06oUVoBrhkH7xhshuE6A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 18, 2026 at 7:52 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Mar 31, 2026 at 1:48 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
> > Everything's passing, moving the tests out of citext, etc into cast.sql is good.
> >
> > I think the next step is to bring each TODO and FIXME into the thread, and explain the factors that prevent you from being certain about what to do in those situations.
>
> IMHO, this patch as currently written is basically dead on arrival.
> The v7 patch still works by constructing calls to
> pg_catalog.{to_date,to_number,to_timestamp,to_char} based on the input
> data type. But I think it's been clearly stated on this thread that we
> need some kind of more generic infrastructure. Vik said it best: "we
> need to find a way to make this generic so that custom types can
> define formatting rules for themselves." I completely agree. The
> extensible type system in PostgreSQL is one of the project's greatest
> triumphs, and I do not think anyone is going to be enthusiastic about
> committing a patch that purports to implement a flavor of casting but
> is completely unextensible by out-of-core types and doesn't even cover
> all the in-core types for which it might be interesting. Even if
> someone is, -1 from me.
>
> I suggest backing up to David G. Johnston's comment here: "How about
> changing the specification for create type. Right now input functions
> must declare either 1 or 3 arguments. Let’s also allow for 2 and
> 4-argument functions where the 2nd or 4th is where the format is
> passed. If a data type input function lacks one of those signatures
> it is a runtime error if a format clause is attached to its cast
> expression. For output, we go from having zero input arguments to
> zero or one, with the same resolution behavior." I'm not sure that
> David's proposal here is really the best thing, but it's the kind of
> thing that *could* be right, i.e. a generic infrastructure that can
> work for any choice of data type.
>
> The reason I somewhat hesitate to endorse that specific proposal is
> that I'm not convinced that we should actually treat this as a form of
> casting. Casts can be set to IMPLICIT or ASSIGNMENT or EXPLICIT, and
> they can be WITHOUT FUNCTION or WITH INOUT, and none of that can be
> relevant here. A CAST with FORMAT always needs to be implemented by a
> function, is always explicit from a syntax point of view, and the code
> to implement probably looks pretty different from the code needed for
> a non-FORMAT cast. I am somewhat inclined to think we want something
> that's like a cast function but actually a wholly separate mechanism,
> e.g. a new pg_formatter catalog. I note that Jian He proposed putting
> something in pg_type but I don't see how that can work, since there
> are two types involved.
>
> I don't accept the argument that we should start with this and extend
> it later. The patch as proposed is just syntactic sugar. Said another
> way, there is existing syntax that already delivers the functionality.
> So I don't see why we would rush out support for a bit of new syntax;
> anyone who wants to use this functionality can already do so. Getting
> the infrastructure right is, IMHO, the interesting part of the
> project, and I think that work needs to be done first.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>
>

Robert pointed out earlier that hard-coding specific functions like
to_date and to_char in the parser would make the feature feel like a
set of special cases rather than a general mechanism. I agree, and this
version moves to a catalog-driven design.

I added a new pg_formatter catalog. A row in this catalog represents
one formatted conversion from a source type to a target type, and points
to a formatter function with this signature:

formatter(source_type, text) returns target_type

The second argument is the FORMAT expression coerced to text.

I kept this separate from pg_cast because formatted casts have different
semantics from ordinary casts. Ordinary casts have concepts like implicit
casts, assignment casts, binary-compatible casts, and I/O casts. A
formatted cast is different: it is always explicit, it has an extra format
expression, and it needs a function that receives both the source value and
the format string. Putting this into pg_cast seemed likely to add special
cases there, while a separate catalog keeps the model simpler.

Another important point in this version is that built-in and user-defined
formatters use the same mechanism. The parser and analyzer do not know
about to_date, to_char, or to_timestamp. Built-in datetime/string
formatters are registered as catalog rows, and user-defined formatters are
registered the same way. If no formatter exists for the exact source and
target type pair, the formatted cast fails. It does not fall back to
ordinary cast resolution.
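
As a rough illustration of the lookup rule (this is not code from the
patches), the catalog can be modeled as a map keyed by the exact
(source type, target type) pair. All names below are hypothetical, and
the templates are Python strptime/strftime patterns rather than
PostgreSQL format templates:

```python
from datetime import date, datetime
from typing import Any, Callable, Dict, Tuple

# Hypothetical in-memory stand-in for pg_formatter:
# (source type name, target type name) -> formatter(source_value, format_text).
FormatterFunc = Callable[[Any, str], Any]
pg_formatter_sketch: Dict[Tuple[str, str], FormatterFunc] = {}


def register_formatter(source_type: str, target_type: str,
                       func: FormatterFunc) -> None:
    """Add one row; built-in and user-defined formatters both go through here."""
    pg_formatter_sketch[(source_type, target_type)] = func


def lookup_formatter(source_type: str, target_type: str) -> FormatterFunc:
    """Exact-match lookup only; there is no fallback to ordinary casts."""
    key = (source_type, target_type)
    if key not in pg_formatter_sketch:
        raise LookupError(
            f"no formatter from {source_type} to {target_type}")
    return pg_formatter_sketch[key]


def text_to_date(value: Any, fmt: str) -> Any:
    # Illustration only: uses strptime patterns, not PostgreSQL templates.
    return datetime.strptime(value, fmt).date()


def date_to_text(value: Any, fmt: str) -> Any:
    return value.strftime(fmt)


register_formatter("text", "date", text_to_date)
register_formatter("date", "text", date_to_text)
```

In this model, looking up ("varchar", "date") fails unless a separate row
has been registered for that pair, which mirrors the exact-match rule
described above.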

I split the work into four patches.

Patch 1 only adds parser support for the syntax:

CAST(expr AS type FORMAT format_expr)

It extends the raw TypeCast node with an optional format field and
updates the grammar and raw expression walker. Parse analysis still
rejects formatted casts in this patch, so this is only the syntax and raw
parse-tree representation.

Patch 2 adds the catalog and DDL infrastructure. It introduces
pg_formatter, syscache support, CREATE FORMATTER, DROP FORMATTER,
dependency handling, object-address support, and pg_dump support. At this
point CAST ... FORMAT still does not execute; the patch only adds the
catalog object model.

Patch 3 is where formatted casts are resolved. During parse analysis, the
source expression is transformed, the FORMAT expression is transformed and
coerced to text, and pg_formatter is searched by exact source and target
type. I also treat source literals of type unknown as text, so cases like
this resolve naturally:

CAST('2026-06-28' AS date FORMAT 'YYYY-MM-DD')

I added a CoerceViaFormatter node rather than using a plain FuncExpr.
This preserves CAST(... FORMAT ...) syntax in ruleutils and pg_dump, and
allows the dependency walker to record a dependency on the pg_formatter
row itself. Execution still uses the normal function-call machinery, so no
new executor opcode is needed.
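
The resolution steps above can be sketched in Python as follows. This is
a simplified model, not the patch code: the class and function names are
hypothetical, the catalog is a plain dict mapping a type pair to a row
identifier, and FORMAT coercion is reduced to accepting only unknown or
text expressions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a transformed expression with a known type."""
    type_name: str
    value: Any


@dataclass(frozen=True)
class CoerceViaFormatterSketch:
    """Model of the node: keeps both inputs plus the catalog row it depends on."""
    arg: Expr
    format: Expr
    result_type: str
    formatter_id: int  # stand-in for the pg_formatter row identifier


def resolve_formatted_cast(arg: Expr, fmt: Expr, target_type: str,
                           catalog: Dict[Tuple[str, str], int]
                           ) -> CoerceViaFormatterSketch:
    # An unknown-typed source literal is treated as text.
    if arg.type_name == "unknown":
        arg = Expr("text", arg.value)

    # The FORMAT expression must end up as text (simplified coercion rule).
    if fmt.type_name == "unknown":
        fmt = Expr("text", fmt.value)
    if fmt.type_name != "text":
        raise TypeError("FORMAT expression could not be coerced to text")

    # Exact (source, target) match; no fallback to ordinary cast resolution.
    key = (arg.type_name, target_type)
    if key not in catalog:
        raise LookupError(
            f"no formatter from {arg.type_name} to {target_type}")

    return CoerceViaFormatterSketch(arg, fmt, target_type, catalog[key])
```

Keeping the row identifier on the node is what lets a dependency walker
record the dependency on the catalog row, and keeping both inputs is what
lets a deparser print the original CAST(... FORMAT ...) form.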

Patch 4 adds an initial set of built-in formatter functions and bootstrap
pg_formatter rows for common datetime/string cases:

text/varchar/bpchar <-> date
text/varchar/bpchar <-> timestamp
text/varchar/bpchar <-> timestamptz

These built-ins are reached through the same catalog lookup path as
user-defined formatters. They are not special-cased in parse analysis. The
wrappers reuse PostgreSQL’s existing formatting code and are marked STRICT,
so NULL input or NULL FORMAT returns NULL.
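
To make the STRICT behavior concrete, here is a small Python model (not
the patch code) in which None plays the role of SQL NULL. The decorator
and function names are hypothetical, and the template is a Python
strptime pattern rather than a PostgreSQL one:

```python
from datetime import date, datetime
from functools import wraps
from typing import Any, Callable


def strict(func: Callable[..., Any]) -> Callable[..., Any]:
    """Model of STRICT: if any argument is None, skip the call and return None."""
    @wraps(func)
    def wrapper(*args: Any) -> Any:
        if any(arg is None for arg in args):
            return None
        return func(*args)
    return wrapper


@strict
def text_to_date_sketch(value: str, fmt: str) -> date:
    # Only reached when both the source value and the format are non-NULL.
    return datetime.strptime(value, fmt).date()
```

With this model, a None source value or a None format yields None
without the formatter body ever running.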

This gives us SQL-standard-style CAST ... FORMAT support for the common
datetime/string cases, but the series intentionally does not claim complete
SQL-standard datetime-template conformance. The built-ins reuse
PostgreSQL’s existing formatting template behavior. Additional types such
as time, timetz, numeric, and interval, or stricter standard-template
coverage, can be added later as formatter functions without changing the
parser or the CoerceViaFormatter representation.

The main goal of this redesign is to make CAST ... FORMAT a real
catalog-driven facility rather than a parser rewrite to a few built-in
functions. I think this addresses Robert’s concern and gives us a path
for both built-in and extension-defined formatters.

I have attached the full patch series for review. Comments are welcome.

Regards,
Haibo

Attachment                                                     Content-Type              Size
0004-Add-built-in-formatters-for-CAST-FORMAT.patch             application/octet-stream  45.2 KB
0003-Use-CoerceViaFormatter-for-CAST-FORMAT.patch              application/octet-stream  44.4 KB
0001-Add-parser-support-for-CAST-FORMAT-syntax.patch           application/octet-stream  5.6 KB
0002-Add-pg_formatter-catalog-and-CREATE-DROP-FORMATTER.patch  application/octet-stream  69.4 KB
