Re: Datum as struct

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Datum as struct
Date: 2025-08-08 20:30:58
Message-ID: rhtj4je7eaad7c7cjcliae7gh5lpq4iv6bvysltkhvohnwzdcn@degiul54awrv
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-07-31 16:02:35 +0200, Peter Eisentraut wrote:
> Another draft patch set that I had lying around that was mentioned in [0].

Nice, thanks for doing that. I tried this a few years back and was scared away
by the churn, but I think it's well worth tackling.

One thing that would be an interesting follow-up would be to make the struct
not just carry the datum, but also the type of the field, to be filled in by
tuple deforming and the *GetDatum() functions. Then we could assert that the
correct DatumGet*() functions are used. I think that'd allow us to detect a
rather large number of issues that we currently aren't finding - I'd bet a
review of a large patchset that we'd fine a number of old bugs.

> From 869185199e7a7b711afb5b51834cbb0691ee8223 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 31 Jul 2025 15:18:02 +0200
> Subject: [PATCH v1 3/6] Add missing Datum conversions
>
> Add various missing conversions from and to Datum. The previous code
> mostly relied on implicit conversions or its own explicit casts
> instead of using the correct DatumGet*() or *GetDatum() functions.

> diff --git a/contrib/btree_gist/btree_utils_num.c b/contrib/btree_gist/btree_utils_num.c
> index 346ee837d75..446fa930b92 100644
> --- a/contrib/btree_gist/btree_utils_num.c
> +++ b/contrib/btree_gist/btree_utils_num.c
> @@ -119,38 +119,38 @@ gbt_num_fetch(GISTENTRY *entry, const gbtree_ninfo *tinfo)
> switch (tinfo->t)
> {
> case gbt_t_bool:
> - datum = BoolGetDatum(*(bool *) entry->key);
> + datum = BoolGetDatum(*(bool *) DatumGetPointer(entry->key));
> break;
> case gbt_t_int2:
> - datum = Int16GetDatum(*(int16 *) entry->key);
> + datum = Int16GetDatum(*(int16 *) DatumGetPointer(entry->key));
> break;
> case gbt_t_int4:
> - datum = Int32GetDatum(*(int32 *) entry->key);
> + datum = Int32GetDatum(*(int32 *) DatumGetPointer(entry->key));
> break;
> case gbt_t_int8:
> - datum = Int64GetDatum(*(int64 *) entry->key);
> + datum = Int64GetDatum(*(int64 *) DatumGetPointer(entry->key));
> break;
> case gbt_t_oid:
> case gbt_t_enum:
> - datum = ObjectIdGetDatum(*(Oid *) entry->key);
> + datum = ObjectIdGetDatum(*(Oid *) DatumGetPointer(entry->key));
> break;
> case gbt_t_float4:
> - datum = Float4GetDatum(*(float4 *) entry->key);
> + datum = Float4GetDatum(*(float4 *) DatumGetPointer(entry->key));
> break;
> case gbt_t_float8:
> - datum = Float8GetDatum(*(float8 *) entry->key);
> + datum = Float8GetDatum(*(float8 *) DatumGetPointer(entry->key));
> break;
> case gbt_t_date:
> - datum = DateADTGetDatum(*(DateADT *) entry->key);
> + datum = DateADTGetDatum(*(DateADT *) DatumGetPointer(entry->key));
> break;
> case gbt_t_time:
> - datum = TimeADTGetDatum(*(TimeADT *) entry->key);
> + datum = TimeADTGetDatum(*(TimeADT *) DatumGetPointer(entry->key));
> break;
> case gbt_t_ts:
> - datum = TimestampGetDatum(*(Timestamp *) entry->key);
> + datum = TimestampGetDatum(*(Timestamp *) DatumGetPointer(entry->key));
> break;
> case gbt_t_cash:
> - datum = CashGetDatum(*(Cash *) entry->key);
> + datum = CashGetDatum(*(Cash *) DatumGetPointer(entry->key));
> break;
> default:
> datum = entry->key;

Would probably be more readable if you put DatumGetPointer(entry->key) into a
local variable...

> From dfd9e4b16e979ac52be682d8c093f883da8d7ba2 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 31 Jul 2025 15:18:02 +0200
> Subject: [PATCH v1 5/6] Fix various hash function uses
>
> These instances were using Datum-returning functions where a
> lower-level function returning uint32 would be more appropriate.

Indeed.

> From 81e56922fe429fae6ebc33b2f518ef88b00ffc42 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter(at)eisentraut(dot)org>
> Date: Thu, 31 Jul 2025 15:18:02 +0200
> Subject: [PATCH v1 6/6] WIP: Datum as struct
>
> This changes Datum to a struct so that you can no longer rely on it
> being implicitly convertable to other C types.

(I only skimmed this very very briefly)

> diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
> index f98805fb5f7..e56e0db97c9 100644
> --- a/contrib/dblink/dblink.c
> +++ b/contrib/dblink/dblink.c
> @@ -653,7 +653,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
> {
> dblink_res_error(conn, conname, res, fail,
> "while fetching from cursor \"%s\"", curname);
> - return (Datum) 0;
> + PG_RETURN_DUMMY();
> }
> else if (PQresultStatus(res) == PGRES_COMMAND_OK)
> {
> @@ -665,7 +665,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
> }
>
> materializeResult(fcinfo, conn, res);
> - return (Datum) 0;
> + PG_RETURN_DUMMY();
> }
>
> /*

Probably worth splitting this type of change out into its own commit, this
is a pretty large change, and that could be done independently.

> diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
> index 9c53877c4a5..d059b31231f 100644
> --- a/contrib/hstore/hstore_io.c
> +++ b/contrib/hstore/hstore_io.c
> @@ -1113,7 +1113,7 @@ hstore_populate_record(PG_FUNCTION_ARGS)
> {
> for (i = 0; i < ncolumns; ++i)
> {
> - values[i] = (Datum) 0;
> + values[i] = DummyDatum;
> nulls[i] = true;
> }
> }

I don't think I like DummyDatum very much, but I don't really have a better
suggestion. It could be worth having a separate NullPointerDatum? Should
also probably be its own patch.

One thing that'd be nice is to mark the value as uninitialized, so that
valgrind etc could find cases where we rely on these dummy datums.

> diff --git a/contrib/ltree/_ltree_gist.c b/contrib/ltree/_ltree_gist.c
> index 286ad24fbe8..2d71cea7e5a 100644
> --- a/contrib/ltree/_ltree_gist.c
> +++ b/contrib/ltree/_ltree_gist.c
> @@ -84,7 +84,7 @@ _ltree_compress(PG_FUNCTION_ARGS)
> entry->rel, entry->page,
> entry->offset, false);
> }
> - else if (!LTG_ISALLTRUE(entry->key))
> + else if (!LTG_ISALLTRUE(entry->key.value))

This should be DatumGet*(), no?

> diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
> index 996ce174454..5d57563ecb7 100644
> --- a/contrib/sepgsql/label.c
> +++ b/contrib/sepgsql/label.c
> @@ -330,7 +330,7 @@ sepgsql_fmgr_hook(FmgrHookEventType event,
> stack = palloc(sizeof(*stack));
> stack->old_label = NULL;
> stack->new_label = sepgsql_avc_trusted_proc(flinfo->fn_oid);
> - stack->next_private = 0;
> + stack->next_private.value = 0;
>
> MemoryContextSwitchTo(oldcxt);

Probably should use DummyDatum.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2025-08-08 20:52:09 Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events
Previous Message Dagfinn Ilmari Mannsåker 2025-08-08 20:07:42 Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events