Re: undersized unions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: undersized unions
Date: 2023-02-06 18:28:30
Message-ID: 20230206182830.2xt276xeolttpd6z@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

On 2023-02-06 11:42:57 -0500, Robert Haas wrote:
> On Sun, Feb 5, 2023 at 6:28 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On the other hand, it also just seems risky from a code writing perspective. It's not immediate obvious that it'd be unsafe to create an on-stack Numeric by assigning *ptr. But it is.
>
> Well, I think that is pretty obvious: we have lots of things that are
> essentially variable-length types, and you can't put any of them on
> the stack.

There's a difference between a stack copy not containing all the data, and a
stack copy triggering undefined behaviour. But I agree, it's not something
that'll commonly endanger us.

> But I do also think that the Numeric situation is messier than some
> others we have got, and that's partly my fault, and it would be nice
> to make it better.
>
> I do not really know exactly how to do that, though. Our usual pattern
> is to just have a struct and end with a variable-length array, or
> alternatively add a comment says "other stuff follows!" at the end of
> the struct definition, without doing anything that C knows about at
> all. But here it's more complicated: there's a uint16 value for sure,
> and then maybe an int16 value, and then some number of NumericDigit
> values. That "maybe an int16 value" part is not something that C has a
> built-in way of representing, to my knowledge, which is why we end up
> with this hackish thing.

Perhaps something like

typedef struct NumericBase
{
uint16 n_header;
} NumericBase;

typedef struct NumericData
{
int32 vl_len_; /* varlena header (do not touch directly!) */
NumericBase data;
} NumericData;

/* subclass of NumericBase, needs to start in a compatible way */
typedef struct NumericLong
{
uint16 n_sign_dscale; /* Sign + display scale */
int16 n_weight; /* Weight of 1st digit */
} NumericLong;

/* subclass of NumericBase, needs to start in a compatible way */
struct NumericShort
{
uint16 n_header; /* Sign + display scale + weight */
NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER]; /* Digits */
};

Macros that e.g. access n_long would need to cast, before they're able to
access n_long. So we'd end up with something like

#define NUMERIC_SHORT(n) ((NumericShort *)&((n)->data))
#define NUMERIC_LONG(n) ((NumericLong *)&((n)->data))

#define NUMERIC_WEIGHT(n) (NUMERIC_HEADER_IS_SHORT((n)) ? \
((NUMERIC_SHORT(n)->n_header & NUMERIC_SHORT_WEIGHT_SIGN_MASK ? \
~NUMERIC_SHORT_WEIGHT_MASK : 0) \
| (NUMERIC_SHORT(n)->n_header & NUMERIC_SHORT_WEIGHT_MASK)) \
: (NUMERIC_LONG(n)->n_weight))

Although I'd actually be tempted to rip out all the casts but NUMERIC_LONG()
in this case, because it's all all just accessing the base n_header anyway.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-02-06 18:36:32 Re: undersized unions
Previous Message Jonathan S. Katz 2023-02-06 18:19:50 2023-02-09 release announcement draft