Re: reducing NUMERIC size for 9.1

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-29 18:35:12
Message-ID: AANLkTin1tGEVWwJ3CZD8iN5N6JWXVzPC_aiuKa8grbaz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 29, 2010 at 1:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, you would need an additional layer of struct to represent the
> numeric with a length word in front of it.  I think this is not
> necessarily bad because it would perhaps open the door to working
> directly with short-varlena-header values, which is never going to
> be possible with this:
>
>> typedef struct NumericData
>> {
>>     int32           varlen;
>>     int16           n_header;
>>     union { ...
>
> OTOH alignment considerations may make that idea hopeless anyway.

My understanding of our alignment rules for on-disk storage is still a
bit fuzzy, but as I understand it we don't align varlenas. So
presumably if we get a pointer directly into a disk block, the first
byte might happen to be not aligned, which would make the rest of the
structure aligned; and from playing around with the system, it looks
like if we get a value from anywhere else it's typically using the
4-byte varlena header. So it seems like it might be possible to write
code that aligns the data only if needed and otherwise skips a
palloc-and-copy cycle. I'm not totally sure that would be a win, but
it could be. Actually, I had a thought that it might be even more of
a win if you added a flag to the NumericVar representation indicating
whether the digit array was palloc'd or from the original tuple. Then
you might be able to avoid TWO palloc-and-copy cycles, although at the
price of a fairly significant code restructuring.

Which is a long-winded way of saying - it's probably not hopeless.

>> Why n_data as char[1] instead of NumericDigit, you ask?
>
> Yes, we'd have to export NumericDigit if we wanted to declare these
> structs "properly" in numeric.h.  I wonder if that decision should
> be revisited.  I'd lean to making the whole struct local to numeric.c
> though.  Is there anyplace else that really ought to see it?

Probably not. btree_gist is using it, but that's it, at least as far
as our tree is concerned. Attached please find a patch to make the
numeric representation private and add a convenience function
numeric_is_nan() for the benefit of btree_gist. If this looks sane,
I'll go ahead and commit it, which will simplify review of the main
patch once I rebase it over these changes.

>>> I hadn't actually looked.  I think though that it's a mistake to break
>>> compatibility on both dscale and weight when you only need to break one.
>>> Also, weight is *certainly* uninteresting for NaNs since it's not even
>>> meaningful unless there are digits.  dscale could conceivably be worth
>>> something.
>
>> I don't think I'm breaking compatibility on anything.  Can you clarify
>> what part of the code you're referring to here?  I'm sort of lost.
>
> On-disk is what I'm thinking about.  Right now, a NaN's first word is
> all dscale except the sign bits.  You're proposing to change that
> but I think it's unnecessary to do so.

*Where* am I proposing this? The new versions of NUMERIC_WEIGHT() and
NUMERIC_DSCALE() determine where to look for the bits in question
using NUMERIC_IS_SHORT(), which just tests NUMERIC_FLAGBITS(n) ==
NUMERIC_SHORT. There's nothing in there about the NaN case at all.
Even if there were, it's irrelevant because those bits are never
examined and, as far as I can tell, will always be zero barring a
cosmic ray hit. But even if they WERE examined, I don't see where I'm
changing the interpretation of them; in fact, I think I'm very
explicitly NOT doing that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment Content-Type Size
make_numericdata_private.patch application/octet-stream 4.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-07-29 19:29:40 Re: Performance Enhancement/Fix for Array Utility Functions
Previous Message Joshua D. Drake 2010-07-29 18:16:50 Re: On Scalability