more numeric stuff

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: more numeric stuff
Date: 2010-08-04 19:34:30
Message-ID: AANLkTi=U7_ce7D6e40gRQhsX0NH1pUAjXw8=zheWTHkf@mail.gmail.com
Lists: pgsql-hackers

I have a couple ideas for further work on the numeric code that I want
to get feedback on.

1. Cramming it down some more. I propose that we introduce a third
format with a one-byte header: 1 bit for sign, 3 bits for dynamic
scale, and 4 bits for weight (the first of which is a sign bit). This
might seem crazy, but it's still enough to represent values with a
weight between -8 and +7, i.e. (at four decimal digits per base-10000
digit) a number with up to 32 digits before the decimal point and up to
7 decimal places, which covers a lot of ground. And if you've got a
billion rows on disk with several numeric values in each row, saving a
byte per value starts to be significant. We don't need any special
marker to indicate that the 1-byte format is in use, because we can
deduce it from the length of the varlena (after excluding the header):
even = 2b or 4b header, odd = 1b header. There can't be any
odd-length numerics already on disk, so there shouldn't be any
compatibility break for pg_upgrade to worry about.
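
To make the proposed layout concrete, here is a minimal sketch of how
such a one-byte header might be packed and unpacked. The order of the
fields within the byte, and all of the function names, are assumptions
made for illustration; the proposal above only fixes the field widths
and the odd/even length rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical bit assignment (not from numeric.c):
 *   bit 7      sign (1 = negative)
 *   bits 6..4  dynamic scale, 0..7
 *   bits 3..0  weight, 4-bit two's complement, -8..+7
 */

/* Can this dscale/weight pair be stored in the one-byte header? */
static bool
tiny_header_fits(int dscale, int weight)
{
    return dscale >= 0 && dscale <= 7 && weight >= -8 && weight <= 7;
}

static uint8_t
tiny_header_pack(bool negative, int dscale, int weight)
{
    assert(tiny_header_fits(dscale, weight));
    return (uint8_t) ((negative ? 0x80 : 0x00) |
                      ((dscale & 0x07) << 4) |
                      (weight & 0x0F));
}

static bool
tiny_header_negative(uint8_t h)
{
    return (h & 0x80) != 0;
}

static int
tiny_header_dscale(uint8_t h)
{
    return (h >> 4) & 0x07;
}

static int
tiny_header_weight(uint8_t h)
{
    int w = h & 0x0F;

    /* sign-extend the 4-bit field */
    return (w & 0x08) ? w - 16 : w;
}

/*
 * Digits are two bytes each and the existing headers are 2 or 4 bytes,
 * so an odd payload length (varlena header excluded) can only mean the
 * one-byte header is in use.
 */
static bool
payload_uses_tiny_header(size_t payload_len)
{
    return (payload_len & 1) != 0;
}
```

Values that fail tiny_header_fits() would simply keep using one of the
existing header formats.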

2. Don't untoast/don't copy. Right now, given a numeric stored as a
short varlena (the normal case if it's coming from on disk), we
untoast it before doing anything, and then we copy the digits into a
separate palloc'd digit buffer. Copying the data twice is clearly a
waste. It seems that very few of the var-manipulation functions in
numeric.c actually scribble on their input (exceptions I've found so
far: round_var, trunc_var, strip_var). So, when translating a Numeric
into a NumericVar (set_var_from_num), we could potentially skip
allocating the digit buffer if the digit string in the Numeric is
already allocated, and teach the few functions that need to scribble
on their input to force the buffer to be allocated if it hasn't been
yet. I'm not too sure whether this is worth the trouble; a quick test this
morning suggested that such a patch would not be too difficult to
write, but on the other hand the performance gain was pretty small.
Another, not necessarily mutually exclusive option would be to try to
operate directly on the packed format. That looks like it would
require some fairly major surgery; I'm not sure what we would do with
the many copies of this code:

Numeric num = PG_GETARG_NUMERIC(0);
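
The "skip the copy unless someone scribbles" idea can be sketched
roughly as below. This is a standalone illustration, not the numeric.c
code: LazyVar is a simplified stand-in for NumericVar, malloc stands in
for palloc, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int16_t Digit;

typedef struct LazyVar
{
    int          ndigits;
    const Digit *digits;    /* read pointer: borrowed data or buf */
    Digit       *buf;       /* private writable copy, NULL while borrowing */
} LazyVar;

/* Point at the caller's digits without copying them. */
static void
lazy_var_borrow(LazyVar *var, const Digit *src, int ndigits)
{
    var->ndigits = ndigits;
    var->digits = src;
    var->buf = NULL;
}

/* Functions that modify digits call this first; it copies at most once. */
static void
lazy_var_make_writable(LazyVar *var)
{
    size_t nbytes = (size_t) var->ndigits * sizeof(Digit);

    if (var->buf != NULL)
        return;                 /* already own a private copy */

    var->buf = malloc(nbytes > 0 ? nbytes : 1);
    assert(var->buf != NULL);
    if (nbytes > 0)
        memcpy(var->buf, var->digits, nbytes);
    var->digits = var->buf;
}

/* Example of a "scribbling" operation: zero every digit from keep onward. */
static void
lazy_var_zero_tail(LazyVar *var, int keep)
{
    assert(keep >= 0 && keep <= var->ndigits);
    lazy_var_make_writable(var);
    for (int i = keep; i < var->ndigits; i++)
        var->buf[i] = 0;
}

static void
lazy_var_free(LazyVar *var)
{
    free(var->buf);             /* free(NULL) is a no-op */
    var->buf = NULL;
    var->digits = NULL;
    var->ndigits = 0;
}
```

Read-only callers never pay for the copy; only the scribbling paths
(round_var, trunc_var, strip_var in the real code) would need the
make-writable call.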

3. 64-bit arithmetic. Right now, mul_var() and div_var() use int for
arithmetic, but haven't we given up on supporting platforms without
long long? I'm not sure I'm motivated enough to write the patch
myself, but it seems like 64-bit arithmetic would give us a lot more
room to postpone carries.
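
For a sense of the headroom: with base-10000 digits, one digit product
is at most 9999 * 9999 = 99,980,001, so a 32-bit signed accumulator can
absorb only about 21 such products before it could overflow, while a
64-bit one can absorb tens of billions. The sketch below is a plain
schoolbook multiply that leans on that and does a single carry pass at
the end. It is an illustration only, not mul_var(): the function name,
the fixed 64-digit limit, and the most-significant-first array layout
are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000
#define MAX_RESULT_DIGITS 64    /* arbitrary limit for this sketch */

/*
 * Multiply two non-negative base-NBASE digit strings, most significant
 * digit first.  result must have room for n1 + n2 digits.
 */
static void
mul_digits_64(const int16_t *a, int n1,
              const int16_t *b, int n2,
              int16_t *result)
{
    int64_t acc[MAX_RESULT_DIGITS];
    int64_t carry = 0;
    int     nres = n1 + n2;

    assert(n1 > 0 && n2 > 0 && nres <= MAX_RESULT_DIGITS);

    for (int i = 0; i < nres; i++)
        acc[i] = 0;

    /* Accumulate raw digit products; no carry propagation in this loop. */
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++)
            acc[i + j + 1] += (int64_t) a[i] * b[j];

    /* One carry pass, from least to most significant position. */
    for (int i = nres - 1; i >= 0; i--)
    {
        int64_t v = acc[i] + carry;

        result[i] = (int16_t) (v % NBASE);
        carry = v / NBASE;
    }
    assert(carry == 0);         /* the product always fits in n1 + n2 digits */
}
```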

OK, time to duck. Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
