Re: Reducing the overhead of NUMERIC data

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing the overhead of NUMERIC data
Date: 2005-11-01 22:59:23
Message-ID: 1130885963.8300.1713.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Tue, 2005-11-01 at 23:16 +0100, Martijn van Oosterhout wrote:

lots of useful things, thank you.

> > So, assuming I have this all correct, means we could reduce the on disk
> > storage for NUMERIC datatypes to the following struct. This gives an
> > overhead of just 2.5 bytes, plus the loss of the optimization of
> > trailing zeroes, which I assess as having almost no value anyway in
> > 99.9999% of data values (literally...).
>
> Actually, I have a table with a column declared as numeric(12,4)
> because there has to be 4 decimal places. As it turns out, the decimal
> places are mostly zero so the optimisation works for me.

Of course it fits some data. The point is whether it is useful for most
people's data.

My contention is that *most* (but definitely nowhere near all) NUMERIC
data is either financial or measured data. That usually means it has
digits that follow Benfold's Law - which for this discussion is a
variant on a uniform random distribution.

Optimizing for trailing zeroes just isn't worth the very minimal
benefits, in most cases. It doesn't really matter that it saves on
storage and processing time in those cases - Amdahl's Law says we can
ignore that saving because the optimized case is not prevalent enough
for us to care.

Anybody like to work out a piece of SQL to perform data profiling and
derive the distribution of values with trailing zeroes? I'd be happy to
be proved wrong with an analysis of real data tables.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-11-01 23:05:21 Re: Reducing the overhead of NUMERIC data
Previous Message Tom Lane 2005-11-01 22:55:38 Re: Reducing the overhead of NUMERIC data

Browse pgsql-patches by date

  From Date Subject
Next Message Marko Kreen 2005-11-01 23:04:04 pgcrypto doc polish
Previous Message Tom Lane 2005-11-01 22:55:38 Re: Reducing the overhead of NUMERIC data