Re: type design guidance needed

From: "Evgeni E(dot) Selkov" <selkovjr(at)mcs(dot)anl(dot)gov>
To: brook(at)biology(dot)nmsu(dot)edu
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: type design guidance needed
Date: 2000-09-23 04:41:41
Message-ID: 200009230441.XAA09037@juju.mcs.anl.gov
Lists: pgsql-hackers

Brook,

I have been contemplating such a data type for years. I believe I have
assembled the most important parts, but I have not had time to
complete the whole thing.

The idea is that the units of measurement can be treated as arithmetic
expressions. One can assign each of the few existing base units a
fixed position in a bit vector, parse the expression, then evaluate it
to obtain three things: scale factor, numerator and denominator, the
latter two being bit vectors.

So, if you assign the base units as

'm' => 1,
'kg' => 2,
's' => 4,
'K' => 8,
'mol' => 16,
'A' => 32,
'cd' => 64,

the unit, umol/min/mg, will be represented as

(0.01667, 00010000, 00000110).
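
As a rough illustration of the evaluation step, here is a minimal Python sketch. It is not the parser from Unit.tgz: the SYMBOLS table and the evaluate() function are hypothetical names made up for this example, and the unit expression is passed in already split into numerator and denominator symbols rather than parsed.

```python
# Bit positions for the base units, as assigned above.
BASE = {"m": 1, "kg": 2, "s": 4, "K": 8, "mol": 16, "A": 32, "cd": 64}

# Hypothetical lookup table: symbol -> (factor relative to the base unit, base unit).
SYMBOLS = {
    "umol": (1e-6, "mol"),
    "min": (60.0, "s"),
    "mg": (1e-6, "kg"),
}


def evaluate(numerator, denominator):
    """Reduce lists of unit symbols to (scale, numerator bits, denominator bits)."""
    scale, num_bits, den_bits = 1.0, 0, 0
    for sym in numerator:
        factor, base = SYMBOLS[sym]
        scale *= factor
        num_bits |= BASE[base]
    for sym in denominator:
        factor, base = SYMBOLS[sym]
        scale /= factor
        den_bits |= BASE[base]
    return scale, num_bits, den_bits


# umol/min/mg: one symbol on top, two underneath.
scale, num, den = evaluate(["umol"], ["min", "mg"])
print(f"({scale:.5f}, {num:08b}, {den:08b})")  # prints (0.01667, 00010000, 00000110)
```

The scale factor comes out as 1/60 because the micro prefixes on umol and mg cancel, leaving only the conversion from minutes to seconds.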

Such a structure is compact enough to be stashed into an atomic type.
In fact, one needs more than just a plain bit vector to represent
exponents:

umol/min/ml => (0.01667, '00010000', '00000103') (because ml reduces to m^3)

Here I use a whole character per position for clarity, but one does not
need more than two or three bits per exponent -- you normally don't have
kg^4 or m^7 in your units.
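
To show how small this can get, here is a sketch of packing the exponents into one integer. The 3-bit field width and the names pack(), unpack() and digits() are assumptions for this example only; nothing here is taken from an existing implementation.

```python
# Base units in the same order as the bit positions assigned earlier.
UNITS = ["m", "kg", "s", "K", "mol", "A", "cd"]
BITS = 3  # assumed field width: each exponent is limited to 0..7


def pack(exponents):
    """Pack a {unit: exponent} mapping into a single integer (7 x 3 = 21 bits)."""
    word = 0
    for i, unit in enumerate(UNITS):
        e = exponents.get(unit, 0)
        if not 0 <= e < (1 << BITS):
            raise ValueError(f"exponent {e} of {unit} does not fit in {BITS} bits")
        word |= e << (i * BITS)
    return word


def unpack(word):
    """Recover the non-zero exponents from a packed integer."""
    mask = (1 << BITS) - 1
    fields = {u: (word >> (i * BITS)) & mask for i, u in enumerate(UNITS)}
    return {u: e for u, e in fields.items() if e}


def digits(word):
    """Render one character per exponent, padded to 8 columns as in the message."""
    exps = unpack(word)
    return "0" + "".join(str(exps.get(u, 0)) for u in reversed(UNITS))


# Denominator of umol/min/ml: s^1 and m^3.
den = pack({"s": 1, "m": 3})
print(digits(den))  # prints 00000103
```

With these assumptions the numerator and the denominator each fit in 21 bits, so the pair plus a floating-point scale factor stays well within a small fixed-size value.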

I considered other alternatives, but none seemed as good as an atomic
type. I can bet you will see performance problems and an indexing
nightmare with non-atomic solutions well before you hit the space
constraints with the atomic type. You are even likely to see space
problems with the non-atomic storage: pointers can easily cost more
than compacted units.

There are numerous benefits to the atomic type. The units can be
re-assembled on the output, the operators can be written to work on
non-normalized units and discard the incompatible ones, and the
chances that you screw up the unit integrity are none.
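
A sketch of what such an operator might look like, assuming the (scale, numerator, denominator) triple described above. The Unit type, compatible() and add() are hypothetical names, and I read "discard the incompatible ones" here as simply returning nothing when the dimensions differ; a real operator might raise an error instead.

```python
from typing import NamedTuple, Optional, Tuple


class Unit(NamedTuple):
    scale: float  # factor that converts a value to base units
    num: int      # numerator bit vector
    den: int      # denominator bit vector


def compatible(a: Unit, b: Unit) -> bool:
    """Two units measure the same thing if their bit vectors match."""
    return a.num == b.num and a.den == b.den


def add(x: float, ux: Unit, y: float, uy: Unit) -> Optional[Tuple[float, Unit]]:
    """Add two values given in possibly different units.

    The result is expressed in the unit of the first operand.
    Returns None when the units are incompatible.
    """
    if not compatible(ux, uy):
        return None
    return (x + y * uy.scale / ux.scale, ux)


umol_min_mg = Unit(1.0 / 60.0, 0b00010000, 0b00000110)
mol_s_kg = Unit(1.0, 0b00010000, 0b00000110)
umol_min_ml = Unit(1.0 / 60.0, 0b00010000, 0b00000101)  # plain bit vector, no exponents

same_kind = add(60.0, umol_min_mg, 1.0, mol_s_kg)      # about 120 umol/min/mg
mismatch = add(60.0, umol_min_mg, 1.0, umol_min_ml)    # None: per mg vs. per ml
```

The first call works even though neither operand is normalized, because 1 mol/s/kg is 60 umol/min/mg; the second is rejected because the denominators differ.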

So, if that makes sense, I will be willing to funnel more energy into
this project, and I would appreciate any co-operation.

In the meantime, you might want to check out what I have done so far.

1. A Perl parser for the units of measurement that computes units as
algebraic expressions. I have done it in Perl for the ease of
prototyping, but it is flex- and bison-generated and can be ported
to C and included into the data type.

Get it from
http://wit.mcs.anl.gov/~selkovjr/Unit.tgz

This is a regular perl extension; do a

perl Makefile.PL; make; make install

type of thing, but first you need to build and install my version of
bison, http://wit.mcs.anl.gov/~selkovjr/camel-1.24.tar.gz

There is a demo script that you can run as follows

perl browse.pl units

2. The postgres extension, seg, to which I was planning to add the
units of measurement. It has its own use already, and it
exemplifies the use of the yacc parser in an extension.

Please see the README in

http://wit.mcs.anl.gov/~selkovjr/pg_extensions/

as well as a brief description in

http://wit.mcs.anl.gov/EMP/seg-type.html

and a running demo in

http://wit.mcs.anl.gov/EMP/indexing.html (search for seg)

Food for thought.

--Gene
