type design guidance needed

From: Brook Milligan <brook(at)biology(dot)nmsu(dot)edu>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: brook(at)biology(dot)nmsu(dot)edu
Subject: type design guidance needed
Date: 2000-09-22 23:05:24
Message-ID: 200009222305.RAA03411@biology.nmsu.edu
Lists: pgsql-hackers

I am working on designing some new datatypes and could use some
guidance.

Along with each data item, I must keep additional information about
the scale of measurement. Further, the relevant scales of measurement
fall into a few major families of related scales, so at least one
distinct type will be required for each of these major families.
Additionally, I wish to be able to convert data measured according to
one scale into other scales (both within the same family and between
different families), and these interconversions require relatively
large sets of parameters.

It seems that there are several alternative approaches, and I am
seeking some guidance from the wizards here who have some
understanding of the backend internals, performance tradeoffs, and
such issues.

Possible solutions:

1. Store the data and all the scale parameters within the type.

Advantages: All information contained within each type. Can be
implemented with no backend changes. No access to ancillary tables
required, so processing might be fast.

Disadvantages: Scale information is duplicated in every field of
the type, which wastes space. I/O is either cumbersome (if all
parameters must be supplied) or the type-handling code must have
built-in tables for supplying missing parameters, in which case
the available types and families cannot be extended by users
without recompiling the code.

2. Store only the data and a reference to a compiled-in data table
holding the scale parameters.

Advantages: No duplicate information stored in the fields.
Access to scale data compiled into backend, so processing might be
fast.

Disadvantages: Tables of scale data fixed at compile time, so
users cannot add additional scales or families of scales.
Requires backend changes to implement, but these changes are
relatively minor since all the scale parameters are compiled into
the code handling the type.

3. Store only the data and a reference to a new system table (or
tables) holding the scale parameters.

Advantages: No duplicate information stored in the fields.
Access to scale data _not_ compiled into backend, so users could
add scales or families of scales by modifying the system tables.

Disadvantages: Requires access to system tables to perform
conversions, so processing might be slow. Requires more complex
backend changes to implement, including the ability to retrieve
information from system tables.
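
To make the storage tradeoff concrete, here is a minimal sketch in C
of the per-value layouts, not a proposed implementation. Everything in
it is an assumption made for illustration: the names (ScaleParams,
MeasureInline, MeasureRef, scale_lookup, measure_convert), the count
of 16 parameters per scale, the temperature scales, and the linear
conversion through a base scale. The real parameter sets are larger
and the real conversions are not necessarily linear.

```c
#include <assert.h>
#include <stddef.h>

#define SCALE_NPARAMS 16    /* hypothetical size of one scale's parameter set */

/* Hypothetical parameter set. Only params[0] (factor) and params[1]
 * (offset) are used here: base = factor * value + offset. */
typedef struct ScaleParams
{
    double      params[SCALE_NPARAMS];
} ScaleParams;

/* Option 1: every stored value carries its own copy of the parameters. */
typedef struct MeasureInline
{
    double      value;
    ScaleParams scale;
} MeasureInline;

/* Options 2 and 3: every stored value carries only a scale identifier. */
typedef struct MeasureRef
{
    double      value;
    int         scale_id;
} MeasureRef;

/* Option 2 style compiled-in table. Under option 3 these rows would live
 * in a system table instead. The temperature scales are illustrative. */
static const ScaleParams scale_table[] = {
    {{1.0, 0.0}},                       /* id 0: kelvin (base scale) */
    {{1.0, 273.15}},                    /* id 1: Celsius */
    {{5.0 / 9.0, 459.67 * 5.0 / 9.0}}   /* id 2: Fahrenheit */
};

#define SCALE_COUNT (sizeof(scale_table) / sizeof(scale_table[0]))

/* Return the parameters for scale_id, or NULL if the id is unknown. */
const ScaleParams *
scale_lookup(int scale_id)
{
    if (scale_id < 0 || (size_t) scale_id >= SCALE_COUNT)
        return NULL;
    return &scale_table[scale_id];
}

/* Convert *in to the scale target_id by way of the base scale.
 * Returns 0 on success, -1 if either scale is unknown. */
int
measure_convert(const MeasureRef *in, int target_id, MeasureRef *out)
{
    const ScaleParams *from;
    const ScaleParams *to;
    double      base;

    assert(in != NULL && out != NULL);
    from = scale_lookup(in->scale_id);
    to = scale_lookup(target_id);
    if (from == NULL || to == NULL)
        return -1;

    base = from->params[0] * in->value + from->params[1];
    out->value = (base - to->params[1]) / to->params[0];
    out->scale_id = target_id;
    return 0;
}
```

Under these assumptions MeasureInline is several times larger than
MeasureRef, which is the space cost of option 1. Options 2 and 3
differ only in where scale_lookup finds its rows.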

Clearly, option 3 is optimal (more flexible, no data duplication)
unless the access to system tables by the backend presents too much
overhead. (Other suggestions are welcome, especially if I have
misjudged the relative merits of these ideas or missed one
altogether.) The advice I need is the following:

- How much overhead is introduced by requiring the backend to
query system tables during tuple processing? Is this unacceptable
from the outset or is it reasonable to consider this option further?
Note that the size of these new tables will not be large (probably
less than 100 tuples) if that matters.

- How does one access system tables from the backend code? I seem to
recall that issuing straight queries via SPI is not necessarily the
right way to go about this, but I'm not sure where to look for
alternatives.
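
To make the overhead question concrete, here is a minimal sketch in C
of one pattern I have in mind: keep the fetched rows in a small
in-memory cache, so that the table is read once per scale rather than
once per tuple. It assumes nothing about the backend. ScaleRow,
ScaleCache, scale_cache_init, scale_cache_get, and the 128-slot limit
are hypothetical names and values, and demo_fetch is only a stand-in
for whatever mechanism would actually read the system table. The
sketch does not handle changes to the table after a row is cached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 128     /* hypothetical; the tables should hold < 100 rows */

/* Hypothetical row of scale parameters, reduced to two fields. */
typedef struct ScaleRow
{
    int         scale_id;
    double      factor;
    double      offset;
} ScaleRow;

/* Stand-in for the (unspecified) system table read; returns 0 on success. */
typedef int (*ScaleFetchFn) (int scale_id, ScaleRow *row);

typedef struct ScaleCache
{
    ScaleRow    rows[CACHE_SLOTS];
    int         valid[CACHE_SLOTS];    /* 1 once rows[i] has been fetched */
    ScaleFetchFn fetch;
    long        fetch_calls;           /* number of real table reads so far */
} ScaleCache;

/* Start with an empty cache that reads rows through fetch. */
void
scale_cache_init(ScaleCache *cache, ScaleFetchFn fetch)
{
    assert(cache != NULL && fetch != NULL);
    memset(cache, 0, sizeof(*cache));
    cache->fetch = fetch;
}

/* Return the row for scale_id, reading the table only on first use.
 * Returns NULL if the id is out of range or the read fails. */
const ScaleRow *
scale_cache_get(ScaleCache *cache, int scale_id)
{
    if (scale_id < 0 || scale_id >= CACHE_SLOTS)
        return NULL;
    if (!cache->valid[scale_id])
    {
        cache->fetch_calls++;
        if (cache->fetch(scale_id, &cache->rows[scale_id]) != 0)
            return NULL;
        cache->valid[scale_id] = 1;
    }
    return &cache->rows[scale_id];
}

/* Illustrative fetch function that fabricates a row instead of reading one. */
int
demo_fetch(int scale_id, ScaleRow *row)
{
    row->scale_id = scale_id;
    row->factor = 1.0;
    row->offset = (double) scale_id;
    return 0;
}
```

With a cache of this kind, converting many tuples that share a scale
would cost one table read plus an array lookup per tuple. Whether the
backend allows or already provides something comparable is part of
what I am asking.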

Thanks for your help.

Cheers,
Brook
