Re: Fixed length data types issue

From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixed length data types issue
Date: 2006-09-10 20:38:39
Message-ID: 450477CF.4020401@markdilger.com
Lists: pgsql-hackers

Martijn van Oosterhout wrote:
> On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote:
>>> Well, it is unless you are willing to give up support of non-Intel CPUs;
>>> most other popular chips are strict about alignment, and will fail an
>>> attempt to do a nonaligned fetch.
>> Intel CPUs are detectable at compile time, right? Do we use less
>> padding in the layout for tables on Intel-based servers? If not, could we?
>
> Intel CPUs may not complain about unaligned reads, but they're still
> inefficient. Internally the CPU does two aligned reads and rearranges the
> bytes. On other architectures the OS can emulate that, but postgres
> doesn't use that for obvious reasons.

This gets back to the CPU vs. I/O bound issue, right? Might not some
people (with heavily taxed disks but lightly taxed CPU) prefer that
trade-off?
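
To make the trade-off concrete, here is a rough sketch in plain C. It is not PostgreSQL code; the struct and function names are made up for illustration. It shows how much space the compiler's alignment padding costs for a one-byte field followed by a four-byte integer, and how a value stored without padding can still be read portably by copying it out byte-wise, at some extra CPU cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical row layout: a one-byte flag followed by a 4-byte integer.
 * On most platforms the compiler pads after `flag` so `value` is aligned. */
struct aligned_row {
    uint8_t flag;
    int32_t value;
};

/* Bytes spent on padding, compared with storing the fields back to back. */
static size_t padding_bytes(void)
{
    return sizeof(struct aligned_row) - (sizeof(uint8_t) + sizeof(int32_t));
}

/* Read an int32 from a possibly unaligned address. memcpy avoids the
 * unaligned fetch that strict-alignment CPUs would fault on; the price
 * is a copy instead of a direct load. */
static int32_t read_unaligned_int32(const unsigned char *p)
{
    int32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Whether the disk space saved is worth the extra copying is exactly the CPU vs. I/O question above, and the answer will differ from one installation to another.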

>> For the example schema which started this thread, a contrib extension
>> for ascii fields could be written, with types like ascii1, ascii2,
>> ascii3, and ascii4, each with implicit upcasts to text. A contrib for
>> int1 and uint1 could be written to store single byte integers in a
>> single byte, performing math on them correctly, etc.
>
> The problem is that for each of those ascii types, to actually use them
> they would have to be converted, which would amount to allocating some
> memory, copying and adding a length header. At some point you have to
> wonder whether you're actually saving anything.
>
> Have a nice day,

I'm not sure what you mean by "actually use them". The types could have
their own comparison operators. So you could use them for sorting and
indexing, and use them in WHERE clauses with these comparisons without
any conversion to/from text. I mentioned implicit upcasts to text
merely to handle other cases, such as using them in a LIKE or ILIKE, or
concatenation, etc., where the work of providing this functionality for
each contrib datatype would not really be justified.
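
As a minimal sketch of what I mean, assuming a hypothetical ascii4 type stored as exactly four bytes with no length header, a comparison support function could work directly on the stored bytes. This is plain C for illustration only; it does not use the PostgreSQL function manager interface, and the names ascii4 and ascii4_cmp are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-length type: exactly four bytes, no length header. */
typedef struct {
    char data[4];
} ascii4;

/* Three-way comparison on the raw bytes: returns -1, 0, or 1.
 * No allocation and no conversion to text is involved. */
static int ascii4_cmp(const ascii4 *a, const ascii4 *b)
{
    int c;

    assert(a != NULL && b != NULL);
    c = memcmp(a->data, b->data, sizeof a->data);
    return (c > 0) - (c < 0);
}
```

A function of this shape is all that sorting, btree indexing, and the =, <, > operators would need; only the less common operations would go through the cast to text.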

I'm not personally as interested in the aforementioned ascii types as I
am in the int1 and int3 types, but the argument in favor of each is
about the same. If a person has a large table made of small data, it
seems really nuts to have 150% to 400% bloat on that table, when such a
small amount of work is needed to write the contrib datatypes necessary
to store the data compactly. The argument made upthread that a
quadratic number of conversion operators is necessitated doesn't seem
right to me, given that each type could upcast to the canonical built-in
type. (int1 => smallint, int3 => integer, ascii1 => text, ascii2 =>
text, ascii3 => text, etc.) Operations on data of differing type can be
done in the canonical type, but the common case for many users would be
operations between data of the same type, for which no conversion is
required.
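
Here is a rough sketch of the int3 case, again in plain C rather than as a real PostgreSQL datatype. The type name, the byte order, and the function names are all assumptions made for illustration. The value is stored in three bytes, and the only conversion needed is the upcast to a 32-bit integer, which then handles mixed-type operations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3-byte signed integer, stored least significant byte first. */
typedef struct {
    uint8_t b[3];
} int3;

#define INT3_MIN (-8388608)
#define INT3_MAX 8388607

/* Downcast: the caller must ensure the value fits in 24 bits. */
static int3 int3_from_int32(int32_t v)
{
    uint32_t u = (uint32_t) v;
    int3 r;

    assert(v >= INT3_MIN && v <= INT3_MAX);
    r.b[0] = (uint8_t) (u & 0xFF);
    r.b[1] = (uint8_t) ((u >> 8) & 0xFF);
    r.b[2] = (uint8_t) ((u >> 16) & 0xFF);
    return r;
}

/* Upcast to the canonical type, sign-extending from bit 23. */
static int32_t int3_to_int32(int3 x)
{
    uint32_t u = (uint32_t) x.b[0]
               | ((uint32_t) x.b[1] << 8)
               | ((uint32_t) x.b[2] << 16);

    return (u & 0x800000u) ? (int32_t) u - 0x1000000 : (int32_t) u;
}

/* Mixed-type comparison done in the canonical type: int3 < integer. */
static int int3_lt_int32(int3 a, int32_t b)
{
    return int3_to_int32(a) < b;
}
```

With one upcast per small type, the number of casts grows linearly with the number of types rather than quadratically.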

Am I missing something that would prevent this approach from working? I
am seriously considering writing these contrib datatypes for release either
on pgFoundry or in the contrib/ subdirectory for the 8.3 release, but am
looking for advice in case I am really off-base.

Thanks,

mark
