Re: Is there any plan to add unsigned integer types?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: crocket <crockabiscuit(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Is there any plan to add unsigned integer types?
Date: 2011-09-26 15:01:06
Message-ID: CAHyXU0w63qgXWdh5zh8jOBZKRAtgS9GXuMV3y7Ea19a3mPc+Bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 26, 2011 at 9:21 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Sep 26, 2011 at 10:02 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Mon, Sep 26, 2011 at 5:41 AM, crocket <crockabiscuit(at)gmail(dot)com> wrote:
>>> MySQL already has unsigned INT type, and it has double the range of
>>> signed INT type.
>>> It's not just the bigger range that UINT type brings.
>>> If unsigned INT type exists, I wouldn't have to execute "create domain
>>> UINT" in every database.
>>>
>>> If "INT unsigned" and "SERIAL unsigned" exist, PostgreSQL would bring
>>> more convenience to users.
>>
>> This comes up now and then.  The problem is the benefit gained is not
>> really worth the pain.  In today's 64 bit world, choosing a 64 bit int
>> nails the cases where you need the extra range and you have the
>> ability to use constraints (if necessary, through a domain) to enforce
>> correctness.
>>
>> On the downside, you have to add a lot of casts, overloads, etc.
>> Figuring out the casting rules is non trivial and could lead to
>> surprising behaviors...inferring the type of 'unknown' strings is bad
>> enough as it is.
>>
>> TBH, what I'd greatly prefer to see is to have domains be finished up
>> so that you don't have to carefully consider their use (for example,
>> you can't make arrays out of them).  Then an unsigned int could simply
>> be:
>>
>> create domain uint as bigint check (value >= 0);
>
> Even if we did that, there might still be cases where people would
> want unsigned integers as a means of reducing storage.  4 extra bytes
> may not seem like that much, but if you have billions of rows, it adds
> up - not just in terms of actual storage space, but also in terms of
> disk and memory bandwidth requirements when you want to do anything
> with that data.  I have seen some recent data (which is not entirely
> conclusive) that suggests that memory bandwidth can be a huge problem
> for PostgreSQL performance on large boxes; and I think Greg Smith has
> made similar comments in the past (correct me if I'm wrong, Greg).
>
> I think, though, that if we choose to attack that problem in the first
> instance by adding support for unsigned integers, we're probably going
> to be only nibbling around the edges of the problem.  Reducing
> alignment padding and adding block-level compression would benefit a
> much larger number of workloads.  Those are not easy projects, but
> unfortunately, due to the constraints of our type system, neither is
> this.  :-(

right -- exactly. most 'savings' in this vein are nothing but due to
padding and other factors such that (at least today) there is no
disadvantage to going to 64 bit in range constrained cases. also, I'd
submit history has been unkind to hardware dependent optimization
strategies in userland -- the engine should be dealing with this
problem. better to define your data the proper way and let the
assembly instruction counting gurus in -hackers worry about it :-).

compression is an interesting topic: the guys over at tokudb are
making some wild claims...i'm curious if they are real, and what the
real tradeoffs are.

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-09-26 15:03:44 Re: contrib/sepgsql regression tests are a no-go
Previous Message Tom Lane 2011-09-26 14:44:38 Re: Support UTF-8 files with BOM in COPY FROM