Re: On using doubles as primary keys

From: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: Kynn Jones <kynnjo(at)gmail(dot)com>
Cc: pgsql <pgsql-general(at)postgresql(dot)org>
Subject: Re: On using doubles as primary keys
Date: 2015-04-17 20:00:10
Message-ID: CA+renyUXZzY5m=LW9buofO7zA9PP+G-pkoDT=323Z-ih45fSWA@mail.gmail.com
Lists: pgsql-general

On Apr 17, 2015 8:35 AM, "Kynn Jones" <kynnjo(at)gmail(dot)com> wrote:
> (The only reason for wanting to transfer this data to a Pg table
> is the hope that it will be easier to work with it by using SQL

800 million 8-byte numbers doesn't seem totally unreasonable for
Python/R/Matlab, if you have a lot of memory. Are you sure you want it in
Postgres? Load the file once, then filter it as you like. If you don't have
the memory, I can see how using Postgres to get fewer rows at a time might
help. Fewer columns at a time would help even more, if that's possible.
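For scale: 800 million float64 values occupy 800,000,000 * 8 bytes = 6.4 GB, so the all-in-memory route is plausible on a machine with enough RAM. A minimal NumPy sketch of the load-once-and-filter idea, using synthetic data in place of the poster's real file (shapes and the filter condition are made up for illustration):

```python
import numpy as np

# Stand-in for the real matrix: a small random block of doubles.
# The actual data set is 800 million float64 values, i.e.
# 800_000_000 * 8 / 1e9 = 6.4 GB of RAM before any copies.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 400))

# "Filter it as you like": a boolean row mask plays the role of a
# SQL WHERE clause, all in memory.
subset = data[data[:, 0] > 0]

# Fewer columns at a time helps even more: a basic column slice is a
# view in NumPy, so it costs no extra memory.
first_ten = data[:, :10]
```

The point of the sketch is that once the file is loaded, row and column selection are cheap array operations, with no round-trips to a database.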

> In its simplest form, this would mean using
> doubles as primary keys, but this seems to me a bit weird.

I'd avoid that and just include an integer PK with your data. Dataframes in
the languages above support that, or you can just slice off the PK column
before doing your matrix math.
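A sketch of carrying an integer key alongside the doubles, assuming pandas; the column names and values here are invented, but the pattern is the one suggested: keep the key in the dataframe's index so the pure numeric matrix can be sliced off cleanly for math:

```python
import pandas as pd

# Hypothetical rows: "id" is the surrogate integer key the reply
# recommends; x1/x2 stand in for the 400 double columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "x1": [0.1, 0.2, 0.3],
    "x2": [1.0, 2.0, 3.0],
}).set_index("id")

# Slice off the key before doing matrix math: to_numpy() returns only
# the data columns, never the index.
mat = df.to_numpy()
```

Because the key lives in the index, it never leaks into the numeric matrix, and it stays attached to each row for any later join back to metadata.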

Also instead of 401 columns per row maybe store all 400 doubles in an array
column? Not sure if that's useful for you but maybe it's worth considering.

Also if you put the metadata in the same table as the doubles, can you
leave off the PKs altogether? Why join if you don't have to? It sounds like
the tables are 1-to-1? Even if some metadata is not, maybe you can finesse
it with hstore/arrays.

Good luck!

Paul
