Skip site navigation (1) Skip section navigation (2)

Re: 4G row table?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: gry(at)ll(dot)mit(dot)edu, pgsql-performance(at)postgresql(dot)org
Subject: Re: 4G row table?
Date: 2002-12-19 19:15:20
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance

> [linux, 700MHz athlon, 512MB RAM, 700GB 10kRPM SCSI HW RAID, postgresql 7.2]

What kind of RAID?  How many drives?  Will you be updating the data 
frequently, or mostly just running reports on it?

With 4G rows, you will have *heavy* disk access, so the configuration and 
quality of your disk array is a big concern.   You also might think about 
upping th ememory if you can.

> We're setting up a DB of IC test data, which is very simple and regular, but 
> One project (we get three or four per year) has ~4 giga bits, each specified 
> a few small integer values, e.g.:
>  Name       Type Values
>  ----------------------
>  wafer      int	 1-50
>  die_row    int  2-7
>  die_col    int  2-7
>  testtype   string (~10 different short strings)
>  vdd        int  0-25
>  bit_col    int  0-127
>  bit_row    int  0-511
>  value      bit  0 or 1
> with 4 billion(4e9) rows.  I would guess to make wafer, die_row, etc. be of
> type "char", probably testtype a char too with a separate testtype lookup 
> Even so, this will be a huge table.  

1. Use INT2 and not INT for the INT values above.  If you can hire a 
PostgreSQL hacker, have them design a new data type for you, an unsigned INT1 
which will cut your storage space even further.

2. Do not use CHAR for wafer & die-row.  CHAR requries min 3bytes storage; 
INT2 is only 2 bytes.

3. If you can use a lookup table for testtype, make it another INT2 and create 
a numeric key for the lookup table.

> Questions: How much overhead will there be in the table in addition to the
> 9 bytes of data I see?  

There's more than 9 bytes in the above.  Count again.

> How big will the primary index on the first seven columns
> be?  Will this schema work at all? 

As large as the 7 columns themselves, plus a little more.   I suggest creating 
a surrogate key as an int8 sequence to refer to most rows.  

> Of course, we could pack 128 bits into an 8 byte "text" field (or should we 
use bit(128)?),
> but lose some ease of use, especially for naive (but important) users.

This is also unlikely to be more efficient due to the translation<->conversion 
process requried to access the data when you query.

> Comments, suggestions?

Unless you have a *really* good RAID array, expect slow performance on this 
hardware platform.

-Josh Berkus
 Aglio Database Solutions
 San Francisco

In response to


pgsql-performance by date

Next:From: Charles H. WoloszynskiDate: 2002-12-19 19:27:25
Subject: Re: 4G row table?
Previous:From: george youngDate: 2002-12-19 19:10:58
Subject: 4G row table?

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group