| From: | Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> | 
|---|---|
| To: | Robins Tharakan <tharakan(at)gmail(dot)com> | 
| Cc: | pgsql-performance(at)postgresql(dot)org | 
| Subject: | Re: Varchar pkey instead of integer | 
| Date: | 2008-05-21 06:34:45 | 
| Message-ID: | 4833C285.5000603@postnewspapers.com.au | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
Robins Tharakan wrote:
> Hi,
> 
> I am currently designing a database and wanted to know something that may
> sound trivial, but I thought its still good to confirm before dumping
> millions of rows in it.
> 
> The design requires a few master tables with very limited rows, for e.g.
> currency_denomination table could at the max have a few records like million
> / billion / crore (used in india) / lacs (india specific) and so on.
> 
> Now what I wanted to ask was whether its any different to have the
> primary-keys in such master tables as text/varchar rather than integer ?
As far as I know it's just slower to compare (ie for fkey checks, index 
lookups, etc) and uses more storage. However, if you're only using the 
other table to limit possible values in a field rather than storing 
other information and you can avoid doing a join / index lookup by 
storing the string directly in the master table then that might well be 
worth it. It's a tradeoff between the storage cost (seq scan speed, 
index size, etc) of using the text values directly vs the savings made 
by avoiding having to constantly hit a lookup table.
I have several places in the database I'm presently working on where I 
store meaningful integers directly in a "main" table and reference a 
single-field table as a foreign key just to limit acceptable values. It 
works very well, though it's only suitable in limited situations.
One of the places I'm doing that is for in-database postcode validation. 
My current app only needs to validate Australian post codes (as per the 
spec) and other post/zip codes are just stored in the address text. I 
store the integer representation of the post code directly in address 
records but use a foreign key to the single-field "aust_post_code" table 
to enforce the use of only valid postcodes. There's an ON DELETE SET 
NULL cascade on the fkey because for this app's purpose a postcode 
that's no longer accepted by the postal service is bad data.
This means that the postcode list can't be updated by a TRUNCATE and 
repopulate. No big deal; I prefer to do a compare between the current 
database contents and the latest postcode data and insert/delete as 
appropriate anyway; especially as the app needs to be able to record and 
flag tentative entries for postcodes that the user *insists* exist but 
the latest (possibly even weeks old) australia post data says do not.
You could reasonably do the same sort of thing with a text postcode if 
your app had to care about non-numeric postal codes.
It's nice being able to work on something that doesn't have to handle 
pedal-post in some awful corner of the earth where they identify postal 
regions by coloured tags. OK, not really, but sometimes addressing seems 
almost that bad.
> i.e. Can I use a character varying(10) and use the text 'million' /
> 'billion' instead of a serial / integer type ?
If you're looking at a small set of possible values an enumeration 
*might* be an option. Be aware that they're painful and slow to change 
later, though, especially when used in foreign keys, views, etc.
I certainly wouldn't use one for your currency denomination table, which 
is likely to see values added to it over time.
> p.s.: I am not as much concerned with the size that it'd take on the data
> tables, as much as the fact that the select / insert performances shouldn't
> suffer. However, if that increase in size (per data record) may make a
> considerable impact on the performance, I would certainly want to take that
> into account during design phase.
I suspect it's just another tradeoff - table size increase (and thus 
scan performance cost) from storing the text vs avoiding the need to 
access the lookup table for most operations.
--
Craig Ringer
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Shane Ambler | 2008-05-21 07:33:34 | Re: Varchar pkey instead of integer | 
| Previous Message | Joshua D. Drake | 2008-05-21 06:16:34 | Re: Varchar pkey instead of integer |