Re: How are null's stored? -- Some numbers

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: How are null's stored? -- Some numbers
Date: 2003-05-13 21:55:41
Message-ID: 20030513215541.GC40542@flake.decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

I did two experiments. First, as someone mentioned, changing between
char and varchar made absolutely no difference size-wise. In some other
RDBMSes, performance wise char might still win out because the database
wouldn't have to do the math to figure out where in the tuple the fields
are. I know it's splitting hairs, but on what will be a 40M row table...

Second, I modified the table (see below; all fields were originally
nullable):

Before:
usps=# vacuum full analyze verbose zip4_detail;
INFO: --Relation public.zip4_detail--
INFO: Pages 12728: Changed 0, reaped 1, Empty 0, New 0; Tup 467140: Vac
0, Keep/VTL 0/0, UnUsed 19, MinLen 154, MaxLen 302; Re-using:
Free/Avail. Space 1009820/264028; EndEmpty/Avail. Pages 0/1521.
CPU 0.65s/0.86u sec elapsed 1.51 sec.
INFO: Rel zip4_detail: Pages: 12728 --> 12728; Tuple(s) moved: 0.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: Analyzing public.zip4_detail

After:
INFO: --Relation public.zip4_detail--
INFO: Pages 13102: Changed 0, reaped 6961, Empty 0, New 0; Tup 467140:
Vac 0, Keep/VTL 0/0, UnUsed 31795, MinLen 166, MaxLen 306; Re-using:
Free/Avail. Space 1136364/190188; EndEmpty/Avail. Pages 0/1056.
CPU 0.41s/0.79u sec elapsed 1.20 sec.
INFO: Rel zip4_detail: Pages: 13102 --> 13102; Tuple(s) moved: 0.
CPU 0.59s/10.02u sec elapsed 18.17 sec.
INFO: Analyzing public.zip4_detail

As you can see, space useage actually went up, by 2.9% (pages). In other
words, it appears to be more efficient to store a null than to store an
empty string in a varchar.

usps=# select count(*) from zip4_detail where street_pre_drctn_abbrev=''
and street_suffix_abbrev='' and street_post_drctn_abbrev='';
-------
9599

usps=# select count(*) from zip4_detail where street_pre_drctn_abbrev=''
or street_suffix_abbrev='';
--------
128434

(all rows have at least one of the 3 fields empty)

Hope someone finds this info useful... :)

Table "public.zip4_detail"
Column | Type | Modifiers
---------------------------+-----------------------+-----------
zip_code | character varying(5) |
update_key_no | character varying(10) |
action_code | character varying(1) |
record_type_code | character varying(1) |
carrier_route_id | character varying(4) |
street_pre_drctn_abbrev | character varying(2) | not null
street_name | character varying(28) |
street_suffix_abbrev | character varying(4) | not null
street_post_drctn_abbrev | character varying(2) | not null
addr_primary_low_no | character varying(10) |
addr_primary_high_no | character varying(10) |
addr_prmry_odd_even_code | character varying(1) |
building_or_firm_name | character varying(40) |
addr_secondary_abbrev | character varying(4) |
addr_secondary_low_no | character varying(8) |
addr_secondary_high_no | character varying(8) |
addr_secny_odd_even_code | character varying(1) |
zip_add_on_low_no | character varying(4) |
zip_add_on_high_no | character varying(4) |
base_alt_code | character varying(1) |
lacs_status_ind | character varying(1) |
govt_bldg_ind | character varying(1) |
finance_no | character varying(6) |
state_abbrev | character varying(2) |
county_no | character varying(3) |
congressional_dist_no | character varying(2) |
muncipality_ctyst_key | character varying(6) |
urbanization_ctyst_key | character varying(6) |
prefd_last_line_ctyst_key | character varying(6) |

--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dennis Gearon 2003-05-13 22:06:04 Re: fomatting an interval (resend)
Previous Message scott.marlowe 2003-05-13 21:51:28 Re: Scheduled jobs

Browse pgsql-performance by date

  From Date Subject
Next Message Jamie Lawrence 2003-05-13 22:30:56 Finding filenames for tables
Previous Message Andreas Pflug 2003-05-13 21:52:29 Re: Finding filenames for tables