Re: Compare rows

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Compare rows
Date: 2003-10-09 05:36:59
Message-ID: 200310082236.59305.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Chris,

> Some time in the late '80s, probably '88 or '89, there was a paper
> presented in Communications of the ACM that proposed using this sort
> of "hypernormalized" schema as a way of having _really_ narrow schemas
> that would be exceedingly expressive. They illustrated an example of
<snip>
> The entertaining claim was that they felt they could model the
> complexities of the operations of any sort of company using not more
> than 50 tables. It seemed somewhat interesting, at the time; it truly
> resonated as Really Interesting when I saw SAP R/3, with its bloat of
> 1500-odd tables.

One can always take things too far. Trying to make everying 100% dynamic so
that you can cram your whole database into 4 tables is going too far; so is
the kind of bloat that produces systems like SAP, which is more based on
legacy than design (I analyzed a large commercial billing system once and was
startled to discover that 1/4 of its 400 tables and almost half of the 40,000
collective columns were not used and present only for backward
compatibility).

The usefulness of the "vertical values child table" which I suggest is largely
dependant on the number of values not represented. In Greg's case, fully
75% of the fields in his huge table are NULL; this is incredibly inefficient,
the more so when you consider his task of calling each field by name in each
query.

The "vertical values child table" is also ideal for User Defined Fields or any
other form of user-configurable add-on data which will be NULL more often
than not.

This is an old SQL concept, though; I'm sure it has an official name
somewhere.

> The need to do a lot of joins would likely hurt performance somewhat,
> as well as the way that it greatly increases the number of rows.
> Although you could always split it into several tables, one for each
> "value_type", and UNION them into a view...

It increases the number of rows, yes, but *decreases* the storage size of data
by eliminating thousands ... or millions ... of NULL fields. How would
splitting the vertical values into dozens of seperate tables help things?

Personally, I'd rather have a table with 3 columns and 8 million rows than a
table with 642 columns and 100,000 rows. Much easier to deal with.

And we are also assuming that Greg seldom needs to see all of the fields at
once. I'm pretty sure of this; if he did, he'd have run into the "wide row"
bug in 7.3 and would be complaining about it.

--
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andriy Tkachuk 2003-10-09 07:25:33 Re: IMMUTABLE function's flag do not work: 7.3.4, plpgsql
Previous Message Tom Lane 2003-10-09 05:10:54 Re: [HACKERS] Sun performance - Major discovery!