Skip site navigation (1) Skip section navigation (2)

Re: More then 1600 columns?

From: Dann Corbit <DCorbit(at)connx(dot)com>
To: "'Clark C(dot) Evans'" <cce(at)clarkevans(dot)com>, Mark Mitchell<mmitchell(at)riccagroup(dot)com>, 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: More then 1600 columns?
Date: 2010-11-12 21:10:41
Message-ID: 87F42982BF2B434F831FCEF4C45FC33E4207CA38@EXCHANGE.corporate.connx.com (view raw or flat)
Thread:
Lists: pgsql-general
> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> owner(at)postgresql(dot)org] On Behalf Of Clark C. Evans
> Sent: Friday, November 12, 2010 12:25 PM
> To: Mark Mitchell; 'Tom Lane'
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] More then 1600 columns?
> 
> On Fri, 12 Nov 2010, Tom Lane wrote:
> > Generally, wanting more than a few dozen columns is a
> > good sign that you need to rethink your schema design.
> > What are you trying to accomplish exactly?
> 
> Generally speaking, yes.  However, survey instruments are a
> very legitimate design where this is not only correct, but
> quite desirable.  Sure -- in an ideal world you might be
> able to turn your instrument into something that is a highly
> normalized structure. However, it's simply not practical.

Two tables with a foreign key is not practical?
 
> PostgreSQL absolutely should look into supporting this.  While
> the storage may not be ideal, it is an increasingly common case.
> Perhaps the storage engine could permit a set of columns to be
> tagged for ``sparse column storage``?

Reminds me of a quote from the movie Pinocchio:
"Give a boy enough rope and he'll hang himself."
Is a fundamental design change for the purpose of allowing massive denormalization really a good idea?
 
> On Fri, 12 Nov 2010, Mark Michell wrote:
> > Yes I understand that this is "bad design" but what we are doing is
> > storing each form field in a survey in its own column. For very long
> > surveys we end up with thousands of elements.
> 
> It's *not* necessarily bad design.  If you've got a survey
> instrument that has 1000 unique data points, it is a reasonable
> thing to do.  We (Prometheus Research) have these sorts of
> things all the times.
> 
> #1  You want to store and query them based on specific fields.

Use a join

> #2  You want to enforce the data integrity of each column
>     by using types, enums and other techniques.

Non-sequiter
 
> #3  You want to integrate this structure with the other
>     aspects of your database.

Foreign keys
 
> > I know storing in an array is possible but it makes it
> > so much easier to query the data set when each element
> > is in its own field.  comments on why I should not do this
> 
> Using HSTORE or using an EAV table completely undermines
> the ability to ensure column-level typing, constraints,
> and... use SQL.  With arrays / EAV table structure, you
> end up having to re-invent your query language. It is a
> non-solution.
> 
> > the possible alternatives
> 
> The work-around we use is to break our instrument into
> one table per ``section``.  We also save each section as the
> user completes that part of the instrument.  This has the
> advantage of allowing you to model repeating sections.  We
> typically enforce column uniqueness across all sections.
> This requires your application and query tool to be a bit
> smart about making a virtual table with all 1000+ columns.
> Kinda sucks, but it works.
> 
> ...
> 
> I guess the other option is to look at a RDF data store where
> you use SPARQL to query the data.  OpenJena SDB is implemented
> as a 3-table storage in PostgreSQL if you wish to keep the same
> database for both your assessment storage and other tables. I'm
> not lovin' this option, but it's on the table for our company.
> 
> Instead, I'd absolutely love to see PostgreSQL add a variant
> scalar type (for when the type of a column changes over the
> lifetime of the instrument) and a column storage for Nth+
> and additional columns.  For now, the work-around above sort
> of works for us.

Sometimes it is a good idea to denormalize.  For instance, performance might improve if you put bill_to_address and ship_to_address in the customer record instead of joining against an address table.  But if you have thousands of different child types, it is a good idea to ask yourself about the value of denormalization verses the cost.  There is a reason that BCNF was invented in 1974.

Normally, I am all for the idea of making a database more capable.  But if that design change encourages terrible practice I can think of better ways to expend programmer time and effort (namely on changes that encourage good practice and increased productivity).

I would be very curious to hear of a real case where there is an advantage to having many thousands of columns in a table verses using a child table.  I have never seen such a thing in real life.  I am not opposed to the idea that such a thing is possible.  It is just that I have always found the child table is simply better.  If (for access) the single table seems simpler, then a view can be used.

IMO-YMMV



In response to

Responses

pgsql-general by date

Next:From: Clark C. EvansDate: 2010-11-12 21:25:38
Subject: Re: More then 1600 columns?
Previous:From: Steve CrawfordDate: 2010-11-12 21:03:28
Subject: Re: index row requires 10040 bytes, maximum size is 8191

Privacy Policy | About PostgreSQL
Copyright © 1996-2013 The PostgreSQL Global Development Group