Re: Ad-hoc table type?

From: pgsql(at)mohawksoft(dot)com
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Ad-hoc table type?
Date: 2008-09-29 01:27:52
Message-ID: 53102.71.232.149.185.1222651672.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> pgsql(at)mohawksoft(dot)com writes:
>> Being able to insert arbitrary named values, and extracting them
>> similarly, IMHO works "better" and more naturally than some external
>> aggregate system built on a column. I know it is a little "outside the
>> box" thinking, what do you think?
>
> I'm failing to see the point. Allowing columns to spring into existence
> without any forethought seems to me to be all minuses and no pluses
> worth mentioning.
>
> * What if the column name is just a typo?

In an automated system like PHP, Java, etc. that's not too likely.

>
> * What datatype should it have? ("Always varchar" is just lame.)

varchar or text is not "just lame," SQLite used to do that exclusively.
One could argue that XML is nothing more than text.

>
> * Should it have an index? If so, should it be unique?

The answer to that is, well, no, not unless the dba generates one or it is
declared. Just like any other column. All the rules that apply to "create
table" and "alter table add column" just apply naturally as would be
expected.

create adhoc table userdata(username varchar, email varchar, primary
key(email));

>
> * If you keep doing this, you'll soon find yourself reading out
> unbelievably wide tables (lots of columns), which won't be especially
> easy or efficient to process on either the backend or the client side.
> Plus you might run into the max-columns-per-tuple limit.

Well, I fully understand that it is not a general purpose "unlimited"
width sort of thing. In a programing environment, the target environment
for this type of feature, it is unlikely to be a run-away problem.

>
> If you've expended enough thought to be sure that the column is not just
> a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
> command to tell the database the results of your genius.

Like I said, if you've never dealt with a live web site, maintained by a
team of "web dudes," working furiously to keep their job and get paid,
your only hope to keep up with "Oh! I needed to add the 'time to live' of
the session into the session data" is to use an aggregate storage system.

>
> I do see the point that switching from "member of an hstore column" to
> "real database column" is pretty painful, but I don't see that "allow
> columns to spring into existence" solves that in any meaningful way.
> Is there some other way we could address such conversions?

Every other solution creates a second tier of data storage. You either
deal with data elements at the table level, or you create a "roll your
own" aggregate mechanism, or make a HUGE table of "user,name,value" table
and force a join and index scan for every select. (A million users, 5-10
attributes each is an expensive join.)
>
> BTW, I think it is (or should be) possible to create an index on
> hstore->'mycol', so at least one of the reasons why you should *need*
> to switch to a "real" database column seems bogus.

Oh, yea, function indexes work great. I think you did that right?

For what its worth, I don't expect you to jump all over this. It really is
a divergence from classic SQL design. I'm not even sure I like it. In
fact, I don't like it, but the argument that you are being forced to
create a second class data storage mechanism or a relational join for data
that is logically in a single relation does cause one to ponder the
problem.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message ITAGAKI Takahiro 2008-09-29 02:31:26 Operation needed for datfrozenxid bug?
Previous Message Tom Lane 2008-09-29 01:16:01 Re: [PATCHES] Infrastructure changes for recovery