Re: Modeling a table with arbitrary columns

From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Modeling a table with arbitrary columns
Date: 2009-10-29 21:24:26
Message-ID: hcd167$60p$1@ger.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Andreas Hartmann wrote on 29.10.2009 21:52:
> Hi everyone,
>
> I want to model the following scenario for an online marketing application:
>
> Users can create mailings. The list of recipients can be uploaded as
> spreadsheets with arbitrary columns (each row is a recipient). I expect
> the following maximum quantities the DB will contain:
>
> * up to 5000 mailings
> * 0-10'000 recipients per mailing, average maybe 2000
> * approx. 20 columns per spreadsheet
>
[...]
>
> I have the feeling that the second approach is cleaner. But since the
> recipient_value table will contain approx. 20 times more rows than the
> recipient table in approach A, I'd expect a performance degradation.
>
> Is there a limit to the number of rows that should be stored in a table?
> With approach B the maximum number of rows could be about 200'000'000,
> which sounds quite a lot …
>

I don't think the number of rows is that critical (it sure doesn't hit any "limits". The question is how you want to access that information and how quick that has to be. If you need sub-second response time for aggregates over that, you'll probably have to throw quite some hardware at it.

You could also check out the hstore contrib module which lets you store key value pairs in a single column, which might actually be what you are looking for (note that I have never used it, so I cannot tell how fast it acutally is)

http://www.postgresql.org/docs/current/static/hstore.html

So something like

CREATE TABLE recipient (
mailing integer NOT NULL,
row integer NOT NULL,
recipient_values hstore,
PRIMARY KEY (mailing, row),
FOREIGN KEY (mailing) REFERENCES mailing (id)
)

Btw: I would get rid of a column named "row", this more a "recipient_id", but that is just personal taste.

Regards
Thomas

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Doug Cole 2009-10-29 22:24:53 Re: optimizing query with multiple aggregates
Previous Message Andreas Hartmann 2009-10-29 20:52:33 Modeling a table with arbitrary columns