Re: Optimize date query for large child tables: GiST or GIN?

From: David Jarvis <thangalin(at)gmail(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thombrown(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Date: 2010-05-21 15:17:57
Message-ID: AANLkTinFXWorUTT9iqagOKbsid_ojdniMC0e41ZK_mB7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi, Yeb.

This is starting to go back to the design I used with MySQL:

- YEAR_REF - Has year and station
- MONTH_REF - Has month, category, and yea referencer
- MEASUREMENT - Has month reference, amount, and day

Normalizing by date parts was fast. Partitioning the tables by year won't do
much good -- users will probably choose 1900 to 2009, predominately.

I thought about splitting the data by station by category, but that's ~73000
tables. My understanding is that PostgreSQL uses files per index, which
would be messy at the OS level (Linux 2.6.31). Even by station alone is
12139 tables, which might be tolerable for now, but with an order of
magnitude more stations on the distant horizon, it will not scale.

I also thought about splitting the data by station district by category --
there are 79 districts, yielding 474 child tables, which is ~575000 rows per
child table. Most of the time I'd imagine only one or two districts would be
selected. (Again, hard to know exactly.)

Dave

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Yeb Havinga 2010-05-21 18:21:21 Re: Optimize date query for large child tables: GiST or GIN?
Previous Message Matthew Wakeling 2010-05-21 15:08:38 Re: Optimize date query for large child tables: GiST or GIN?