Re: Optimize date query for large child tables: GiST or GIN?

From: Thom Brown <thombrown(at)gmail(dot)com>
To: David Jarvis <thangalin(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Date: 2010-05-20 18:58:53
Message-ID: AANLkTim4ALJCLPKkp15QdLt-mDyyH-ThBLQXOw8Zicli@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 20 May 2010 19:36, Thom Brown <thombrown(at)gmail(dot)com> wrote:
> On 20 May 2010 17:36, David Jarvis <thangalin(at)gmail(dot)com> wrote:
>> Hi, Thom.
>>
>> The query is given two items:
>>
>> Range of years
>> Range of days
>>
>> I need to select all data between the range of days (e.g., Dec 22 - Mar 22)
>> over the range of years (e.g., 1950 - 1970), such as shown here:
>>
>> http://i.imgur.com/MUkuZ.png
>>
>> For Jun 1 to Jul 1 it would be no problem because they the same year. But
>> for Dec 22 to Mar 22, it is difficult because Mar 22 is in the next year
>> (relative to Dec 22).
>>
>> How do I do that without strings?
>>
>> Dave
>>
>>
>
> Okay, get your app to convert the month-date to a day of year, so we
> have year_start, year_end, day_of_year_start, day_of_year_end
>
> and your where clause would something like this:
>
> WHERE extract(YEAR from m.taken) BETWEEN year1 and year2
> AND (
>        extract(DOY from m.taken) BETWEEN day_of_year_start AND day_of_year_end
>        OR (
>                extract(DOY from m.taken) >= day_of_year_start OR extract(DOY from
> m.taken) <= day_of_year_end
>        )
> )
>
> ... substituting the placeholders where they appear.
>
> So if we had:
>
> year1=1941
> year2=1952
> day_of_year_start=244 (based on input date of 1st September)
> day_of_year_end=94 (based on 4th April)
>
> We'd have:
>
> WHERE extract(YEAR from m.taken) BETWEEN 1941 and 1952
> AND (
>        extract(DOY from m.taken) BETWEEN 244 AND 94
>        OR (
>                extract(DOY from m.taken) >= 244 OR extract(DOY from m.taken) <= 94
>        )
> )
>
> Then you could add expression indexes for the YEAR and DOY extract parts, like:
>
> CREATE INDEX idx_taken_doy ON climate.measurement (EXTRACT(DOY from taken));
> CREATE INDEX idx_taken_year ON climate.measurement (EXTRACT(YEAR from taken));
>
> Although maybe you don't need those, depending on how the date
> datatype matching works in the planner with the EXTRACT function.
>
> Regards
>
> Thom
>

Actually, you could change that last bit from:

 OR (
               extract(DOY from m.taken) >= day_of_year_start OR
extract(DOY from m.taken) <= day_of_year_end
       )

to

OR extract(DOY from m.taken) NOT BETWEEN day_of_year_end AND day_of_year_start

That would be tidier and simpler :)

Thom

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2010-05-20 19:02:19 Re: Optimize date query for large child tables: GiST or GIN?
Previous Message Thom Brown 2010-05-20 18:36:36 Re: Optimize date query for large child tables: GiST or GIN?