Re: Optimize date query for large child tables: GiST or GIN?

From: Thom Brown <thombrown(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Jarvis <thangalin(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Date: 2010-05-20 19:21:30
Message-ID: AANLkTim7O3Aalp31WKwBuTufYXeUMQWkwy51nvRcI0QY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 20 May 2010 20:02, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thom Brown <thombrown(at)gmail(dot)com> writes:
>> On 20 May 2010 17:36, David Jarvis <thangalin(at)gmail(dot)com> wrote:
>> Okay, get your app to convert the month-date to a day of year, so we
>> have year_start, year_end, day_of_year_start, day_of_year_end
>
>> and your where clause would something like this:
>
>> WHERE extract(YEAR from m.taken) BETWEEN year1 and year2
>> AND (
>>       extract(DOY from m.taken) BETWEEN day_of_year_start AND day_of_year_end
>>       OR (
>>               extract(DOY from m.taken) >= day_of_year_start OR extract(DOY from
>> m.taken) <= day_of_year_end
>>       )
>> )
>
> extract(DOY) seems a bit problematic here, because its day numbering is
> going to be different between leap years and non-leap years, and David's
> problem statement doesn't allow for off-by-one errors.  You could
> certainly invent your own function that worked similarly but always
> translated a given month/day to the same number.
>
> The other thing that's messy here is the wraparound requirement.
> Rather than trying an OR like the above (which I think doesn't quite
> work anyway --- won't it select everything?)

No. It only would if using BETWEEN SYMMETRIC.

Like if m.taken is '2003-02-03', using a start day of year as 11th Nov
and end as 17th Feb, it would match the 2nd part of the outer OR
expression. If you changed the end day of year to 2nd Feb, it would
yield no result as nothing is between 11th Nov and 17th Feb as it's a
negative difference, and 2nd Feb is lower than the taken date so fails
to match the first half of the inner most OR expression.

> , it would be better if
> you can have the app distinguish wraparound from non-wraparound cases
> and issue different queries in the two cases.  In the non-wrap case
> (start_day < end_day) it's pretty easy, just
>        my_doy(m.taken) BETWEEN start_val AND end_val
> The easy way to handle the wrap case is
>        my_doy(m.taken) <= start_val OR my_doy(m.taken) >= end_val
> although I can't help feeling there should be a smarter way to do
> this where you can use an AND range check on some modified expression
> derived from the date.
>
>                        regards, tom lane
>

Yes, I guess I agree that the app can run different queries depending
on which date is higher. I hadn't factored leap years into the
equation. Can't think of what could be done for those cases off the
top of my head. What is really needed is a way to match against day
and month parts instead of day, month and year.... without resorting
to casting to text of course.

Thom

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2010-05-20 19:21:49 Re: Query causing explosion of temp space with join involving partitioning
Previous Message Tom Lane 2010-05-20 19:02:19 Re: Optimize date query for large child tables: GiST or GIN?