Skip site navigation (1) Skip section navigation (2)

Re: Optimize date query for large child tables: GiST or GIN?

From: David Jarvis <thangalin(at)gmail(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Date: 2010-05-20 19:45:53
Message-ID: AANLkTikl8wlHPJaArMWO9XWgG0vk9tununw3eUAYh1IC@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
When using MySQL, the performance was okay (~5 seconds per query) using:

  date( concat_ws( '-', y.year, m.month, d.day ) ) between
    -- Start date.
    date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND
    -- End date. Calculated by checking to see if the end date wraps
    -- into the next year. If it does, then add 1 to the current year.
    --
    date(
      concat_ws( '-',
        y.year + greatest( -1 *
          sign(
            datediff(
              date(
                concat_ws('-', y.year, $P{Month2}, $P{Day2} )
              ),
              date(
                concat_ws('-', y.year, $P{Month1}, $P{Day1} )
              )
            )
          ), 0
        ), $P{Month2}, $P{Day2}
      )
    )

This calculated the correct start days and end days, including leap years.

With MySQL, I "normalized" the date into three different tables: year
references, month references, and day references. The days contained only
the day (of the month) the measurement was made and the measured value. The
month references contained the month number for the measurement. The year
references had the years and station. Each table had its own index on the
year, month, or day.

When I had proposed that solution to the mailing list, I was introduced to a
more PostgreSQL-way, which was to use indexes on the date field.

In PostgreSQL, I have a single "measurement" table for the data (divided
into 72 child tables), which includes the date and station. I like this
because it feels clean and it is easier to understand. So far, however, it
has not been fast.

I was thinking that I could add three more columns to the measurement table:

year_taken, month_taken, day_taken

Then index those. That should allow me to avoid extracting years, months,
and days from the *m.taken* date column.

What do you think?

Thanks again!
Dave

In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2010-05-20 19:52:50
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Previous:From: Tom LaneDate: 2010-05-20 19:21:49
Subject: Re: Query causing explosion of temp space with join involving partitioning

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group