Re: query by partial timestamp

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavan Schneider <pg-gts(at)snkmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: query by partial timestamp
Date: 2013-01-09 00:48:32
Message-ID: 6304.1357692512@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Gavan Schneider <pg-gts(at)snkmail(dot)com> writes:
> From my perspective there are at least three ways to attack
> this problem:

> (I have not tested these, so apologies for the stupid syntax errors.)

> 1. SELECT ... WHERE 2011 = extract(YEAR FROM col_of_type_timestamp);

> 2. SELECT ... WHERE
> '2011-01-01'::TIMESTAMP <= col_of_type_timestamp
> AND col_of_type_timestamp <= '2011-12-31'::TIMESTAMP;

> 3. SELECT ... WHERE
> (col_of_type_timestamp, col_of_type_timestamp) OVERLAPS
> (DATE '2011-01-01', DATE '2012-01-01');

> Is this the full list?

Another possibility is date_trunc, viz

4. SELECT ... WHERE date_trunc(col_of_type_timestamp, 'year') = '2011-01-01'

You could also use BETWEEN, but that's just syntactic sugar for method 2.

Note that in both methods 2 and 3 it's easy to get the edge cases wrong;
in particular I think your version of method 2 gives the wrong answer
for later-than-midnight times on 2011-12-31, while #3 might (not sure)
give the wrong answer for exactly midnight on 2012-01-01. These things
are fixable of course with a bit of care. Personally I'd go with

col >= '2011-01-01' AND col < '2012-01-01'

> So... generalizing the original question: which approach would
> yield the best performance and/or compliance with SQL standards?

> I note Steve Crawford has (strongly) hinted that direct date
> comparison is more likely to use an index (when available) so I
> suspect this is the way to go, but would an index based on
> extract(YEAR...) negate this difference?

Method 3 is not indexable at all and is unlikely to become so --- the
SQL standard's definition of OVERLAPS is squirrely enough that people
haven't bothered to think about optimizing it. Method 2 works well with
a plain btree index on the timestamp column. You can get method 1 to be
indexed if you create a functional index on "extract(year from col)";
but since the index would have pretty much no other use than answering
this exact type of query, that's not a very attractive alternative.
Method 4 is like method 1 --- you'd need a specialized index.

Note that in any case an index is not going to be helpful if the query
would need to fetch more than a few percent of the table. The OP didn't
suggest how many years his data covers, but it's quite possible that
pulling a full year's worth of data will read enough of the table that
there's no point in worrying about whether an index could be used
anyway.

Another thing to think about is whether you'll have related sorts of
queries that aren't about full years --- maybe sometimes you need a
month's worth of data, for example. The BETWEEN-style query and a btree
index will adapt easily to non-year intervals, while the EXTRACT
approach will not, and date_trunc is rather limited as well.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jerry Sievers 2013-01-09 01:06:23 Re: Discerning when functions had execute revoked from public
Previous Message Gavan Schneider 2013-01-09 00:26:26 Re: query by partial timestamp