Re: Optimal time series sampling.

From: Ted Byers <r(dot)ted(dot)byers(at)rogers(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Optimal time series sampling.
Date: 2007-11-09 21:18:17
Message-ID: 7062.16983.qm@web88315.mail.re4.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


--- Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:

> On Nov 9, 2007 11:47 AM, Ted Byers
> <r(dot)ted(dot)byers(at)rogers(dot)com> wrote:
> >
> > --- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > > Ted Byers <r(dot)ted(dot)byers(at)rogers(dot)com> writes:
> [snip]
> Which is better depends largely on how your database
> is built. MySQL
> still uses loops for all subselects, so with large
> numbers of tuples
> in the subselect method, it will be slow. But they
> might fix this in
> a later release.
>
> Fairly recent versions of PostgreSQL could make some
> bad choices when
> doing joins for certain datasets that would be much
> faster with a
> correlated subquery (specifically the old left join
> where
> righttable.field is null trick made some pgsql
> versions choose an
> inefficient join method)
>
> So, the "right" way is a question of which db, and
> even which version
> of that DB you're on.
>
My life just got soooooooo much more complicated. Oh
well, I guess that will be useful when providing
advice to management when they start seeing
performance issues. Thanks ;-)

Have you looked at version 5.0.45? I am always seeing
the claim that the left join trick is so much more
faster than the correlated subquery, especially if a
function like MAX() is used, but the numbers I am
seeing with real stock price data has it running, on
average, about three times faster than the left join.
So I assumed I was doing something wrong in a manner
than would get me the right answer the slowest way
possible.

> > At this time, the database in use is
> > irrelevant (I want to stick as close to the ANSI
> > standard as practicable so the rewriting required
> will
> > be minimal should we decide to change the database
> > later, for whatever reason).
>
> If you want to stick with ANSI, MySQL tends to be
> more divergent from
> the spec than pgsql and other rdbms.
>
The books I am using, which describe the SQL language,
don't seem to mention or illustrate much difference
among any of the rdbms (including my references that
talk about Oracle and MS SQL Server). The SQL I try
from those books seem to work reasonably well in all
of them (I can't check against Oracle, though, since I
don't have that), and I try most of my SQL against
MySQL, Postgres and MS SQL Server (the biggest
divergences seem to be in how bulk loading of data
happens). Maybe I haven't explored enough of the SQL
language, with large enough datasets, to see the
differences you mention; or perhaps things are
improving with all of them.

> Most people would consider the correlate subquery
> the better method.
> But it's also likely to be the slowest on MySQL.
>
Right now, with this particular query the correlated
subquery is the one that gets me the right answers
about 3 times faster than any other method I have
tried. But it still takes several minutes to get the
results for only a few dozen stocks. And yet I can
get several megabytes of data from the following query
in about a quarter of the time.

SELECT price_date,close_price FROM stockprices A
WHERE A.stock_id = 1 AND price_date IN
(SELECT MAX(B.price_date) FROM stockprices B
WHERE A.stock_id = B.stock_id AND A.stock_id = 1
GROUP BY YEAR(B.price_date),WEEKOFYEAR(B.price_date));

I had expected this to take many times longer than the
"simple" select that gets only the last price for a
given stock, but I was surprised to see it so much
faster than the query that gets just the last prices
for only a couple dozen stocks.

Thanks alot. I learned alot from your reply.

Ted

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2007-11-09 21:20:17 Re: [GENERAL] Crosstab Problems
Previous Message Tomas Vondra 2007-11-09 20:59:55 Re: INSERT performance deteriorates quickly during a large import