Skip site navigation (1) Skip section navigation (2)

Re: Why so slow?

From: Alessandro Gagliardi <alessandro(at)path(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why so slow?
Date: 2012-02-17 22:20:29
Message-ID: CAAB3BBKODtXSunGZHj0fH3pZ=NLPEYrOdY7AGmBMij169x_Wtg@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
Your guess about the seen_its table growing is accurate and applies to the
blocks table as well. Partitioning on date is probably a good idea and
something that I've been meaning to investigate. I'm not surprised that the
JOIN makes it slower, I'm surprised by the magnitude of how much slower it
is.

This is my analytics database (not dev) so no extrapolation is necessary
except in that I know the tables will grow in size. The database is hosted
on AWS and maintained by Heroku.

On Fri, Feb 17, 2012 at 11:21 AM, Steve Crawford <
scrawford(at)pinpointresearch(dot)com> wrote:

> On 02/17/2012 10:34 AM, Alessandro Gagliardi wrote:
>
>> Comparing
>> SELECT DISTINCT(user_id) FROM blocks JOIN seen_its USING (user_id) WHERE
>> seen_its.created BETWEEN (now()::date - interval '8 days')::timestamp AND
>> now()::date::timestamp
>> to
>> SELECT DISTINCT(user_id) FROM seen_its WHERE created BETWEEN (now()::date
>> - interval '8 days')::timestamp AND now()::date::timestamp
>> the difference is 100x.
>> ...
>>
> Though I could figure it out, it would be helpful to actually specify
> which query is faster and to post the explain of *both* queries.
>
> But in general, it is not terribly unusual to find that rewriting a query
> can lead the planner to generate a superior plan. Trying and testing
> different ways of writing a query is a standard tuning technique.
>
> There are also version-specific issues with some versions of PostgreSQL
> preferring ...where foo in (select... and others preferring ...where exists
> (select...
>
> If you are planning to ramp up to high volumes it is also *very* important
> to test and tune using the size of database you plan to have on the
> hardware you will use in production. You cannot extrapolate from a dev
> database on an i486 (?!?) machine to a production server with more
> spindles, different RAID setup, different CPU, more cores, vastly more
> memory, etc.
>
> In the case of your queries, the second one eliminates a join and gives
> the planner an easy way to optimize using the available indexes so I'm not
> surprised it's faster.
>
> Note: I am guessing that your seen_its table just grows and grows but is
> rarely, if ever, modified. If it is basically a log-type table it will be a
> prime candidate for partitioning on date and queries like this will only
> need to access a couple relatively small child tables instead of one
> massive one.
>
> Cheers,
> Steve
>
>

In response to

pgsql-performance by date

Next:From: Ants AasmaDate: 2012-02-18 05:29:15
Subject: Re: Why so slow?
Previous:From: Steve CrawfordDate: 2012-02-17 19:21:47
Subject: Re: Why so slow?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group