Re: Random Page Cost and Planner

From: David Jarvis <thangalin(at)gmail(dot)com>
To: tv(at)fuzzy(dot)cz
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Random Page Cost and Planner
Date: 2010-05-26 18:55:37
Message-ID: AANLkTinPELm_Ckm1zvcm7-XtWZAm3vMClMQGJ5qMAxdy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

sc.taken_end <= '1996-12-31'::date AND
> m.taken BETWEEN sc.taken_start AND sc.taken_end AND
>
> category of data at a certain time. But I'm afraid this makes the planning
> much more difficult, as the select from measurements depend on the data
> returned by other parts of the query (rows from category).
>

Right. Users can select 1900 - 2009. Station data hardly ever spans that
range.

The *station_category* is used to create a unique key into the measurement
data for every station: station_id, category_id, and taken_start. The
measurement data should be contiguous until taken_end.

I thought that that combination would be a pointer to the exact spot in the
measurement table where the data starts, which should be ridiculously fast
to find.

See this http://explain.depesz.com/s/H1 and this
> http://explain.depesz.com/s/GGx
>

I was getting some red lines when I looked at a different plan. It's a great
site.

How many rows does the query return without the group by clause? About
> 140000 in both cases, right?
>

SELECT
*
FROM
climate.measurement m
WHERE
m.station_id = 5148 AND
m.taken BETWEEN '1900-08-01'::date AND '2009-12-31'::date AND
m.category_id = 1

5397 rows (10 seconds cold; 0.5 seconds hot); estimated too high by 2275
rows?

http://explain.depesz.com/s/uq

OK, I haven't noticed the table is already partitioned by category_id and
> I didn't mean to partition by (taken, category_id) - that would produce a
> lot of partitions. Yes, that might cause problems related to number of
> files, but that's rather a filesystem related issue.
>

Constrained as:

CONSTRAINT measurement_013_category_id_ck CHECK (category_id = 7)

> I'd expect rather issues related to RULEs or triggers (not sure which of
> them you use to redirect the data into partitions). But when partitioning
>

I created seven child tables of measurement. Each of these has a constraint
by category_id. This makes it extremely fast to select the correct
partition.

> I'm not sure what you mean by 'year wrapping issue' but I think it might
> work quite well - right not the problem is PostgreSQL decides to scan the
> whole partition (all data for a given category_id).
>

I'll give it another try. :-)

*Use Case #1*
User selects: Mar 22 to Dec 22
User selects: 1900 to 2009

Result: Query should average *9 months* of climate data per year between Mar
22 and Dec 22 of Year.

*Use Case #2*
User selects: Dec 22 to Mar 22
User selects: 1900 to 2009

Result: Query should average *3 months* of climate data per year between Dec
22 of Year and Mar 22 of Year+1.

So if a user selects 1950 to *1960*:

- first case should average between 1950 and *1960*; and
- second case should average between 1950 and *1961*.

Dave

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message David Jarvis 2010-05-26 20:21:19 Re: Random Page Cost and Planner
Previous Message David Jarvis 2010-05-26 18:26:53 Re: Random Page Cost and Planner