Skip site navigation (1) Skip section navigation (2)

Re: Random Page Cost and Planner

From: David Jarvis <thangalin(at)gmail(dot)com>
To: tv(at)fuzzy(dot)cz
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Random Page Cost and Planner
Date: 2010-05-26 18:55:37
Message-ID: AANLkTinPELm_Ckm1zvcm7-XtWZAm3vMClMQGJ5qMAxdy@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hi,

sc.taken_end <= '1996-12-31'::date AND
> m.taken BETWEEN sc.taken_start AND sc.taken_end AND
>
> category of data at a certain time. But I'm afraid this makes the planning
> much more difficult, as the select from measurements depend on the data
> returned by other parts of the query (rows from category).
>

Right. Users can select 1900 - 2009. Station data hardly ever spans that
range.

The *station_category* is used to create a unique key into the measurement
data for every station: station_id, category_id, and taken_start. The
measurement data should be contiguous until taken_end.

I thought that that combination would be a pointer to the exact spot in the
measurement table  where the data starts, which should be ridiculously fast
to find.

See this http://explain.depesz.com/s/H1 and this
> http://explain.depesz.com/s/GGx
>

I was getting some red lines when I looked at a different plan. It's a great
site.

How many rows does the query return without the group by clause? About
> 140000 in both cases, right?
>

SELECT
  *
FROM
  climate.measurement m
WHERE
  m.station_id = 5148 AND
  m.taken BETWEEN '1900-08-01'::date AND '2009-12-31'::date AND
  m.category_id = 1

5397 rows (10 seconds cold; 0.5 seconds hot); estimated too high by 2275
rows?

http://explain.depesz.com/s/uq

 OK, I haven't noticed the table is already partitioned by category_id and
> I didn't mean to partition by (taken, category_id) - that would produce a
> lot of partitions. Yes, that might cause problems related to number of
> files, but that's rather a filesystem related issue.
>

Constrained as:

  CONSTRAINT measurement_013_category_id_ck CHECK (category_id = 7)


> I'd expect rather issues related to RULEs or triggers (not sure which of
> them you use to redirect the data into partitions). But when partitioning
>

I created seven child tables of measurement. Each of these has a constraint
by category_id. This makes it extremely fast to select the correct
partition.


> I'm not sure what you mean by 'year wrapping issue' but I think it might
> work quite well - right not the problem is PostgreSQL decides to scan the
> whole partition (all data for a given category_id).
>

I'll give it another try. :-)

*Use Case #1*
User selects: Mar 22 to Dec 22
User selects: 1900 to 2009

Result: Query should average *9 months* of climate data per year between Mar
22 and Dec 22 of Year.

*Use Case #2*
User selects: Dec 22 to Mar 22
User selects: 1900 to 2009

Result: Query should average *3 months* of climate data per year between Dec
22 of Year and Mar 22 of Year+1.

So if a user selects 1950 to *1960*:

   - first case should average between 1950 and *1960*; and
   - second case should average between 1950 and *1961*.

Dave

In response to

pgsql-performance by date

Next:From: David JarvisDate: 2010-05-26 20:21:19
Subject: Re: Random Page Cost and Planner
Previous:From: David JarvisDate: 2010-05-26 18:26:53
Subject: Re: Random Page Cost and Planner

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group