Re: Random Page Cost and Planner

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: David Jarvis <thangalin(at)gmail(dot)com>
Cc: Bryan Hinton <bryan(at)bryanhinton(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Random Page Cost and Planner
Date: 2010-05-27 08:03:09
Message-ID: AANLkTimtdk11hrWd-OpvXGLrh-4pukGd9eHEUemEM3OR@mail.gmail.com
Lists: pgsql-performance

2010/5/27 David Jarvis <thangalin(at)gmail(dot)com>:
> Hi, Bryan.
>
> I was just about to reply to the thread, thanks for asking. Clustering was
> key. After rebooting the machine (just to make sure absolutely nothing was
> cached), I immediately ran a report on Toronto: 5.25 seconds!
>
> Here's what I did:
>
> Created a new set of tables that matched the old set, with statistics of
> 1000 on the station and taken (date) columns.
> Inserted the data from the old hierarchy into the new set, ordered by
> station id then by date (same seven child tables as before: one per
> category).
>
> I wanted to ensure a strong correlation between primary key and station id.
>
> Added three indexes per table: (a) station id; (b) date taken; and (c)
> station-taken-category.
> Set the station-taken-category index as CLUSTER.
> Vacuumed the new tables.
> Dropped the old tables.
> Set the following configuration values:
>
> shared_buffers = 1GB
> temp_buffers = 32MB
> work_mem = 32MB
> maintenance_work_mem = 64MB
> seq_page_cost = 1.0
> random_page_cost = 2.0
> cpu_index_tuple_cost = 0.001
> effective_cache_size = 512MB
>
> I ran a few more reports (no reboots, but reading vastly different data
> sets):
>
> Vancouver: 4.2s
> Yellowknife: 1.7s
> Montreal: 6.5s
> Trois-Rivières: 2.8s
>
> No full table scans. I imagine some indexes are not strictly necessary and
> will test to see which can be removed (my guess: the station and taken
> indexes). The problem was that the station ids were scattered and so
> PostgreSQL presumed a full table scan would be faster.
>
> Physically ordering the data by station ids triggers index use every time.
>
> Next week's hardware upgrade should halve those times -- unless anyone has
> further suggestions to squeeze more performance out of PG. ;-)
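The clustering workflow David describes can be sketched in SQL roughly as follows; the table, column, and index names here are illustrative assumptions, not the originals from the thread:

```sql
-- Sketch only: "measurement" stands in for each of the seven child
-- tables; column names are assumed from the thread's description.
ALTER TABLE measurement ALTER COLUMN station_id SET STATISTICS 1000;
ALTER TABLE measurement ALTER COLUMN taken      SET STATISTICS 1000;

-- The three indexes per table: station, date taken, and the
-- composite station-taken-category index.
CREATE INDEX measurement_station_idx ON measurement (station_id);
CREATE INDEX measurement_taken_idx   ON measurement (taken);
CREATE INDEX measurement_stc_idx
    ON measurement (station_id, taken, category_id);

-- Physically reorder the table by the composite index and remember
-- that index for future CLUSTER runs, then refresh statistics.
CLUSTER measurement USING measurement_stc_idx;
VACUUM ANALYZE measurement;
```

Note that CLUSTER is a one-time physical reordering: rows inserted afterwards are not kept in index order, so the command has to be re-run periodically if correlation degrades.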

I wonder what the plan would be if you replaced sc.taken_* in:
m.taken BETWEEN sc.taken_start AND sc.taken_end
with literal values. It might help the planner...
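To illustrate the suggestion: when the range bounds come from joined columns, the planner cannot estimate the BETWEEN selectivity well; with constants it can. The dates below are placeholders, not values from the thread:

```sql
-- Before: bounds are correlated columns from the joined table, so
-- the planner falls back on a generic selectivity guess.
WHERE m.taken BETWEEN sc.taken_start AND sc.taken_end

-- After (hypothetical literal dates substituted by the application):
-- the planner sees constants and can use the column statistics.
WHERE m.taken BETWEEN DATE '1900-01-01' AND DATE '2009-12-31'
```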

Also, I'd consider an explicit join order, but I admit I haven't read
the whole thread (in particular the table sizes).
Oh, and I would set statistics to a higher value for the category_id
column of table station_category (seeing the same request and its
EXPLAIN ANALYZE without the date in the query would also help).
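Raising the per-column statistics target would look like this; the table and column names match the thread, while the value 1000 is just an example (it mirrors what David used on station and taken):

```sql
-- Collect a larger sample for category_id so the planner gets a
-- better distribution estimate, then refresh the statistics.
ALTER TABLE station_category
    ALTER COLUMN category_id SET STATISTICS 1000;
ANALYZE station_category;
```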

>
> Dave
>
>

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
