I was just about to reply to the thread, thanks for asking. Clustering was
key. After rebooting the machine (just to make sure absolutely nothing was
cached), I immediately ran a report on Toronto: 5.25 seconds!
Here's what I did:
1. Created a new set of tables that matched the old set, with a statistics
target of 1000 on the station and taken (date) columns.
2. Inserted the data from the old hierarchy into the new set, ordered by
station id then by date (same seven child tables as before: one per
category).
- I wanted to ensure a strong correlation between primary key and
station id.
3. Added three indexes per table: (a) station id; (b) date taken; and
(c) station id, date taken, and category.
4. Set the station-taken-category index as CLUSTER.
5. Vacuumed the new tables.
6. Dropped the old tables.
7. Set the following configuration values:
- shared_buffers = 1GB
- temp_buffers = 32MB
- work_mem = 32MB
- maintenance_work_mem = 64MB
- seq_page_cost = 1.0
- random_page_cost = 2.0
- cpu_index_tuple_cost = 0.001
- effective_cache_size = 512MB
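Steps 1 through 6 can be sketched as SQL for one child table. The table and
column names (measurement_new, station_id, taken, category_id) are
illustrative, not my actual schema; adjust to taste:

```sql
-- Step 1: raise the statistics target on the clustering columns.
ALTER TABLE measurement_new ALTER COLUMN station_id SET STATISTICS 1000;
ALTER TABLE measurement_new ALTER COLUMN taken SET STATISTICS 1000;

-- Step 2: copy the data in station-then-date order.
INSERT INTO measurement_new
SELECT * FROM measurement_old
ORDER BY station_id, taken;

-- Step 3: three indexes per table.
CREATE INDEX idx_station ON measurement_new (station_id);
CREATE INDEX idx_taken ON measurement_new (taken);
CREATE INDEX idx_station_taken_category
    ON measurement_new (station_id, taken, category_id);

-- Step 4: mark the composite index as the CLUSTER index.
-- (The rows were inserted in order, so marking suffices; running the
-- CLUSTER command would physically rewrite the table on that index.)
ALTER TABLE measurement_new CLUSTER ON idx_station_taken_category;

-- Step 5: vacuum and refresh planner statistics.
VACUUM ANALYZE measurement_new;
```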
I ran a few more reports (no reboots, but reading vastly different data
sets):
- Vancouver: 4.2s
- Yellowknife: 1.7s
- Montreal: 6.5s
- Trois-Riviers: 2.8s
No full table scans. I imagine some indexes are not strictly necessary and
will test to see which can be removed (my guess: the station and taken
indexes). The problem was that the station ids were physically scattered
across the table, so PostgreSQL estimated that a full table scan would be
faster than an index scan.
Physically ordering the data by station ids triggers index use every time.
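For anyone who wants to verify this on their own data: EXPLAIN ANALYZE shows
whether the planner picks the index. The query below is a made-up report
(station id, column names, and date range are all hypothetical); you want to
see an Index Scan on the composite index in the plan rather than a Seq Scan:

```sql
EXPLAIN ANALYZE
SELECT extract(year FROM taken) AS yr, avg(amount)
FROM measurement
WHERE station_id = 5182
  AND taken BETWEEN '1900-01-01' AND '2009-12-31'
GROUP BY yr;
```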
Next week's hardware upgrade should halve those times -- unless anyone has
further suggestions to squeeze more performance out of PG. ;-)
Posted to the pgsql-performance mailing list.