Skip site navigation (1) Skip section navigation (2)

Re: Random Page Cost and Planner

From: Bryan Hinton <bryan(at)bryanhinton(dot)com>
To: David Jarvis <thangalin(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Random Page Cost and Planner
Date: 2010-05-27 21:40:12
Message-ID: (view raw or whole thread)
Lists: pgsql-performance
Agree with Tom on his point about avoidance of cost param adjustments to fit
specific test cases.
A few I assume you own this database...
- check out pg_statio_user_tables - optimize your cache hit ratio on blocks
read...different time durations... pg_stat_bgwriter (read from a script or
something and snapshot)
- pg_buffercache in contrib/
- /proc/meminfo on linux
- find out exactly what is going on with your kernel buffer cache (size, how
it is buffering) and if your controller or drive is using a read ahead
- might want to play around with partial indexes vs. and/or range
partitioning with exclusion constraints, etc.
- define I/O characteristics of the dataset - taking into account index
clustering and index order on in-memory pages (i.e. re-cluster?), why need
for multiple index if clustering indexes on heap?
- solidify the referential integrity constraints between those tables, on
paper....define the use cases before modifying the database tables...i
assume this is a dev database
- linux fs mount options to explore - i.e. noatime, writeback, etc.
-maybe look at prepared statements if you are running alot of similar
queries from a single session? assuming web front end for your db - with say
frequently queried region/category/dates for large read-only dataset with
multiple join conditions?

There are some good presentations on from PGCon 2010 that was held
last week...

If you take everything into account and model it correctly (not too loose,
not too tight), your solution will be reusable and will save time and
hardware expenses.

Regards -


On Thu, May 27, 2010 at 2:43 AM, David Jarvis <thangalin(at)gmail(dot)com> wrote:

> Hi, Bryan.
> I was just about to reply to the thread, thanks for asking. Clustering was
> key. After rebooting the machine (just to make sure absolutely nothing was
> cached), I immediately ran a report on Toronto: 5.25 seconds!
> Here's what I did:
>    1. Created a new set of tables that matched the old set, with
>    statistics of 1000 on the station and taken (date) columns.
>    2. Inserted the data from the old hierarchy into the new set, ordered
>    by station id then by date (same seven child tables as before: one per
>    category).
>       - I wanted to ensure a strong correlation between primary key and
>       station id.
>       3. Added three indexes per table: (a) station id; (b) date taken;
>    and (c) station-taken-category.
>    4. Set the station-taken-category index as CLUSTER.
>    5. Vacuumed the new tables.
>    6. Dropped the old tables.
>    7. Set the following configuration values:
>       - shared_buffers = 1GB
>       - temp_buffers = 32MB
>       - work_mem = 32MB
>       - maintenance_work_mem = 64MB
>       - seq_page_cost = 1.0
>       - random_page_cost = 2.0
>       - cpu_index_tuple_cost = 0.001
>       - effective_cache_size = 512MB
> I ran a few more reports (no reboots, but reading vastly different data
> sets):
>    - Vancouver: 4.2s
>    - Yellowknife: 1.7s
>    - Montreal: 6.5s
>    - Trois-Riviers: 2.8s
> No full table scans. I imagine some indexes are not strictly necessary and
> will test to see which can be removed (my guess: the station and taken
> indexes). The problem was that the station ids were scattered and so
> PostgreSQL presumed a full table scan would be faster.
> Physically ordering the data by station ids triggers index use every time.
> Next week's hardware upgrade should halve those times -- unless anyone has
> further suggestions to squeeze more performance out of PG. ;-)
> Dave

In response to


pgsql-performance by date

Next:From: Craig JamesDate: 2010-05-27 21:44:19
Subject: Re: Does FILTER in SEQSCAN short-circuit AND?
Previous:From: Kevin GrittnerDate: 2010-05-27 21:28:48
Subject: Re: Does FILTER in SEQSCAN short-circuit AND?

Privacy Policy | About PostgreSQL
Copyright © 1996-2015 The PostgreSQL Global Development Group