Re: Random Page Cost and Planner

From: Bryan Hinton <bryan(at)bryanhinton(dot)com>
To: David Jarvis <thangalin(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Random Page Cost and Planner
Date: 2010-05-27 21:40:12
Message-ID: AANLkTilLahnsIGdmBYjs7Q7uuCspoRXRg6cEGA18PDPx@mail.gmail.com
Lists: pgsql-performance

I agree with Tom's point about avoiding cost-parameter adjustments made to fit
specific test cases.
A few suggestions, assuming you own this database:
- Check out pg_statio_user_tables and track your cache hit ratio on blocks
read over different time windows; snapshot pg_stat_bgwriter from a script or
something similar as well (see the sketches after this list).
- pg_buffercache in contrib/
- /proc/meminfo on Linux
- Find out exactly what is going on with your kernel buffer cache (size, how
it is buffering) and whether your controller or drive is using a read-ahead
cache.
- You might want to experiment with partial indexes and/or range
partitioning with constraint exclusion, etc. (see the sketches after this
list).
- Define the I/O characteristics of the dataset, taking into account index
clustering and index order on in-memory pages (i.e., re-cluster?). Why the
need for multiple indexes if you are clustering the heap on one index?
- Solidify the referential integrity constraints between those tables on
paper, and define the use cases before modifying the database tables; I
assume this is a dev database.
- Linux filesystem mount options to explore, e.g. noatime, writeback, etc.
- Maybe look at prepared statements if you are running a lot of similar
queries from a single session (see the sketches after this list), assuming a
web front end for your DB with, say, frequently queried region/category/date
ranges over a large read-only dataset with multiple join conditions.

There are some good presentations on pgcon.org from PGCon 2010, which was
held last week:
http://www.pgcon.org/2010/schedule/events/218.en.html

If you take everything into account and model it correctly (not too loose,
not too tight), your solution will be reusable and will save time and
hardware expenses.

Regards -

Bryan

On Thu, May 27, 2010 at 2:43 AM, David Jarvis <thangalin(at)gmail(dot)com> wrote:

> Hi, Bryan.
>
> I was just about to reply to the thread, thanks for asking. Clustering was
> key. After rebooting the machine (just to make sure absolutely nothing was
> cached), I immediately ran a report on Toronto: 5.25 seconds!
>
> Here's what I did:
>
> 1. Created a new set of tables that matched the old set, with
> statistics of 1000 on the station and taken (date) columns.
> 2. Inserted the data from the old hierarchy into the new set, ordered
> by station id then by date (same seven child tables as before: one per
> category).
> - I wanted to ensure a strong correlation between primary key and
> station id.
> 3. Added three indexes per table: (a) station id; (b) date taken;
> and (c) station-taken-category.
> 4. Set the station-taken-category index as CLUSTER.
> 5. Vacuumed the new tables.
> 6. Dropped the old tables.
> 7. Set the following configuration values:
> - shared_buffers = 1GB
> - temp_buffers = 32MB
> - work_mem = 32MB
> - maintenance_work_mem = 64MB
> - seq_page_cost = 1.0
> - random_page_cost = 2.0
> - cpu_index_tuple_cost = 0.001
> - effective_cache_size = 512MB
>
> I ran a few more reports (no reboots, but reading vastly different data
> sets):
>
> - Vancouver: 4.2s
> - Yellowknife: 1.7s
> - Montreal: 6.5s
> - Trois-Riviers: 2.8s
>
> No full table scans. I imagine some indexes are not strictly necessary and
> will test to see which can be removed (my guess: the station and taken
> indexes). The problem was that the station ids were scattered and so
> PostgreSQL presumed a full table scan would be faster.
>
> Physically ordering the data by station ids triggers index use every time.
>
> Next week's hardware upgrade should halve those times -- unless anyone has
> further suggestions to squeeze more performance out of PG. ;-)
>
> Dave
>
>
