From: Markus Innerebner <markus(dot)innerebner(at)inf(dot)unibz(dot)it>
To: pgsql-performance(at)postgresql(dot)org
Subject: Question about caching on full table scans
Date: 2012-08-30 08:13:39
Message-ID: 133F8194-E4E9-49A8-A4A0-90495F175A38@inf.unibz.it
Lists: pgsql-performance

Hello PG Performance group,

I am doing some runtime experiments with my implementation, which computes multi-modal range queries for a query point (for details see the website: www.isochrones.inf.unibz.it).
The network is explored using Dijkstra's shortest-path algorithm, which starts from the query point and expands all connected vertices until the time budget is exceeded.
The network is stored in a PostgreSQL (PostGIS) database consisting of vertices and edges.
relation: edge(id int, source int, target int, length double, segment geometry, ...)

I have different implementations of how the network is loaded into main memory:
approach 1: initially loads the entire edge table (full table scan) into main memory, then starts to expand the network and do the computation.
approach 2: loads only the edges adjacent to the currently expanded vertex.
approach 3: loads chunks using a Euclidean-distance upper bound.
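To make the comparison concrete, approach 2 could be sketched as below. This is only an illustration, not the author's actual code: it uses Python's stdlib sqlite3 as a stand-in for the PostgreSQL edge table, and the function name, time budget, and the tiny test network are all assumptions; only the edge(source, target, length) columns come from the relation above.

```python
# Sketch of "approach 2": Dijkstra expansion that fetches only the edges
# adjacent to the vertex currently being expanded, instead of preloading
# the whole edge table. sqlite3 stands in for PostgreSQL here.
import heapq
import sqlite3

def dijkstra_lazy(conn, source, budget):
    """Expand vertices from `source`, loading adjacent edges on demand,
    until the accumulated cost would exceed `budget`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    cur = conn.cursor()
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        # Load only the adjacent edges of the expanded vertex.
        cur.execute("SELECT target, length FROM edge WHERE source = ?", (v,))
        for target, length in cur.fetchall():
            nd = d + length
            if nd <= budget and nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist

# Tiny in-memory network, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (id INTEGER, source INTEGER,"
             " target INTEGER, length REAL)")
conn.executemany("INSERT INTO edge VALUES (?,?,?,?)",
                 [(1, 1, 2, 1.0), (2, 2, 3, 2.0),
                  (3, 1, 3, 5.0), (4, 3, 4, 1.0)])
result = dijkstra_lazy(conn, 1, 4.0)
print(result)  # distances of vertices reachable within the budget
```

The point of this variant is that each iteration issues many small indexed queries instead of one full table scan, which is why its runtime depends much less on whether the whole table is cached.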

I have two datasets: 6,000 tuples (small) and 4,000,000 tuples (large).

I repeat each experiment at least 10 times.
When observing the runtimes I noticed the following:
- in the first iteration approach 1 takes a long time, and its runtime improves in subsequent iterations:
e.g. with large dataset
- iteration 1: 60.0s
- iteration 2: 40.7s
- iteration 3: 40s
- iteration 4: 39.7s
- iteration 5: 39.5s
- iteration 6: 39.3s
- iteration 7: 40.0s
- iteration 8: 34.8s
- iteration 9: 39.1s
- iteration 10: 38.0s
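One common way to present such numbers is to report the cold (first) run separately and average only the warm runs, since they measure different things. A minimal sketch, using the runtimes listed above (the garbled iteration-3 value is assumed to be 40.0s):

```python
# Split repeated-run timings into the cold first run and the warm-run average.
def summarize(runs):
    cold, warm = runs[0], runs[1:]
    return cold, sum(warm) / len(warm)

# Runtimes (seconds) of approach 1 on the large dataset, from the list above.
runs = [60.0, 40.7, 40.0, 39.7, 39.5, 39.3, 40.0, 34.8, 39.1, 38.0]
cold, warm_avg = summarize(runs)
print(f"cold: {cold:.1f}s, warm average: {warm_avg:.1f}s")
# -> cold: 60.0s, warm average: 39.0s
```

The ~20s gap between the cold run and the warm average is roughly the cost of reading the 4M-tuple edge table from disk rather than from cache.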

With the other approaches I do not see such a big difference.

I know that Postgres (and the OS) is caching the dataset. But is there a way to force the database to evict those values from its cache?
I also tried performing a scan on a dummy table after each iteration (executing it at least 10 times, so that the dummy data rather than the edge table stays cached in main memory).
But I do not see any difference.
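For what it's worth, the usual way to get truly cold-cache runs on Linux is to restart the server (which empties shared_buffers) and drop the OS page cache between iterations. A sketch, requiring root; the init-script name is an assumption and varies by distribution:

```shell
# Evict cached data between benchmark iterations (Linux, needs root).
# The service name "postgresql-8.4" is an assumption -- adjust for your distro.
sudo /etc/init.d/postgresql-8.4 stop          # empties PostgreSQL shared_buffers
sync                                          # flush dirty pages to disk first
echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop OS page cache, dentries, inodes
sudo /etc/init.d/postgresql-8.4 start
```

Scanning a dummy table is unreliable for this because the buffer replacement strategy does not guarantee the dummy pages will evict the edge table's pages.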

I think the comparison is not really fair if caching gives the main-memory approach such a big advantage.

What can you, as experts, suggest?

Cheers Markus

****************************
My environment is:

OS: linux ubuntu

CPU dual Core
model name : Intel(R) Xeon(R) CPU E7- 2850 @ 2.00GHz
stepping : 1
cpu MHz : 1997.386
cache size : 24576 KB

RAM: 5GB

postgres settings (version 8.4):

shared_buffers = 650MB
work_mem = 512MB
maintenance_work_mem = 256MB
effective_cache_size = 500MB

--
Ph D. Student Markus Innerebner

DIS Research Group - Faculty of Computer Science
Free University Bozen-Bolzano

Dominikanerplatz 3 - Room 211
I - 39100 Bozen
Phone: +39-0471-016143
Mobile: +39-333-9392929

gpg --fingerprint
-------------------------------------
pub 1024D/588F6308 2007-01-09
Key fingerprint = 6948 947E CBD2 89FD E773 E863 914F EB1B 588F 6308
sub 2048g/BF4877D0 2007-01-09
