Re: [PoC] pgstattuple2: block sampling to reduce physical read

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Satoshi Nagayasu <snaga(at)uptime(dot)jp>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Greg Smith <greg(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] pgstattuple2: block sampling to reduce physical read
Date: 2013-10-10 22:32:13
Message-ID: 52572AED.3030609@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/10/13 11:09, Mark Kirkwood wrote:
> On 16/09/13 16:20, Satoshi Nagayasu wrote:
>> (2013/09/15 11:07), Peter Eisentraut wrote:
>>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>>> I'm looking forward to seeing more feedback on this approach,
>>>> in terms of design and performance improvement.
>>>> So, I have submitted this for the next CF.
>>> Your patch fails to build:
>>>
>>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
>>> this function)
>>> pgstattuple.c:737:13: note: each undeclared identifier is reported
>>> only once for each function it appears in
>> Thanks for checking. Fixed to eliminate SnapshotNow.
>>
> This seems like a cool idea! I took a quick look, and initally
> replicated the sort of improvement you saw:
>
>
> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
> QUERY PLAN
>
> --------------------------------------------------------------------------------
> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
> time=786.368..786.369 rows=1 loops=1)
> Total runtime: 786.384 ms
> (2 rows)
>
> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
> QUERY PLAN
>
> --------------------------------------------------------------------------------
> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
> time=12.004..12.005 rows=1 loops=1)
> Total runtime: 12.019 ms
> (2 rows)
>
>
>
> I wondered what sort of difference eliminating caching would make:
>
> $ sudo sysctl -w vm.drop_caches=3
>
> Repeating the above queries:
>
>
> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
> QUERY PLAN
>
> --------------------------------------------------------------------------------
> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
> time=9503.774..9503.776 rows=1 loops=1)
> Total runtime: 9504.523 ms
> (2 rows)
>
> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
> QUERY PLAN
>
> --------------------------------------------------------------------------------
> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
> time=12330.630..12330.631 rows=1 loops=1)
> Total runtime: 12331.353 ms
> (2 rows)
>
>
> So the sampling code seems *slower* when the cache is completely cold -
> is that expected? (I have not looked at how the code works yet - I'll
> dive in later if I get a chance)!
>

Quietly replying to myself - looking at the code the sampler does 3000
random page reads... I guess this is slower than 163935 (number of pages
in pgbench_accounts) sequential page reads thanks to os readahead on my
type of disk (WD Velociraptor). Tweaking the number of random reads (i.e
the sample size) down helps - but obviously that can impact estimation
accuracy.

Thinking about this a bit more, I guess the elapsed runtime is not the
*only* theng to consider - the sampling code will cause way less
disruption to the os page cache (3000 pages vs possibly lots more than
3000 for reading an entire ralation).

Thoughts?

Cheers

Mark

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2013-10-10 22:36:17 Re: Auto-tuning work_mem and maintenance_work_mem
Previous Message Bruce Momjian 2013-10-10 22:28:06 Re: Auto-tuning work_mem and maintenance_work_mem