Re: [PoC] pgstattuple2: block sampling to reduce physical read

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Satoshi Nagayasu <snaga(at)uptime(dot)jp>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Greg Smith <greg(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] pgstattuple2: block sampling to reduce physical read
Date: 2013-10-10 22:09:34
Message-ID: 5257259E.2070103@catalyst.net.nz
Lists: pgsql-hackers

On 16/09/13 16:20, Satoshi Nagayasu wrote:
> (2013/09/15 11:07), Peter Eisentraut wrote:
>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>> I'm looking forward to seeing more feedback on this approach,
>>> in terms of design and performance improvement.
>>> So, I have submitted this for the next CF.
>>
>> Your patch fails to build:
>>
>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
>> this function)
>> pgstattuple.c:737:13: note: each undeclared identifier is reported
>> only once for each function it appears in
>
> Thanks for checking. Fixed to eliminate SnapshotNow.
>

This seems like a cool idea! I took a quick look, and initially
replicated the sort of improvement you saw:

bench=# explain analyze select * from pgstattuple('pgbench_accounts');
QUERY PLAN

--------------------------------------------------------------------------------
Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
time=786.368..786.369 rows=1 loops=1)
Total runtime: 786.384 ms
(2 rows)

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
QUERY PLAN

--------------------------------------------------------------------------------
Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
time=12.004..12.005 rows=1 loops=1)
Total runtime: 12.019 ms
(2 rows)

I wondered what sort of difference dropping the OS caches would make:

$ sudo sysctl -w vm.drop_caches=3

Repeating the above queries:

bench=# explain analyze select * from pgstattuple('pgbench_accounts');
QUERY PLAN

--------------------------------------------------------------------------------
Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
time=9503.774..9503.776 rows=1 loops=1)
Total runtime: 9504.523 ms
(2 rows)

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
QUERY PLAN

--------------------------------------------------------------------------------
Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
time=12330.630..12330.631 rows=1 loops=1)
Total runtime: 12331.353 ms
(2 rows)

So the sampling code seems *slower* when the cache is completely cold
(about 12.3 s versus 9.5 s for the full scan) - is that expected? I have
not looked at how the code works yet; I'll dive in later if I get a
chance.
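
To put numbers on the comparison, here is a minimal sketch of the ratios
implied by the "Total runtime" figures above. It is arithmetic only; the
variable names are illustrative and not part of the patch, and a single run
of each query is not a rigorous benchmark.

```python
# Total runtimes in milliseconds, copied from the EXPLAIN ANALYZE output above.
warm_full_ms = 786.384       # pgstattuple, cache warm
warm_sample_ms = 12.019      # pgstattuple2, cache warm
cold_full_ms = 9504.523      # pgstattuple, after vm.drop_caches=3
cold_sample_ms = 12331.353   # pgstattuple2, after vm.drop_caches=3

# Warm cache: how many times faster the sampling version ran.
warm_speedup = warm_full_ms / warm_sample_ms

# Cold cache: how many times slower the sampling version ran.
cold_slowdown = cold_sample_ms / cold_full_ms

print(f"warm cache: pgstattuple2 ran {warm_speedup:.1f}x faster")
print(f"cold cache: pgstattuple2 ran {cold_slowdown:.2f}x slower")
```

On these figures the sampling version is roughly 65 times faster with a warm
cache and roughly 1.3 times slower with a cold one.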

Regards

Mark
