Re: Explain buffers wrong counter with parallel plans

From: Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Explain buffers wrong counter with parallel plans
Date: 2018-04-20 13:06:03
Message-ID: 01952aab-33ca-36cd-e74b-ce32f3eefc84@anayrat.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I tried to understand this issue and it seems Gather node only take account of
this own buffer usage :

create unlogged table t1 (c1 int);
insert into t1 select generate_series(1,1000000);
vacuum t1;

explain (analyze,buffers,timing off,costs off) select count(*) from t1;
QUERY PLAN
------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
Buffers: shared hit=1531
-> Gather (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=1531
-> Partial Aggregate (actual rows=1 loops=3)
Buffers: shared hit=4425
-> Parallel Seq Scan on t1 (actual rows=333333 loops=3)
Buffers: shared hit=4425

Same query without parallelism

QUERY PLAN
----------------------------------------------------
Aggregate (actual rows=1 loops=1)
Buffers: shared hit=4425
-> Seq Scan on t1 (actual rows=1000000 loops=1)
Buffers: shared hit=4425

We can notice Parallel Seq Scan and Partial Aggregate report 4425 buffers, same
for the plan without parallelism.

I put elog debug around theses lines in execParallel.c :

/* Accumulate the statistics from all workers. */
instrument = GetInstrumentationArray(instrumentation);
instrument += i * instrumentation->num_workers;
for (n = 0; n < instrumentation->num_workers; ++n)
{
elog(LOG, "worker %d - shared_blks_read: %ld - shared_blks_hit: %ld", n,
instrument[n].bufusage.shared_blks_read,instrument[n].bufusage.shared_blks_hit);
InstrAggNode(planstate->instrument, &instrument[n]);
}

And I get theses messages :

LOG: worker 0 - shared_blks_read: 0 - shared_blks_hit: 1443
LOG: worker 1 - shared_blks_read: 0 - shared_blks_hit: 1451

If you sum each worker's shared_blks_read and Gather's shared hit you get :
1443 + 1451 + 1531 = 4425 (size of t1)

It seems Gather node and Finalize Aggregate only report buffers read by the main
process. Workers seem omitted.

Same issue occurs for Gather merge :

explain (analyze,buffers,timing off,costs off)
select * from t1 where c1%10=4 order by 1;

QUERY PLAN
-----------------------------------------------------------------
Gather Merge (actual rows=100000 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=1549
-> Sort (actual rows=33333 loops=3)
Sort Key: c1
Sort Method: quicksort Memory: 3143kB
Buffers: shared hit=4521
-> Parallel Seq Scan on t1 (actual rows=33333 loops=3)
Filter: ((c1 % 10) = 4)
Rows Removed by Filter: 300000
Buffers: shared hit=4425

In my log :
worker 0 - shared_blks_read: 0 - shared_blks_hit: 1487
worker 1 - shared_blks_read: 0 - shared_blks_hit: 1485

1487 + 1485 + 1549 = 4521

I tried to understand how instrumentation works, but it is above my knowledge :(

Regards,

--
Adrien

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-04-20 13:11:00 Re: Toast issues with OldestXmin going backwards
Previous Message Tatsuo Ishii 2018-04-20 12:38:22 Re: Built-in connection pooling