Re: Parallel Seq Scan vs kernel read ahead

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan vs kernel read ahead
Date: 2020-07-21 03:18:16
Message-ID: CAA4eK1+Cu=2qHP0LWaB4jZB2tf7GJmruxapoPXv72UZmdVvQcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 21, 2020 at 8:06 AM k(dot)jamison(at)fujitsu(dot)com
<k(dot)jamison(at)fujitsu(dot)com> wrote:
>
> Thank you for the advice. I repeated the test as per your advice and average of 3 runs
> per worker/s planned.
> It still shows the following similar performance results between Master and Patch V2.
> I wonder why there's no difference though.
>
> The test on my machine is roughly like this:
>
> createdb test
> psql -d test
> create table t_heap as select generate_series(1, 100000000) i;
> \q
>
> pg_ctl restart
> psql -d test
> SET track_io_timing = on;
> SET max_parallel_workers_per_gather = 0;
> SHOW max_parallel_workers_per_gather;
> EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) from t_heap;
> \timing
> SELECT count(*) from t_heap;
>
> drop table t_heap;
> \q
> dropdb test
> pg_ctl restart
>
> Below are the results. Again, almost no discernible difference between the master and patch.
> Also, the results when max_parallel_workers_per_gather is more than 4 could be inaccurate
> due to my machine's limitation of only having v4 CPUs. Even so, query planner capped it at
> 6 workers.
>
> Query Planner I/O Timings (track_io_timing = on) in ms :
> | Worker | I/O READ (Master) | I/O READ (Patch) | I/O WRITE (Master) | I/O WRITE (Patch) |
> |--------|-------------------|------------------|--------------------|-------------------|
> | 0 | "1,130.777" | "1,250.821" | "01,698.051" | "01,733.439" |
> | 1 | "1,603.016" | "1,660.767" | "02,312.248" | "02,291.661" |
> | 2 | "2,036.269" | "2,107.066" | "02,698.216" | "02,796.893" |
> | 3 | "2,298.811" | "2,307.254" | "05,695.991" | "05,894.183" |
> | 4 | "2,098.642" | "2,135.960" | "23,837.088" | "26,537.158" |
> | 5 | "1,956.536" | "1,997.464" | "45,891.851" | "48,049.338" |
> | 6 | "2,201.816" | "2,219.001" | "61,937.828" | "67,809.486" |
>
> Query Planner Execution Time (ms):
> | Worker | QueryPlanner (Master) | QueryPlanner (Patch) |
> |--------|-----------------------|----------------------|
> | 0.000 | "40,454.252" | "40,521.578" |
> | 1.000 | "21,332.067" | "21,205.068" |
> | 2.000 | "14,266.756" | "14,385.539" |
> | 3.000 | "11,597.936" | "11,722.055" |
> | 4.000 | "12,937.468" | "13,439.247" |
> | 5.000 | "14,383.083" | "14,782.866" |
> | 6.000 | "14,671.336" | "15,507.581" |
>
> Based from the results above, the I/O latency increases as number of workers
> also increase. Despite that, the query planner execution time is almost closely same
> when 2 or more workers are used (14~11s). Same results between Master and Patch V2.
>
> As for buffers, same results are shown per worker (both Master and Patch).
> | Worker | Buffers |
> |--------|--------------------------------------------------|
> | 0 | shared read=442478 dirtied=442478 written=442446 |
> | 1 | shared read=442478 dirtied=442478 written=442414 |
> | 2 | shared read=442478 dirtied=442478 written=442382 |
> | 3 | shared read=442478 dirtied=442478 written=442350 |
> | 4 | shared read=442478 dirtied=442478 written=442318 |
> | 5 | shared read=442478 dirtied=442478 written=442286 |
> | 6 | shared read=442478 dirtied=442478 written=442254 |
>
>
> SQL Query Execution Time (ms) :
> | Worker | SQL (Master) | SQL (Patch) |
> |--------|--------------|--------------|
> | 0 | "10,418.606" | "10,377.377" |
> | 1 | "05,427.460" | "05,402.727" |
> | 2 | "03,662.998" | "03,650.277" |
> | 3 | "02,718.837" | "02,692.871" |
> | 4 | "02,759.802" | "02,693.370" |
> | 5 | "02,761.834" | "02,682.590" |
> | 6 | "02,711.434" | "02,726.332" |
>
> The SQL query execution time definitely benefitted from previous run of query planner,
> so the results are faster. But again, both Master and Patched have almost the same results.
> Nonetheless, the execution time is almost consistent when
> max_parallel_workers_per_gather is 2 (default) and above.
>
> I am definitely missing something. Perhaps I think I could not understand why there's no
> I/O difference between the Master and Patched (V2). Or has it been already improved
> even without this patch?
>

I don't think it is strange that you are not seeing much difference
because as per the initial email by Thomas this patch is not supposed
to give benefits on all systems. I think we wanted to check that the
patch should not regress performance in cases where it doesn't give
benefits. I think it might be okay to run with a higher number of
workers than you have CPUs in the system as we wanted to check if such
cases regress as shown by Soumyadeep above [1]. Can you once try with
8 and or 10 workers as well?

[1] - https://www.postgresql.org/message-id/CADwEdoqirzK3H8bB%3DxyJ192EZCNwGfcCa_WJ5GHVM7Sv8oenuA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-07-21 03:21:46 Re: Added tab completion for the missing options in copy statement
Previous Message Andrey M. Borodin 2020-07-21 02:53:59 Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING