RE: Parallel Seq Scan vs kernel read ahead

From: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Parallel Seq Scan vs kernel read ahead
Date: 2020-07-21 09:38:46
Message-ID: OSBPR01MB2341378F27AD79873177F853EF780@OSBPR01MB2341.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tuesday, July 21, 2020 12:18 PM, Amit Kapila wrote:
> On Tue, Jul 21, 2020 at 8:06 AM k(dot)jamison(at)fujitsu(dot)com <k(dot)jamison(at)fujitsu(dot)com>
> wrote:
> >
> > Thank you for the advice. I repeated the test as per your advice, taking
> > the average of 3 runs per planned number of workers.
> > It still shows similar performance results between Master and Patch V2.
> > I wonder why there's no difference, though.
> >
> > The test on my machine is roughly like this:
> >
> > createdb test
> > psql -d test
> >   create table t_heap as select generate_series(1, 100000000) i;
> >   \q
> >
> > pg_ctl restart
> > psql -d test
> >   SET track_io_timing = on;
> >   SET max_parallel_workers_per_gather = 0;  -- varied per run (0 to 6)
> >   SHOW max_parallel_workers_per_gather;
> >   EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) from t_heap;
> >   \timing
> >   SELECT count(*) from t_heap;
> >   drop table t_heap;
> >   \q
> > dropdb test
> > pg_ctl restart
> >
> > Below are the results. Again, almost no discernible difference between
> > the master and patch.
> > Also, the results when max_parallel_workers_per_gather is more than 4
> > could be inaccurate due to my machine's limitation of only having 4
> > CPUs. Even so, the query planner capped it at 6 workers.
> >
> > Query Planner I/O Timings (track_io_timing = on) in ms:
> > | Worker | I/O READ (Master) | I/O READ (Patch) | I/O WRITE (Master) | I/O WRITE (Patch) |
> > |--------|-------------------|------------------|--------------------|-------------------|
> > | 0      | 1,130.777         | 1,250.821        | 1,698.051          | 1,733.439         |
> > | 1      | 1,603.016         | 1,660.767        | 2,312.248          | 2,291.661         |
> > | 2      | 2,036.269         | 2,107.066        | 2,698.216          | 2,796.893         |
> > | 3      | 2,298.811         | 2,307.254        | 5,695.991          | 5,894.183         |
> > | 4      | 2,098.642         | 2,135.960        | 23,837.088         | 26,537.158        |
> > | 5      | 1,956.536         | 1,997.464        | 45,891.851         | 48,049.338        |
> > | 6      | 2,201.816         | 2,219.001        | 61,937.828         | 67,809.486        |
> >
> > Query Planner Execution Time (ms):
> > | Worker | QueryPlanner (Master) | QueryPlanner (Patch) |
> > |--------|-----------------------|----------------------|
> > | 0      | 40,454.252            | 40,521.578           |
> > | 1      | 21,332.067            | 21,205.068           |
> > | 2      | 14,266.756            | 14,385.539           |
> > | 3      | 11,597.936            | 11,722.055           |
> > | 4      | 12,937.468            | 13,439.247           |
> > | 5      | 14,383.083            | 14,782.866           |
> > | 6      | 14,671.336            | 15,507.581           |
> >
> > Based on the results above, the I/O latency increases as the number of
> > workers increases. Despite that, the query execution time stays roughly
> > the same (~11-14 s) once 2 or more workers are used, with the same
> > results between Master and Patch V2.
> >
> > As for buffers, the same results are shown per worker (both Master and Patch).
> > | Worker | Buffers |
> > |--------|--------------------------------------------------|
> > | 0 | shared read=442478 dirtied=442478 written=442446 |
> > | 1 | shared read=442478 dirtied=442478 written=442414 |
> > | 2 | shared read=442478 dirtied=442478 written=442382 |
> > | 3 | shared read=442478 dirtied=442478 written=442350 |
> > | 4 | shared read=442478 dirtied=442478 written=442318 |
> > | 5 | shared read=442478 dirtied=442478 written=442286 |
> > | 6 | shared read=442478 dirtied=442478 written=442254 |
> >
> >
> > SQL Query Execution Time (ms):
> > | Worker | SQL (Master) | SQL (Patch) |
> > |--------|--------------|-------------|
> > | 0      | 10,418.606   | 10,377.377  |
> > | 1      | 5,427.460    | 5,402.727   |
> > | 2      | 3,662.998    | 3,650.277   |
> > | 3      | 2,718.837    | 2,692.871   |
> > | 4      | 2,759.802    | 2,693.370   |
> > | 5      | 2,761.834    | 2,682.590   |
> > | 6      | 2,711.434    | 2,726.332   |
> >
> > The SQL query execution time clearly benefited from the preceding
> > EXPLAIN ANALYZE run of the same query (caching), so the results are
> > faster. But again, Master and Patched show almost the same results.
> > Nonetheless, the execution time is almost constant once
> > max_parallel_workers_per_gather is 2 (the default) or above.
> >
> > I am definitely missing something. I cannot understand why there is no
> > I/O difference between Master and Patched (V2). Or has this already
> > been improved even without the patch?
> >
>
> I don't think it is strange that you are not seeing much difference because as per
> the initial email by Thomas this patch is not supposed to give benefits on all
> systems. I think we wanted to check that the patch should not regress
> performance in cases where it doesn't give benefits. I think it might be okay to
> run with a higher number of workers than you have CPUs in the system as we
> wanted to check if such cases regress as shown by Soumyadeep above [1]. Can
> you try once with 8 and/or 10 workers as well?
>
> [1] -
> https://www.postgresql.org/message-id/CADwEdoqirzK3H8bB%3DxyJ192EZCN
> wGfcCa_WJ5GHVM7Sv8oenuA%40mail.gmail.com

You are right. Kindly excuse me on that part; it only means the patch may or may not
bring benefits on the filesystem I am using. For other filesystems, as we can see from
David's benchmarks, there are significant benefits.

Following your advice on the regression test case, I increased the number of workers,
but the query planner still capped the number of workers at 6, so the results from
6 workers onwards are almost the same.
Just to reconfirm: as per my test results below, I see no significant difference
between master and patched on my machine.
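
By the way, since the planner would not go past 6 workers on its own, here is a
minimal sketch of how its size-based worker cap could be lifted per table (not
used in the runs below; the value 32 is just an example):

  -- Raise the process-level limits first.
  -- (max_worker_processes must also be >= 32, which needs a server restart.)
  SET max_parallel_workers = 32;
  SET max_parallel_workers_per_gather = 32;
  -- Override the planner's size-based worker estimate for this table.
  ALTER TABLE t_heap SET (parallel_workers = 32);
  EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t_heap;
  -- Restore the default estimate afterwards.
  ALTER TABLE t_heap RESET (parallel_workers);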

Query Planner I/O Timings (ms):
| Worker | I/O READ (Master) | I/O READ (Patch) | I/O WRITE (Master) | I/O WRITE (Patch) |
|--------|-------------------|------------------|--------------------|-------------------|
| 0      | 1,130.78          | 1,250.82         | 1,698.05           | 1,733.44          |
| 1      | 1,603.02          | 1,660.77         | 2,312.25           | 2,291.66          |
| 2      | 2,036.27          | 2,107.07         | 2,698.22           | 2,796.89          |
| 3      | 2,298.81          | 2,307.25         | 5,695.99           | 5,894.18          |
| 4      | 2,098.64          | 2,135.96         | 23,837.09          | 26,537.16         |
| 5      | 1,956.54          | 1,997.46         | 45,891.85          | 48,049.34         |
| 6      | 2,201.82          | 2,219.00         | 61,937.83          | 67,809.49         |
| 8      | 2,117.80          | 2,169.67         | 60,671.22          | 68,676.36         |
| 16     | 2,052.73          | 2,134.86         | 60,635.17          | 66,462.82         |
| 32     | 2,036.00          | 2,200.98         | 60,833.92          | 67,702.49         |

Query Planner Execution Time (ms):
| Worker | QueryPlanner (Master) | QueryPlanner (Patch) |
|--------|-----------------------|----------------------|
| 0      | 40,454.25             | 40,521.58            |
| 1      | 21,332.07             | 21,205.07            |
| 2      | 14,266.76             | 14,385.54            |
| 3      | 11,597.94             | 11,722.06            |
| 4      | 12,937.47             | 13,439.25            |
| 5      | 14,383.08             | 14,782.87            |
| 6      | 14,671.34             | 15,507.58            |
| 8      | 14,679.50             | 15,615.69            |
| 16     | 14,474.78             | 15,274.61            |
| 32     | 14,462.11             | 15,470.68            |

Buffers (same results for both Master and Patch):
| Worker | Buffers                                          |
|--------|--------------------------------------------------|
| 0 | shared read=442478 dirtied=442478 written=442446 |
| 1 | shared read=442478 dirtied=442478 written=442414 |
| 2 | shared read=442478 dirtied=442478 written=442382 |
| 3 | shared read=442478 dirtied=442478 written=442350 |
| 4 | shared read=442478 dirtied=442478 written=442318 |
| 5 | shared read=442478 dirtied=442478 written=442286 |
| 6 | shared read=442478 dirtied=442478 written=442254 |
| 8 | shared read=442478 dirtied=442478 written=442254 |
| 16 | shared read=442478 dirtied=442478 written=442254 |
| 32 | shared read=442478 dirtied=442478 written=442254 |

I also re-ran the query and measured the execution time (ms) with \timing:
| Worker | SQL (Master) | SQL (Patch) |
|--------|--------------|-------------|
| 0 | 15476.458 | 15278.772 |
| 1 | 8292.702 | 8426.435 |
| 2 | 6256.673 | 6232.456 |
| 3 | 6357.217 | 6340.013 |
| 4 | 7591.311 | 7913.881 |
| 5 | 8165.315 | 8070.592 |
| 6 | 8065.578 | 8200.076 |
| 8 | 7988.302 | 8609.138 |
| 16 | 8025.170 | 8469.895 |
| 32 | 8019.393 | 8645.150 |
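
Note that these \timing numbers come right after the EXPLAIN ANALYZE run of the
same query, so the cache is warm. A rough sketch for forcing cold-cache runs
instead, assuming Linux with root access (not used in the runs above):

  pg_ctl stop
  sync                                          # flush dirty pages to disk
  echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop the OS page cache
  pg_ctl start                                  # restart also clears shared_buffers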

Again, tested on:
XFS filesystem, HDD virtual machine, 8GB RAM
RHEL4, 64-bit
4 CPUs, Intel Core Processor (Haswell, IBRS)
PostgreSQL 14devel on x86_64-pc-linux-gnu

So I guess the patch simply has no effect on the filesystem I am using; since there
is no regression either, I think it's OK.
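
For completeness, the kernel read-ahead setting on the data device can be
inspected as below; this is only a sketch, and /dev/sda is an assumption for
this VM's disk:

  blockdev --getra /dev/sda              # current read-ahead, in 512-byte sectors
  sudo blockdev --setra 4096 /dev/sda    # e.g. raise read-ahead to 2MB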

Kind regards,
Kirk Jamison
