Re: Parallel Seq Scan vs kernel read ahead

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan vs kernel read ahead
Date: 2020-06-19 02:10:17
Message-ID: CAApHDvq+mXCDE61qEWHLBCOVxHQMaF1S_Z8vhU_KsvhAowg+5w@mail.gmail.com
Lists: pgsql-hackers

On Fri, 19 Jun 2020 at 11:34, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Fri, 19 Jun 2020 at 03:26, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Thu, Jun 18, 2020 at 6:15 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > > With a 32TB relation, the code will make the chunk size 16GB. Perhaps
> > > I should change the code to cap that at 1GB.
> >
> > It seems pretty hard to believe there's any significant advantage to a
> > chunk size >1GB, so I would be in favor of that change.
>
> I could certainly make that change. With the standard page size, 1GB
> is 131072 pages and a power of 2. That would change for non-standard
> page sizes, so we'd need to decide if we want to keep the chunk size a
> power of 2, or just cap it exactly at whatever number of pages 1GB is.
>
> I'm not sure how much of a difference it'll make, but I also just want
> to note that synchronous scans can mean we'll start the scan anywhere
> within the table, so capping to 1GB does not mean we read an entire
> extent. It's more likely to span 2 extents.

Here's a patch which caps the maximum chunk size at 131072 blocks. If
someone doubles the page size, that'll be 2GB instead of 1GB. I'm
not personally worried about that.

I tested the performance on a Windows 10 laptop using the test case from [1]:

Master:

workers=0: Time: 141175.935 ms (02:21.176)
workers=1: Time: 316854.538 ms (05:16.855)
workers=2: Time: 323471.791 ms (05:23.472)
workers=3: Time: 321637.945 ms (05:21.638)
workers=4: Time: 308689.599 ms (05:08.690)
workers=5: Time: 289014.709 ms (04:49.015)
workers=6: Time: 267785.270 ms (04:27.785)
workers=7: Time: 248735.817 ms (04:08.736)

Patched:

workers=0: Time: 155985.204 ms (02:35.985)
workers=1: Time: 112238.741 ms (01:52.239)
workers=2: Time: 105861.813 ms (01:45.862)
workers=3: Time: 91874.311 ms (01:31.874)
workers=4: Time: 92538.646 ms (01:32.539)
workers=5: Time: 93012.902 ms (01:33.013)
workers=6: Time: 94269.076 ms (01:34.269)
workers=7: Time: 90858.458 ms (01:30.858)

David

[1] https://www.postgresql.org/message-id/CAApHDvrfJfYH51_WY-iQqPw8yGR4fDoTxAQKqn%2BSa7NTKEVWtg%40mail.gmail.com

Attachment Content-Type Size
bigger_io_chunks_for_parallel_seqscan_v2.patch application/x-patch 10.2 KB
