Re: Parallel Seq Scan vs kernel read ahead

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan vs kernel read ahead
Date: 2020-06-11 04:43:05
Message-ID: CAApHDvrYam+btP5QuVY_6-kiaUb3OHt13m5EXgVEXDmOz=_1=A@mail.gmail.com
Lists: pgsql-hackers

On Thu, 11 Jun 2020 at 16:03, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I think something on these lines would be a good idea especially
> keeping step-size proportional to relation size. However, I am not
> completely sure if doubling the step-size with equal increase in
> relation size (ex. what is happening between 16MB~8192MB) is the best
> idea. Why not double the step-size when relation size increases by
> four times? Will some more tests help us to identify this? I also
> don't know what is the right answer here so just trying to brainstorm.

Brainstorming sounds good. I'm by no means under any illusion that the
formula is correct.

But, why four times? The way I did it tries to keep the number of
chunks roughly the same each time. I think the key is the number of
chunks more than the size of the chunks. Having fewer chunks increases
the chances of an imbalance of work between workers, and with what you
mention, the number of chunks will vary more than what I have proposed.
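
As a rough illustration of that point, here is a small Python sketch. It
is not code from the patch. The function names are hypothetical, and the
"per 4x" rule is only one possible reading of the suggestion quoted
above. Both rules are assumed to start from a 1-page chunk at 1024
pages.

```python
def chunk_size_per_2x(nblocks: int, base: int = 1024) -> int:
    """Hypothetical rule: chunk size doubles each time the relation doubles."""
    # Number of whole doublings of the relation size past `base` pages.
    doublings = max(nblocks // base, 1).bit_length() - 1
    return 1 << doublings


def chunk_size_per_4x(nblocks: int, base: int = 1024) -> int:
    """Hypothetical rule: chunk size doubles each time the relation quadruples."""
    doublings = max(nblocks // base, 1).bit_length() - 1
    # Only every second doubling of the relation doubles the chunk size.
    return 1 << (doublings // 2)


def chunk_counts(nblocks: int) -> tuple:
    """Return (chunks under the 2x rule, chunks under the 4x rule)."""
    return (nblocks // chunk_size_per_2x(nblocks),
            nblocks // chunk_size_per_4x(nblocks))


if __name__ == "__main__":
    # Relation sizes from 2^10 to 2^20 pages, in steps of 4x.
    for exp in range(10, 21, 2):
        nblocks = 1 << exp
        print(nblocks, chunk_counts(nblocks))
```

Under these assumptions the 2x rule holds the chunk count steady at
power-of-two relation sizes, while the 4x rule lets the count grow as
the relation grows, so the count varies far more across relation sizes.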

The code I showed above will produce somewhere between 512 and 1024
chunks in all cases until we reach 2^20 pages; beyond that we start
capping the chunk size at 1024. I could probably get on board with
making it depend on the number of parallel workers, but perhaps it
would be better just to divide by, say, 16384 rather than the 1024 I
proposed above. That way we'll be more fine-grained, but we'll still
use chunk sizes larger than 1024 once the relation gets beyond 128GB.
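
The arithmetic can be checked with a short Python sketch. This is an
approximation of the idea, not the code referred to above: the helper
names are made up for illustration, the block size is assumed to be the
default 8kB, and the 16384 variant is assumed to run without the
1024-page cap. Note that the integer division here lets the chunk count
run somewhat above 1024 between power-of-two relation sizes.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size in bytes


def next_power_of_2(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def chunk_size(nblocks: int, divisor: int, max_chunk=None) -> int:
    """Chunk size in pages: relation size / divisor, rounded up to a power of 2.

    `max_chunk` is an optional cap; None means no cap (an assumption made
    here for the 16384 variant).
    """
    size = next_power_of_2(max(nblocks // divisor, 1))
    return size if max_chunk is None else min(size, max_chunk)


def relation_gb(nblocks: int) -> float:
    """Relation size in GB for a given number of pages."""
    return nblocks * BLOCK_SIZE / 2**30


if __name__ == "__main__":
    for exp in (11, 17, 20, 24, 25):
        nblocks = 1 << exp
        print(f"{relation_gb(nblocks):10.3f} GB  "
              f"div 1024 capped: {chunk_size(nblocks, 1024, 1024):5d}  "
              f"div 16384: {chunk_size(nblocks, 16384):5d}")
```

With these assumptions, dividing by 1024 reaches the 1024-page cap at
2^20 pages (8GB), while dividing by 16384 only reaches a 1024-page chunk
at 2^24 pages (128GB) and goes beyond it after that.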

David
