Re: Parallel Seq Scan vs kernel read ahead

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan vs kernel read ahead
Date: 2020-06-11 11:34:58
Message-ID: CAA4eK1L+2u=-EpGjyhdC2NqdsU7ASP5_PpC3azTnchmttrYRYA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 11, 2020 at 10:13 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Thu, 11 Jun 2020 at 16:03, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > I think something along these lines would be a good idea, especially
> > keeping the step-size proportional to the relation size. However, I
> > am not completely sure if doubling the step-size with each doubling
> > of the relation size (e.g. what is happening between 16MB~8192MB) is
> > the best idea. Why not double the step-size when the relation size
> > increases four-fold? Will some more tests help us identify this? I
> > also don't know what the right answer is here, so I am just trying
> > to brainstorm.
>
> Brainstorming sounds good. I'm by no means under any illusion that the
> formula is correct.
>
> But, why four times?
>

I am just trying to see whether we can optimize this such that we use a
bigger step-size for bigger relations and a smaller step-size for
smaller relations.
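
For instance, taking your formula as the baseline (with 8kB blocks): at
512MB the step-size is 64 blocks under both schemes; doubling per
two-fold growth gives 256 blocks at 2GB and 1024 blocks at 8GB, whereas
doubling per four-fold growth would only reach 128 and 256 blocks
respectively.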

> The way I did it tries to keep the number of
> chunks roughly the same each time. I think the key is the number of
> chunks more than the size of the chunks. Having fewer chunks increases
> the chances of an imbalance of work between workers, and with what you
> mention, the number of chunks will vary more than what I have proposed.
>

But I think it will lead to a larger number of chunks for smaller relations.

> The code I showed above will produce something between 512 and 1024 chunks
> for all cases until we reach 2^20 pages, then we start capping the chunk
> size to 1024. I could probably get onboard with making it depend on
> the number of parallel workers, but perhaps it would be better just to
> divide by, say, 16384 rather than 1024, as I proposed above. That way
> we'll be more fine-grained, but we'll still read in larger than 1024
> chunk sizes when the relation gets beyond 128GB.
>

I think increasing the step-size might be okay for very large relations.
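
To spell out my reading of that in code (a rough sketch of the formula
you describe, not the actual patch; pg_nextpower2_32() is from
pg_bitutils.h):

    static BlockNumber
    parallel_chunk_size(BlockNumber rel_pages)
    {
        /* aim for roughly 512-1024 chunks; cap the step at 1024 blocks */
        return Min((BlockNumber) 1024,
                   pg_nextpower2_32(Max(rel_pages / 1024, 1)));
    }

With the divisor changed to 16384, the cap would have to be raised (or
dropped) for the step-size to keep growing once the relation crosses
128GB.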

Another point I am thinking about is that whatever formula we come up
with here might not be a good fit for every case. For example, as you
mentioned above, a larger step-size can affect performance depending on
the qualification; similarly, the target list or qual could contain a
function that takes more time for some tuples and less for others,
especially if its evaluation depends on column values. So, can we think
of providing a reloption for the step-size?
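
If we go that route, it might be as simple as a new intRelOpts[] entry
in reloptions.c, along these lines (the name and limits below are only
placeholders, and -1 would mean "use the built-in formula"):

    {
        {
            "parallel_step_size",       /* hypothetical name */
            "Number of blocks allocated to a parallel worker at a time",
            RELOPT_KIND_HEAP,
            ShareUpdateExclusiveLock
        },
        -1, -1, 1024 * 1024             /* default, min, max */
    },
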
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
