Re: Parallel Seq Scan vs kernel read ahead

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan vs kernel read ahead
Date: 2020-06-16 17:14:11
Message-ID: CA+TgmoZ-zE=XsHFnwiK5ZMnGv6WvW+oJnRXTeL=p16X0=nrDeg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 16, 2020 at 6:57 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I agree that won't be a common scenario but apart from that also I am
> not sure if we can conclude that the proposed patch won't cause any
> regressions. See one of the tests [1] done by Soumyadeep where the
> patch has caused regression in one of the cases, now we can either try
> to improve the patch and see we didn't cause any regressions or assume
> that those are some minority cases which we don't care. Another point
> is that this thread has started with a theory that this idea can give
> benefits on certain filesystems and AFAICS we have tested it on one
> other type of system, so not sure if that is sufficient.

Yeah, it seems like those cases might need some more investigation,
but they're also not necessarily an argument for a configuration
setting. It's not so much that I dislike the idea of being able to
configure something here; it's really that I don't want a reloption
that feels like magic. For example, we know that work_mem can be
really hard to configure because there may be no value that's high
enough to make your queries run fast during normal periods but low
enough to avoid running out of memory during busy periods. That kind
of thing sucks, and we should avoid creating more such cases.

One problem here is that the best value might depend not only on the
relation but on the individual query. A GUC could be changed
per-query, but different tables in the query might need different
values. Changing a reloption requires locking, and you wouldn't want
to have to keep changing it for each different query. Now if we figure
out that something is hardware-dependent -- like we come up with a
good formula that adjusts the value automatically most of the time,
but say it needs to be more on SSDs than on spinning disks or the
other way around, well then that's a good candidate for some kind of
setting, maybe a tablespace option. But if it seems to depend on the
query, we need a better idea, not a user-configurable setting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
