Re: plan with result cache is very slow when work_mem is not enough

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: plan with result cache is very slow when work_mem is not enough
Date: 2021-05-07 21:16:54
Message-ID: 6699e022-cce3-6b00-9051-2007bd5f9fa8@enterprisedb.com
Lists: pgsql-hackers

On 5/7/21 11:04 PM, David Rowley wrote:
> On Sat, 8 May 2021 at 08:18, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>
>> On Fri, 7 May 2021 at 21:56, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>>> With USE_ASSERT_CHECKING builds, I did add some code that verifies the
>>> memory tracking is set correctly when evicting from the cache. This
>>> code is pretty expensive as it loops over the entire cache to check
>>> the memory accounting every time we evict something from the cache.
>>> Originally, I had this code only run when some other constant was
>>> defined, but I ended up changing it to compile that code in for all
>>> assert enabled builds.
>>>
>>> I considered that it might be too expensive as you can see from the
>>> comment in [1]. I just wanted to get a few machines other than my own
>>> to verify that the memory accounting code was working as expected.
>>> There have been no complaints of any Assert failures yet, so maybe
>>> it's safe to consider either removing the code entirely or just having
>>> it run when some other, more purpose-specific constant is defined.
>>> If we did the latter, then I'd have concerns that nothing would ever
>>> run the code to check the memory accounting; that's why I ended up
>>> changing it to run in USE_ASSERT_CHECKING builds.
>>
>>
>> I understand. I think this is too slow for generic assert builds, because the overhead is about 50x.
>
> I didn't expect it would show up quite that much. If you scaled the
> test up a bit more and increased work_mem further, then it would be
> even more than 50x.
>
> At one point when I was developing the patch, I had two high water
> marks for cache memory. When we reached the upper of the two marks,
> I'd reduce the memory down to the lower of the two marks. The lower of
> the two marks was set to 98% of the higher mark. In the end, I got
> rid of that as I didn't really see what extra overhead there was from
> just running the eviction code every time we require another byte.
> However, if we did have that again, then the memory checking could
> just be done when we run the eviction code. We'd then need to consume
> that 2% more memory before it would run again.
>
> My current thinking is that I don't really want to add that complexity
> just for some Assert code. I'd only want to do it if there was another
> valid reason to.
>

Agreed. I think this approach to eviction (i.e. evicting more than you
need) would be useful if the actual eviction code was expensive, and
doing it in a "batch" would make it significantly cheaper. But I don't
think "asserts are expensive" is a good reason for it.

> Another thought I have is that maybe it would be ok just to move the
> memory accounting debug code so it only runs once in
> ExecEndResultCache. I'm struggling to imagine that if the memory
> tracking did go out of whack, the problem would have accidentally
> fixed itself by the time we got to ExecEndResultCache(). I guess even
> if the accounting was counting far too much memory and we'd evicted
> everything from the cache to try and get the memory usage down, we'd
> still find the problem during ExecEndResultCache(), even if the cache
> had become completely empty as a result.
>

I don't think postponing the debug code until much later is a great
idea. When something goes wrong, it's good to know ASAP; otherwise it's
much more difficult to identify the issue.
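
To make the cost concrete: the check we're talking about is essentially
a full walk of the hash table on every eviction, something like the
sketch below (simplified, with invented iterator and helper names):

#ifdef USE_ASSERT_CHECKING
/*
 * Sketch only: sum the memory of every live cache entry and compare
 * it against the running total kept by the node.
 */
static void
verify_cache_mem_accounting(ResultCacheState *rcstate)
{
    uint64      mem = 0;
    resultcache_iterator iter;          /* invented iterator type */
    ResultCacheEntry *entry;

    resultcache_start_iterate(rcstate->hashtable, &iter);
    while ((entry = resultcache_iterate(rcstate->hashtable, &iter)) != NULL)
        mem += cache_entry_mem(entry);  /* invented helper */

    /* The running total must match what the full walk computes. */
    Assert(mem == rcstate->mem_used);
}
#endif

Running that for every single evicted entry is what makes an
eviction-heavy workload degrade so badly, hence the ~50x Pavel
measured.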

I'm not sure we need to do anything here - for regression tests this is
not an issue, because those generally work with small data sets. And if
you run an assert-enabled build on large amounts of data, I think the
overhead is acceptable.

I had the same dilemma with the new BRIN index opclasses, which also
have some extensive and expensive assert checks - for the regression
tests that's fine, and it proved very useful during development.

I have considered enabling those extra checks only on request somehow,
but I'd bet no one would do that, and I'd forget they exist pretty soon.
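
If we did want an opt-in knob, I suppose it would look like the usual
dedicated debug symbol (the name here is invented, just to illustrate
the pattern):

/*
 * Invented symbol, shown only to illustrate the opt-in pattern;
 * define it via CPPFLAGS to enable the expensive checks.
 */
#ifdef RESULT_CACHE_VERIFY_MEM
    verify_cache_mem_accounting(rcstate);
#endif

But that just restates the problem - unless some buildfarm animal
builds with it defined, the check never runs.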

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
