Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
Date: 2018-02-09 12:20:32
Message-ID: CAGTBQpZ2c5YhuSw3LY+Y0M2wxbkxhzQmq3VcqbN=X=VMixoD+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 8, 2018 at 8:39 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Claudio Freire wrote:
>
>> I don't like looping, though, seems overly cumbersome. What's worse?
>> maintaining that fragile weird loop that might break (by causing
>> unexpected output), or a slight slowdown of the test suite?
>>
>> I don't know how long it might take on slow machines, but in my
>> machine, which isn't a great machine, while the vacuum test isn't fast
>> indeed, it's just a tiny fraction of what a simple "make check" takes.
>> So it's not a huge slowdown in any case.
>
> Well, what about a machine running tests under valgrind, or the weird
> cache-clobbering infuriatingly slow code? Or buildfarm members running
> on really slow hardware? These days, a few people have spent a lot of
> time trying to reduce the total test time, and it'd be bad to lose back
> the improvements for no good reason.

It's not for no good reason. The old tests were woefully inadequate.

During the process of developing the patch, I got seriously broken
code that passed the tests nonetheless. The test as it was was very
ineffective at actually detecting issues.

This new test may be slow, but it's effective. That's a very good
reason to make it slower, if you ask me.

> I grant you that the looping I proposed is more complicated, but I don't
> see any reason why it would break.
>
> Another argument against the LOCK pg_class idea is that it causes an
> unnecessary contention point across the whole parallel test group --
> with possible weird side effects. How about a deadlock?

The real issue with lock pg_class is that locks on pg_class are
short-lived, so I'm not waiting for whole transactions.

> Other than the wait loop I proposed, I think we can make a couple of
> very simple improvements to this test case to avoid a slowdown:
>
> 1. the DELETE takes about 1/4th of the time and removes about the same
> number of rows as the one using the IN clause:
> delete from vactst where random() < 3.0 / 4;

I did try this at first, but it causes random output, so the test
breaks randomly.

> 2. Use a new temp table rather than vactst. Everything is then faster.

I might try that.

> 3. Figure out the minimum size for the table that triggers the behavior
> you want. Right now you use 400k tuples -- maybe 100k are sufficient?
> Don't know.

For that test, I need enough *dead* tuples to cause several passes.
Even small mwm settings require tons of tuples for this. In fact, I'm
thinking that number might be too low for its purpose, even. I'll
re-check, but I doubt it's too high. If anything, it's too low.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2018-02-09 12:34:39 Wait event names mismatch: oldserxid
Previous Message Etsuro Fujita 2018-02-09 12:20:03 Re: non-bulk inserts and tuple routing