Re: Vacuum: allow usage of more than 1GB of work mem

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2016-12-27 17:04:12
Message-ID: CAGTBQpbeNRyr45HvnYDgbEKnbnxGVNpHJNeY2Hj=d3hLsxwO9g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 27, 2016 at 10:54 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Anastasia Lubennikova wrote:
>
>> I ran configure using following set of flags:
>> ./configure --enable-tap-tests --enable-cassert --enable-debug
>> --enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer"
>> And then ran make check. Here is the stacktrace:
>>
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0 0x00000000006941e7 in lazy_vacuum_heap (onerel=0x1ec2360,
>> vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
>> 1417 tblk =
>> ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]);
>
> This doesn't make sense, since the patch removes the "tupindex"
> variable in that function.

The variable is still there. It just has a slightly different meaning
(index within the current segment, rather than global index).

On Tue, Dec 27, 2016 at 10:41 AM, Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> 23.12.2016 22:54, Claudio Freire:
>> On Fri, Dec 23, 2016 at 1:39 PM, Anastasia Lubennikova
>> <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>>> I found the reason. I configure postgres with CFLAGS="-O0" and it causes
>>> Segfault on initdb.
>>> It works fine and passes tests with default configure flags, but I'm pretty
>>> sure that we should fix segfault before testing the feature.
>>> If you need it, I'll send a core dump.
>>
>> I just ran it with CFLAGS="-O0" and it passes all checks too:
>>
>> CFLAGS='-O0' ./configure --enable-debug --enable-cassert
>> make clean && make -j8 && make check-world
>>
>> A stacktrace and a thorough description of your build environment
>> would be helpful to understand why it breaks on your system.
>
> I ran configure using following set of flags:
> ./configure --enable-tap-tests --enable-cassert --enable-debug
> --enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer"
> And then ran make check. Here is the stacktrace:

Same procedure runs fine on my end.

> core file is quite big, so I didn't attach it to the mail. You can download it here: core dump file.

Can you attach your binary as well? To inspect the core dump I need a
binary identical to the one that produced it, and my build is quite
clearly different.

I'll keep looking for ways it could crash there, but being unable to
reproduce the crash is a big hindrance, so if you can send the binary,
that would help speed things up.

On Tue, Dec 27, 2016 at 10:41 AM, Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> 1. prefetchBlkno = blkno & ~0x1f;
> prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;
>
> I don't understand why we need these tricks. How does it differ from:
> prefetchBlkno = (blkno > 32) ? blkno - 32 : 0;

It aligns every 32-block prefetch range to a 32-block boundary.

That helps because it's at 32-block boundaries that the truncate stops
to check for locking conflicts and possibly abort. With aligned ranges
no prefetch is wasted: if the code gets past that check, it *will* read
the next 32 blocks.
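
To make the difference concrete, here is a minimal standalone sketch that compares the two expressions quoted above. The function names and the BlockNumber typedef are placeholders for illustration, not code from the patch; only the two expressions themselves come from the thread.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;	/* stand-in for PostgreSQL's BlockNumber */

/*
 * Aligned variant (the expression from the patch): round blkno down to a
 * 32-block boundary, then step back one window, clamping at zero.
 */
BlockNumber
aligned_prefetch_start(BlockNumber blkno)
{
	BlockNumber prefetchBlkno = blkno & ~(BlockNumber) 0x1f;

	prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;

	/* The result is always a multiple of 32. */
	assert((prefetchBlkno & 0x1f) == 0);
	return prefetchBlkno;
}

/*
 * Unaligned variant (the alternative asked about): just step back 32
 * blocks from wherever blkno happens to be.
 */
BlockNumber
naive_prefetch_start(BlockNumber blkno)
{
	return (blkno > 32) ? blkno - 32 : 0;
}
```

For blkno = 1000, the aligned variant yields 960 (a multiple of 32), while the unaligned one yields 968, which would make the prefetch range straddle the boundary at 992.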

> 2. Why do we decrease prefetchBlkno twice?
>
> Here:
> + prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;
> And here:
> if (prefetchBlkno >= 32)
> + prefetchBlkno -= 32;

The first one is outside the loop. It finds the first prefetch range
that is boundary-aligned, taking care not to cause underflow.

The second one is inside the loop. It moves the prefetch window down as
the truncate moves along. Since the window is already aligned, it
doesn't need to be realigned, just clamped to zero if it happens to
reach the bottom.
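
As a rough illustration of how the two decrements interact, the sketch below lists the window starts that result from the initial alignment followed by repeated in-loop decrements. This is a simplified model, not the patch's loop: the function name, the output array, and the fact that it ignores when each prefetch is actually issued are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;	/* stand-in for PostgreSQL's BlockNumber */

/*
 * Hypothetical helper: store the successive prefetch window starts for a
 * relation of rel_pages blocks into starts[], and return how many were
 * stored (at most max_starts).
 */
int
prefetch_window_starts(BlockNumber rel_pages, BlockNumber *starts,
					   int max_starts)
{
	int			n = 0;
	BlockNumber prefetchBlkno;

	/* Outside the loop: align once, then step back without underflow. */
	prefetchBlkno = rel_pages & ~(BlockNumber) 0x1f;
	prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;

	while (n < max_starts)
	{
		/* Every window start stays aligned; no realignment is needed. */
		assert((prefetchBlkno & 0x1f) == 0);
		starts[n++] = prefetchBlkno;

		/* Inside the loop: move the window down, stopping at zero. */
		if (prefetchBlkno < 32)
			break;
		prefetchBlkno -= 32;
	}
	return n;
}
```

Under this model, a relation of 100 blocks gives window starts 64, 32 and 0: the first value comes from the computation outside the loop, and the rest from the in-loop decrement.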
