Re: SegFault on 9.6.14

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SegFault on 9.6.14
Date: 2019-10-17 05:20:52
Message-ID: CA+hUKGLDyNHRF551inm02d3ZP=z7K_kd0fpyii3K=Ve5gTTJoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 13, 2019 at 1:35 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Sep 12, 2019 at 8:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Robert, Thomas, do you have any more suggestions related to this. I
> > am planning to commit the above-discussed patch (Forbid Limit node to
> > shutdown resources.) coming Monday, so that at least the reported
> > problem got fixed.
>
> I think that your commit message isn't very clear about what the
> actual issue is. And the patch itself doesn't add any comments or
> anything to try to clear it up. So I wouldn't favor committing it in
> this form.

Is the proposed commit message at the bottom of this email an improvement?

Do I understand correctly that, with this patch, we can only actually
lose statistics in the case where we rescan? That is, precisely the
case that crashes (9.6) or spews warnings (10+)? In a quick
non-rescan test with the ExecShutdownNode() removed, I don't see any
problem with the buffer numbers on my screen:

postgres=# explain (analyze, buffers, timing off, costs off) select
count(*) from t limit 50000;
QUERY PLAN
------------------------------------------------------------------------------
Limit (actual rows=1 loops=1)
Buffers: shared hit=16210 read=28038
-> Finalize Aggregate (actual rows=1 loops=1)
Buffers: shared hit=16210 read=28038
-> Gather (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=16210 read=28038
-> Partial Aggregate (actual rows=1 loops=3)
Buffers: shared hit=16210 read=28038
-> Parallel Seq Scan on t (actual rows=3333333 loops=3)
Buffers: shared hit=16210 read=28038
Planning Time: 0.086 ms
Execution Time: 436.669 ms
(14 rows)

===
Don't shut down Gather[Merge] early under Limit.

Revert part of commit 19df1702f5.

Early shutdown was added by that commit so that we could collect
statistics from workers, but unfortunately it interacted badly with
rescans. Rescanning a Limit over a Gather node could produce a SEGV
on 9.6 and resource leak warnings on later releases. By reverting the
early shutdown code, we might lose statistics in some cases of Limit
over Gather, but that will require further study to fix.

Author: Amit Kapila, testcase by Vignesh C
Reported-by: Jerry Sievers
Diagnosed-by: Thomas Munro
Backpatch-through: 9.6
Discussion: https://postgr.es/m/87ims2amh6.fsf@jsievers.enova.com
===

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Asif Rehman 2019-10-17 05:21:15 Re: WIP/PoC for parallel backup
Previous Message Amit Kapila 2019-10-17 05:10:32 Re: maintenance_work_mem used by Vacuum