Re: [HACKERS] ginInsertCleanup called from vacuum could still miss tuples to be deleted

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] ginInsertCleanup called from vacuum could still miss tuples to be deleted
Date: 2017-11-14 01:07:09
Message-ID: CAD21AoBREobJXqmtFskzRppZ5eWkV+9NViDKGRCyFS61OegYsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 14, 2017 at 3:01 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Mon, Nov 13, 2017 at 12:25 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Commit e2c79e14 prevented multiple cleanup process for pending list in
>> GIN index. But I think that there is still possibility that vacuum
>> could miss tuples to be deleted if someone else is cleaning up the
>> pending list.
>
> I've been suspicious of that commit (and related commits) for a while
> now [1]. I think that it explains a couple of different problem
> reports that we have seen.

Yeah, the problem here is that vacuum and analyze don't acquire a
heavy weight lock for meta page using properly function. it seems not
relevant with that problem.

>
>> In ginInsertCleanup(), we lock the GIN meta page by LockPage and could
>> wait for the concurrent cleaning up process if stats == NULL. And the
>> source code comment says that this happen is when ginINsertCleanup is
>> called by [auto]vacuum/analyze or gin_clean_pending_list(). I agree
>> with this behavior. However, looking at the callers the stats is NULL
>> only either if pending list exceeds to threshold during insertions or
>> if only analyzing is performed by an autovacum worker or ANALYZE
>> command. So I think we should inVacuum = (stats != NULL) instead.
>> Also, we might want autoanalyze and ANALYZE command to wait for
>> concurrent process as well. Attached patch fixes these two issue. If
>> this is a bug we should back-patch to 9.6.
>
> How did you figure this out? Did you just notice that the code wasn't
> doing what it claimed to do, or was there a problem that you saw in
> production?
>

I just noticed it during surveying the GIN code for the another patch[1].

[1] https://commitfest.postgresql.org/15/1133/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-11-14 01:21:47 Re: FP16 Support?
Previous Message Masahiko Sawada 2017-11-14 00:41:58 Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization