Re: Index corruption / planner issue with one table in my pg 11.6 instance

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jerry Sievers <gsievers19(at)comcast(dot)net>, Peter Geoghegan <pg(at)bowt(dot)ie>, Michael Paquier <michael(at)paquier(dot)xyz>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index corruption / planner issue with one table in my pg 11.6 instance
Date: 2019-12-10 14:25:18
Message-ID: CAMa1XUjHNMMtMF=ZtvSb4uF1y=SXu+V9AmEWT+FtFfFs3SLrTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 10, 2019 at 12:09 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Yeah. The reported behavior can mostly be explained if we assume
> that there's some HOT chain in the table that involves an update
> of this particular column, so that if we build an index on that
> column we see a broken HOT chain, but building an index on some
> other column doesn't have a problem.
>

The problem exists so far as I can tell on indexing *any column* of *this
particular table*. I tried same experiment on another table in the same
replication stream, and I cannot reproduce it.

I am building the index **non-concurrently** every time.

> The thing this doesn't easily explain is that the behavior persists
> across repeated index rebuilds. A broken HOT chain is only broken
> as long as the older entry is still visible-to-somebody, so that
> such situations ought to be self-healing as time passes. If it
> fails repeatedly, this theory requires assuming that either
>
> 1. You've got some extremely old open transactions (maybe forgotten
> prepared transactions?), or
>

No prepared_xacts and no transactions older than a few hours. Several hour
transactions are common in this reporting system. I have not yet seen if
after several hours the index starts showing up in plans.

> 2. Your workload is constantly generating new broken HOT chains of
> the same sort, so that there's usually a live one when you try
> to build an index.
>
> The fact that you even notice the indcheckxmin restriction indicates
> that you do tend to have long-running transactions in the system,
> else the index would come free for use fairly quickly. So #1 isn't
> as implausible as I might otherwise think. But #2 seems probably
> more likely on the whole. OTOH, neither point is exactly within
> the offered evidence.
>

Is there a way for me to test this theory? I tried the following with no
change in behavior:

1. Disable write load to table
2. Vacuum analyze table (not vac full)
3. Create index
4. Explain

Still did not pick up the index.

Thanks,
Jeremy

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Asif Rehman 2019-12-10 14:33:48 Re: WIP/PoC for parallel backup
Previous Message Pavel Stehule 2019-12-10 13:47:03 Re: proposal: minscale, rtrim, btrim functions for numeric