Re: Too many duplicated condition query return wrong value

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Atsushi Yoshida <rudeboyjet(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Too many duplicated condition query return wrong value
Date: 2015-09-06 22:47:17
Message-ID: CAMkU=1yRpVwb2PTpYEOMQ37_ye9tXx1s8BZ75L5P55y=bMSuXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 3, 2015 at 10:55 PM, Atsushi Yoshida <rudeboyjet(at)gmail(dot)com>
wrote:

> >> Can you give an "explain (analyze, buffers)" for each query? Maybe
> you have a corrupted index, and one query uses the index and the other does
> not.
>
>
> >
> > Index Scan using idx_attend_00 on attend (cost=0.29..627.20 rows=172
> width=12) (actual time=5.158..10.179 rows=5 loops=1)
> > Index Cond: (sid = 325)
> > Filter: (lid = ANY ('{ABF0010,ABF0010,ABF0010,ABF0010,ABF0010 ...
> ABF0060,ABF0060,ABF0060,ABF0060}'::text[]))
> > Rows Removed by Filter: 414
>

...

> >
> > Index Scan using index_attend_on_sid_and_lid on attend
> (cost=0.42..36.32 rows=3 width=12) (actual time=0.011..0.034 rows=6 loops=1)
> > Index Cond: ((sid = 325) AND (lid = ANY
> ('{ABF0010,ABF0020,ABF0030,ABF0040,ABF0050,ABF0060}'::text[])))
> > Buffers: shared hit=24
>
>

> Is this result aims idx_attend_00 corrupted?
> How to fix it?
> What countermeasure do I it?
>

Yes, almost certainly. You can fix it by rebuilding the index ("REINDEX
INDEX idx_attend_00"). Whether this will completely fix the problem
depends on what caused it. There could be a wider corruption issue of
which this is only a symptom.

If you would like to preserve the ability to investigate the root cause,
you should make a full file-level backup of the database *before* doing the
re-index.

What type of index is it? (I'm now guessing btree, but maybe not)? Is
there a defined time window during which you know the corruption occurred?
If so, do you still have the server logs from that time window? The WAL
logs?

Do you know if the sometimes-missing tuple actually belongs in the table or
not? It could be that the row was marked deleted from the table, scrubbed
from the index, and then inappropriately got "revived" like a zombie in the
table, so that the "corrupt" index is correct and it is the table that is
wrong.

And of course, if you are running with fsync=off or full_page_writes=off,
don't do that.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-09-07 02:04:35 Getting total and free disk space from paths in PGDATA
Previous Message Andres Freund 2015-09-06 21:39:58 Re: checkpointer continuous flushing