Unusable index

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Unusable index
Date: 2018-07-03 07:03:07
Message-ID: 6834866a-7358-367a-a30d-7bbcc46c27f7@postgrespro.ru
Lists: pgsql-hackers

Hi hackers,

HOT experts' help is highly needed :)
One of our customers has run into the following problem:

Session 1:
create table x (i int);
begin;
insert into x values(1);
...

Session 2:
select i as id, 0 as v into t from generate_series(1, 100000) i;
create unique index idx on t (id);
explain analyze select v from t where id = 10000;
                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Index Scan using idx on t  (cost=0.29..8.31 rows=1 width=4) (actual time=0.062..0.064 rows=1 loops=1)
   Index Cond: (id = 10000)
-- Everything is Ok here
-- Now some magic
update t set v = v + 1 where id = 10000;
update t set v = v + 1 where id = 10000;
update t set v = v + 1 where id = 10000;
update t set v = v + 1 where id = 10000;
update t set v = v + 1 where id = 10000;
drop index idx;
create unique index idx on t (id);
explain analyze select v from t where id = 10000;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..1693.00 rows=1 width=4) (actual time=27.557..27.558 rows=1 loops=1)

Now no backend can use this index until the transaction in Session 1 is
completed.
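
For reference, the catalog state that makes the planner skip the index is easy
to inspect from Session 2 (names are the ones from the example above;
pg_index.xmin is what the README.HOT excerpt below talks about):

-- run in Session 2: indcheckxmin is set, and the row's xmin is the
-- XID of the CREATE INDEX transaction
select indexrelid::regclass as index, indcheckxmin, xmin
from pg_index
where indexrelid = 'idx'::regclass;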

According to README.HOT this is more or less expected behavior:

---------------------------------------------

Practically, we prevent certain transactions from using the new index by
setting pg_index.indcheckxmin to TRUE.  Transactions are allowed to use
such an index only after pg_index.xmin is below their TransactionXmin
horizon, thereby ensuring that any incompatible rows in HOT chains are
dead to them. (pg_index.xmin will be the XID of the CREATE INDEX
transaction.  The reason for using xmin rather than a normal column is
that the regular vacuum freezing mechanism will take care of converting
xmin to FrozenTransactionId before it can wrap around.)

This means in particular that the transaction creating the index will be
unable to use the index if the transaction has old snapshots.  We
alleviate that problem somewhat by not setting indcheckxmin unless the
table actually contains HOT chains with RECENTLY_DEAD members.

---------------------------------------------

But a few notes:

1. It is not only the transaction that created the index that cannot use it;
any other transaction cannot use it either.
2. This happens even if Session 1 and Session 2 work with different
databases, so it can really confuse users! (The cross-database effect is easy
to see, as shown below.)
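
To illustrate the cross-database point: the open transaction in Session 1 is
visible cluster-wide, and it is what holds back everyone's xmin horizon. A
quick sanity check (nothing new here, just pg_stat_activity):

-- the idle transaction in the other database still holds back the xmin horizon
select datname, pid, state, backend_xid, backend_xmin
from pg_stat_activity
where backend_xid is not null or backend_xmin is not null;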

OK, isolation of databases in Postgres is a separate topic.
I want to clarify why this index is disabled for any new transactions.

The code that disables use of the index is the following:
            /*
             * If the index is valid, but cannot yet be used, ignore it; but
             * mark the plan we are generating as transient.  See
             * src/backend/access/heap/README.HOT for discussion.
             */
            if (index->indcheckxmin &&
                !TransactionIdPrecedes(HeapTupleHeaderGetXmin(indexRelation->rd_indextuple->t_data),
                                       TransactionXmin))
            {
                root->glob->transientPlan = true;
                index_close(indexRelation, NoLock);
                continue;
            }
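
Just to make that check concrete: it compares the xmin of the index's
pg_index row with the current transaction's TransactionXmin. Roughly (and only
approximately, since backend_xmin is not exactly TransactionXmin at planning
time), both values can be looked at from SQL:

-- compare the pg_index row's xmin with this backend's xmin horizon (an approximation)
select i.xmin as index_xmin,
       (select backend_xmin from pg_stat_activity where pid = pg_backend_pid()) as my_xmin
from pg_index i
where i.indexrelid = 'idx'::regclass;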

And "indcheckmin" is set here:

    /*
     * If we found any potentially broken HOT chains, mark the index as not
     * being usable until the current transaction is below the event horizon.
     * See src/backend/access/heap/README.HOT for discussion.  Also set this
     * if early pruning/vacuuming is enabled for the heap relation.  While it
     * might become safe to use the index earlier based on actual cleanup
     * activity and other active transactions, the test for that would be much
     * more complex and would require some form of blocking, so keep it simple
     * and fast by just using the current transaction.
     *
     * However, when reindexing an existing index, we should do nothing here.
     * Any HOT chains that are broken with respect to the index must predate
     * the index's original creation, so there is no need to change the
     * index's usability horizon.  Moreover, we *must not* try to change the
     * index's pg_index entry while reindexing pg_index itself, and this
     * optimization nicely prevents that.  The more complex rules needed for a
     * reindex are handled separately after this function returns.
     *
     * We also need not set indcheckxmin during a concurrent index build,
     * because we won't set indisvalid true until all transactions that care
     * about the broken HOT chains or early pruning/vacuuming are gone.
     *
     * Therefore, this code path can only be taken during non-concurrent
     * CREATE INDEX.  Thus the fact that heap_update will set the pg_index
     * tuple's xmin doesn't matter, because that tuple was created in the
     * current transaction anyway.  That also means we don't need to worry
     * about any concurrent readers of the tuple; no other transaction can see
     * it yet.
     */
    if ((indexInfo->ii_BrokenHotChain || EarlyPruningEnabled(heapRelation)) &&
        !isreindex &&
        !indexInfo->ii_Concurrent)
    {
        Oid            indexId = RelationGetRelid(indexRelation);
        Relation       pg_index;
        HeapTuple      indexTuple;
        Form_pg_index  indexForm;

        pg_index = heap_open(IndexRelationId, RowExclusiveLock);

        indexTuple = SearchSysCacheCopy1(INDEXRELID,
                                         ObjectIdGetDatum(indexId));
        if (!HeapTupleIsValid(indexTuple))
            elog(ERROR, "cache lookup failed for index %u", indexId);
        indexForm = (Form_pg_index) GETSTRUCT(indexTuple);

        /* If it's a new index, indcheckxmin shouldn't be set ... */
        Assert(!indexForm->indcheckxmin);

        indexForm->indcheckxmin = true;
        CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);

        heap_freetuple(indexTuple);
        heap_close(pg_index, RowExclusiveLock);
    }

This branch is taken because ii_BrokenHotChain is true. And ii_BrokenHotChain,
in turn, is set to true here:

                case HEAPTUPLE_RECENTLY_DEAD:
                    /*
                     * If tuple is recently deleted then we must index it
                     * anyway to preserve MVCC semantics.  (Pre-existing
                     * transactions could try to use the index after we finish
                     * building it, and may need to see such tuples.)
                     *
                     * However, if it was HOT-updated then we must only index
                     * the live tuple at the end of the HOT-chain.  Since this
                     * breaks semantics for pre-existing snapshots, mark the
                     * index as unusable for them.
                     *
                     * We don't count recently-dead tuples in reltuples, even
                     * if we index them; see acquire_sample_rows().
                     */
                    if (HeapTupleIsHotUpdated(heapTuple))
                    {
                        indexIt = false;
                        /* mark the index as unsafe for old snapshots */
                        indexInfo->ii_BrokenHotChain = true;
                    }

So:
1. We mark the index as unsafe for old snapshots.
2. This prevents all transactions from using the index until the index
creation is below their xmin horizon.

It seems to me that we are over-conservative here...
But I have failed to understand the source of the potential problems we are
trying to prevent, and how this check could be made smarter.

When I just comment out the assignment indexInfo->ii_BrokenHotChain = true
in the code above, everything works correctly. No surprise: old
snapshots do not see this newly created index at all.

I will be very pleased if somebody can explain the bad scenario which we
are trying to prevent here by disabling use of this index.
Also I wonder if something can be done about it.
Certainly, indexes are usually not created very often, and long-living
transactions are a bad thing in any case.
But the fact that a "hanging" transaction in one database prevents use of an
index, and so dramatically increases query execution time, in another
database still seems very disappointing.
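
By the way, the comment quoted above says that indcheckxmin is not set during
a concurrent build, because a concurrent build itself waits out the
transactions that could care about the broken HOT chains. So, as a guess (not
verified on the customer's instance), building the index concurrently would
avoid the "unusable index" window, at the price of the CREATE INDEX itself
waiting for the old transaction:

-- possible workaround sketch, based only on the quoted comment
drop index idx;
create unique index concurrently idx on t (id);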

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
