Re: Weird indices

From: Ian Lance Taylor <ian(at)airs(dot)com>
To: Joseph Shraibman <jks(at)selectacast(dot)net>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Weird indices
Date: 2001-02-21 01:42:52
Message-ID: sin1bgke03.fsf@daffy.airs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Joseph Shraibman <jks(at)selectacast(dot)net> writes:

> > > I understand that keeping different views for different open
> > > transactions can be difficult, but after a transaction that updates a
> > > row is over why isn't the row marked as 'universally visible' for all
> > > new transactions until another update occurs?
> >
> > It is. This mark is on the tuple in the heap. When a tuple is
> > current, and not locked for update, HEAP_XMAX_INVALID is set. After
> > the tuple is removed, HEAP_XMAX_COMMITTED is set.
>
> On the heap, but when is the index updated? Not until the next vacuum?

The index is updated right away. Otherwise it could never be used,
since it would be inaccurate.

When a new tuple is inserted in the heap, the index is updated to
point to the new tuple. This involves inserting new tuples into the
index. The old tuples in the index are left untouched, and continue
to point to the old tuple in the heap.

The old tuples in the index, and the heap, are removed by VACUUM.
VACUUM walks through the index and checks the heap tuple corresponding
to each index tuple. If the heap tuple is gone, or can no longer be
seen by any transaction, then the index tuple can be removed.

Note that this all implies that when walking through the index to find
heap tuples, you must check the current validity of each heap tuple.
It is normal for an index tuple to point to a heap tuple which has
been deleted.

> > > Maybe this is part of the whole 'vacuum later' vs. 'update now'
> > > philosophy. If the point of vacuum later is to put off the performance
> > > hit until later if it is causing these performance hits on queries
> > > because index scans aren't being used then doesn't that mean 'update
> > > now' is more likely to pay off in the short run?
> >
> > I don't follow. A simple VACUUM doesn't update the statistics.
> > VACUUM ANALYZE has to do more work.
>
> I'm talking about indices. The index should be updated to only point at
> valid rows.

When should the index be updated to only point at valid rows? That is
only possible when a heap tuple is finally and completely removed.
But currently that is only known at the time of a VACUUM.

Consider a transaction which sits around for a while and then aborts.
At the moment that the transaction aborts, Postgres may become able to
remove some heap tuples and some index tuples, and it might be invalid
for Postgres to remove those tuples before the transaction aborts.

But the transaction might never have looked at those tuples. So,
given the Postgres data structures, the only way to keep an index
fully up to date would be to effectively run a VACUUM over the entire
database every time a transaction aborts.

OK, that's not quite true. It would be possible to keep a list in
shared memory of tuples which were recently dropped. Then as
transactions dropped out, it would be possible to see which ones could
be completely removed. This list of tuples would presumably include
index tuples.

> But if the index isn't used by the planner then the point
> is moot.

As far as I know the index itself isn't used by the planner.

> > Are you suggesting that the statistics should be updated
> > continuously? I guess that would be doable, but it would clearly
> > slow down the database. For some applications, it would be an
> > obviously bad idea.
>
> No, I'm suggesting that indices should be updated continuously so the
> planner can use them without having a performance hit from checking if
> tuples are valid or not.

Well, as far as I know, the planner doesn't use the index. It only
uses the statistics.

> BTW is there any way to tell postgres to do an update at every commit
> without waiting for a vacuum? I understand that the postgres core team
> thinks it is a bad idea, but there are ways to (sort of) force using
> index scans whent he planner doesn't want to, so is there something
> similar to force incremental vacuuming at the end of each query?
>
> Java has this option:
> -Xincgc enable incremental garbage collection
>
> that isn't recommended, but the java developers recognized that
> sometimes a user might want it anyway for whatever reason.

I don't think there is any way to do that today. It would be possible
to implement something along the lines I suggest above. I have no
idea if the Postgres maintainers have any plans along these lines.

Ian

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Stephan Szabo 2001-02-21 02:05:18 Re: Weird indices
Previous Message Joseph Shraibman 2001-02-21 01:26:37 Re: Weird indices