Re: Including Snapshot Info with Indexes

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Including Snapshot Info with Indexes
Date: 2007-10-09 07:25:47
Message-ID: 9362e74e0710090025s27a960acvd05495a3d31ec1e6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On 10/8/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> Gokulakannan Somasundaram wrote:
> > I am always slightly late in understanding things. Let me
> try
> > to understand the use of DSM. It is a bitmap index on whether all the
> tuples
> > in a particular block is visible to all the backends, whether a
> particular
> > block contains tuples which are invisible to everyone. But i think this
> will
> > get subjected to the same limitations of Bitmap index. Even Oracle
> suggests
> > the use of Bitmap index for only data warehousing tables, where the
> Bitmap
> > indexes will be dropped and recreated after every bulk load. This is not
> a
> > viable alternative for OLTP transactions.
>
> Well, it's not quite the same as a bitmap index, though both use a
> bitmap. You didn't quite get into details on what the limitations are
> and why it wouldn't be suitable for OLTP, but I don't see any
> significant problems.
>
> > But i think i am late in the game
> > as i haven't participated in those discussions
>
> Better late than never :).
>
> > One Bitmap index block usually maps to lot of blocks in the
> heap.
> > So locking of one page to update the DSM for update/delete/insert would
> hit
> > the concurrency. But again all these are my observation w.r.t oracle
> bitmap
> > indexes. May be i am missing something in DSM.
>
> Yeah, the DSM page could become a contention bottleneck. My current
> thinking is that we'd have a flag in the heap page header, that would be
> set together with the bit in the DSM. When the flag in the page header
> is set, you don't need to lock and update the DSM because you know the
> bit is already set. Vacuum would have to clear both the DSM bit and the
> flag.

It matters to us, where the index scan will goto. If the Index Scan is going
to touch DSM for understanding visibility(This might degrade the performance
of some of the index scans, if they have to wait to acquire the share lock,
and learn that they have to goto the heap to understand their visibility
requirements.) In the mean while, if the vacuum, inserts/updates/deletes are
holding the BUFFER_EXCLUSIVE lock on that, this would hurt the Select
transactions. Since there is only one bit per block in the DSM(best case),
there might be one DSM block per 8000 table blocks. All the transactions
which are accessing the 8000 blocks will be waiting on this one DSM block.
If we are going to update the Heap page header and asking the Indexscan to
refer to that, then there is no reduction in random I/Os. Can't we say that
if the snapshot info is embedded with index, we can avoid all these
difficulties? Most importantly it won't affect the performance of current
postgres in any way.

> Let's take up Retail Vacuuming again. The User defined function
> > which would return different values at different time can be classified
> as
> > non-deterministic functions. We can say that this index cannot be
> created
> > on a non-deterministic function. This is the way it is implemented in
> > Oracle. What they have done is they have classified certain built-in
> > operators and functions as deterministic. Similarly they have classified
> a
> > few as non-deterministic operators and functions. Can we follow a
> similar
> > approach?
>
> We already do. A function must be marked as IMMUTABLE in order to use it
> in an index expression. But we can't enforce that the user defined
> function really behaves like an immutable function should. If someone
> creates a user-defined function in C that calls the C random() function,
> we can't stop it.

A function is said to be deterministic, if it returns the same value,
irrespective of how many times, it is invoked. I think this definition
clearly puts the random function under the non-deterministic category. If we
have such a classification, do you think we can resolve this issue?

As I said earlier, using an index like that will of course lead to bogus
> results. But it won't currently cause any server crashes or more serious
> corruption.

One more final word on unique indexes. Whenever we are doing an update,
there will be insertions into the unique indexes which will trigger table
lookups. Ofcourse there is more probability, that the table block would be
in memory(un-pinned). Still contention for a shared resource is avoided, if
the snapshot info is stored with the indexes.

Let me get one more clarification, what would be type of performance results
with this implementation, that would encourage the hackers community to
accept the extra maintenance overhead.

Thanks,
Gokul.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gokulakannan Somasundaram 2007-10-09 07:41:00 Re: Including Snapshot Info with Indexes
Previous Message Simon Riggs 2007-10-09 07:20:45 Re: [COMMITTERS] pgsql: Added the Skytools extended transaction ID module to contrib as

Browse pgsql-patches by date

  From Date Subject
Next Message Gokulakannan Somasundaram 2007-10-09 07:41:00 Re: Including Snapshot Info with Indexes
Previous Message Magnus Hagander 2007-10-09 06:23:54 Re: Preliminary patch for tsearch example dictionaries/pa rsers in contrib