Re: Including Snapshot Info with Indexes

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Including Snapshot Info with Indexes
Date: 2007-10-09 07:51:38
Message-ID: 9362e74e0710090051ldeeaf8by9423f70484dee6cc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On 10/9/07, Gokulakannan Somasundaram <gokul007(at)gmail(dot)com> wrote:
>
>
>
> On 10/8/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> >
> > Gokulakannan Somasundaram wrote:
> > > I am always slightly late in understanding things. Let me
> > try
> > > to understand the use of DSM. It is a bitmap index on whether all the
> > tuples
> > > in a particular block is visible to all the backends, whether a
> > particular
> > > block contains tuples which are invisible to everyone. But i think
> > this will
> > > get subjected to the same limitations of Bitmap index. Even Oracle
> > suggests
> > > the use of Bitmap index for only data warehousing tables, where the
> > Bitmap
> > > indexes will be dropped and recreated after every bulk load. This is
> > not a
> > > viable alternative for OLTP transactions.
> >
> > Well, it's not quite the same as a bitmap index, though both use a
> > bitmap. You didn't quite get into details on what the limitations are
> > and why it wouldn't be suitable for OLTP, but I don't see any
> > significant problems.
> >
> > > But i think i am late in the game
> > > as i haven't participated in those discussions
> >
> > Better late than never :).
> >
> > > One Bitmap index block usually maps to lot of blocks in the
> > heap.
> > > So locking of one page to update the DSM for update/delete/insert
> > would hit
> > > the concurrency. But again all these are my observation w.r.t oracle
> > bitmap
> > > indexes. May be i am missing something in DSM.
> >
> > Yeah, the DSM page could become a contention bottleneck. My current
> > thinking is that we'd have a flag in the heap page header, that would be
> >
> > set together with the bit in the DSM. When the flag in the page header
> > is set, you don't need to lock and update the DSM because you know the
> > bit is already set. Vacuum would have to clear both the DSM bit and the
> > flag.
>
>
> It matters to us, where the index scan will goto. If the Index Scan is
> going to touch DSM for understanding visibility(This might degrade the
> performance of some of the index scans, if they have to wait to acquire the
> share lock, and learn that they have to goto the heap to understand their
> visibility requirements.) In the mean while, if the vacuum,
> inserts/updates/deletes are holding the BUFFER_EXCLUSIVE lock on that, this
> would hurt the Select transactions. Since there is only one bit per block in
> the DSM(best case), there might be one DSM block per 8000 table blocks. All
> the transactions which are accessing the 8000 blocks will be waiting on this
> one DSM block. If we are going to update the Heap page header and asking
> the Indexscan to refer to that, then there is no reduction in random I/Os.
> Can't we say that if the snapshot info is embedded with index, we can avoid
> all these difficulties? Most importantly it won't affect the performance of
> current postgres in any way.
>
> > Let's take up Retail Vacuuming again. The User defined
> > function
> > > which would return different values at different time can be
> > classified as
> > > non-deterministic functions. We can say that this index cannot be
> > created
> > > on a non-deterministic function. This is the way it is implemented in
> > > Oracle. What they have done is they have classified certain built-in
> > > operators and functions as deterministic. Similarly they have
> > classified a
> > > few as non-deterministic operators and functions. Can we follow a
> > similar
> > > approach?
> >
> > We already do. A function must be marked as IMMUTABLE in order to use it
> > in an index expression. But we can't enforce that the user defined
> > function really behaves like an immutable function should. If someone
> > creates a user-defined function in C that calls the C random() function,
> > we can't stop it.
>
>
> A function is said to be deterministic, if it returns the same value,
> irrespective of how many times, it is invoked. I think this definition
> clearly puts the random function under the non-deterministic category. If we
> have such a classification, do you think we can resolve this issue?
>

If we frame a set of guidelines/test procedure, do you think it might solve
the issue? Even, if we don't allow this type of indexing to anything other
than built-in deterministic functions, i feel it would serve most of the
indexing requirements.

As I said earlier, using an index like that will of course lead to bogus
> > results. But it won't currently cause any server crashes or more serious
> >
> > corruption.
>
>
>
> One more final word on unique indexes. Whenever we are doing an update,
> there will be insertions into the unique indexes which will trigger table
> lookups. Ofcourse there is more probability, that the table block would be
> in memory(un-pinned). Still contention for a shared resource is avoided, if
> the snapshot info is stored with the indexes.
>
> Let me get one more clarification, what would be type of performance
> results with this implementation, that would encourage the hackers community
> to accept the extra maintenance overhead.
>
> Thanks,
> Gokul.
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Meskes 2007-10-09 08:00:51 Re: Latest ecpg patch broke MSVC build
Previous Message Simon Riggs 2007-10-09 07:44:21 Re: PG on NFS may be just a bad idea

Browse pgsql-patches by date

  From Date Subject
Next Message Gregory Stark 2007-10-09 08:15:06 Re: Including Snapshot Info with Indexes
Previous Message Gokulakannan Somasundaram 2007-10-09 07:41:00 Re: Including Snapshot Info with Indexes