Re: Including Snapshot Info with Indexes

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Csaba Nagy" <nagy(at)ecircle-ag(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Including Snapshot Info with Indexes
Date: 2007-10-12 06:46:39
Message-ID: 9362e74e0710112346tc0371d3v93dd87763593fe29@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Hi All,
Last two mails were sent by mistake without completion. I couldn't
curse my system any further
I apologize for that.

If we comeback to the topic of discussion

So i think we are clear now, that it is possible to have an index with
snapshot info. Let me try to enumerate the uses of having the Index with
snapshot info, in comparison to the Dead Space Map.

a) Dead Space, if it is successfull in its implementation of what it claims,
will have the means to point out that all the tuples of certain chunks are
frozen for registered relations and registered chunks. There would be lot of
blocks which won't fall under this category.
i) For example, if the records are newly inserted, that block
can't be marked as containing all frozen tuples.
ii) Imagine the case where there is a batch job / Huge select
query running in a enterprise for more than 6hrs. All the blocks which have
got inserted into the tables, during this period might not be able to get
the advantage of DSM
iii) Imagine the case for which i am actually proposing the
Index with Snapshot infos. Partitioned tables. Every time a new table gets
created, it has to get registered into the Deadspace. This requires more
maintenance on the DBA Side
iv) I understand the DeadSpaceLock to be a Global lock(If one
transaction is updating the dead space for any unregistered chunk, no one
else can query the DeadSpace). If my statement is right, then partitioned
tables might not be able to benefit from DSM. We have to remember for tables
with daily partitions, this would prove to be a nightmare

Other than that there are lot of advantages, i foresee with including the
indexes with snapshots
i) Vacumming of these indexes need not be done with SuperExclusive Locks. It
is possible to design a strategy to vacuum these indexes with Exclusive
locks on pages
ii) The above would mean that index can be in operation while the vacuum is
happening
iii) As we have already stated, it provides a efficient clustering of
related data.
iv) The Reverse Mapping Index, if present provides an efficient solution to
the Retail Vacuum problem. So HOT can be improved further with no need to
place the restriction of the updated tuple should be in the same page
iv) Updates on tables with primary keys(where primary key is not updated),
will be able to resolve the unique constraint faster. This is a minor
advantage.

The complexity of Reverse Mapping index will only be there for user-defined
functional indexes.
These functions can be pruned further to find out the obvious immutable
ones.

I expect your valuable feedback for this.

Thanks,
Gokul.

On 10/12/07, Gokulakannan Somasundaram <gokul007(at)gmail(dot)com> wrote:
>
> Hi All,
> Last mail was sent by mistake without completion. I apologize for
> that. i am continuing on that.
>
> So i think we are clear now, that it is possible to have an index with
> snapshot info. Let me try to enumerate the uses of having the Index with
> snapshot info, in comparison to the Dead Space Map.
>
> a) Dead Space, if it is successfull in its implementation of what it
> claims, will have the means to point out that all the tuples of certain
> chunks are frozen for registered relations and registered chunks. There
> would be lot of blocks which won't fall under this category.
> i) For example, if the records are newly inserted, that
> block can't be marked as containing all frozen tuples.
> ii) Imagine the case where there is a batch job / Huge select
> query running in a enterprise for more than 6hrs. All the blocks which have
> got inserted into the tables, during this period might not be able to get
> the advantage of DSM
>
>
>
> On 10/11/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> >
> > Gokulakannan Somasundaram wrote:
> > > As explained, if we are going to include the snapshot with indexes,
> > Vacuum
> > > will be done on the index independent of the table, so Vacuum will not
> > > depend on immutability. We need to goto the index from the table, when
> > we
> > > want to update the snapshot info. The problem on hand is that some of
> > the
> > > userdefined functions are mutable, whereas the user might mark it
> > immutable.
> > >
> > > So my idea is to have a mapping index, with tupleid as the first
> > column and
> > > the function's values as subsequent columns. I have a somewhat
> > detailed
> > > design in mind. So there will be a over head of extra 3 I/Os for
> > > update/delete on indices based on User-defined functions. But this
> > setup
> > > will speed-up lot of queries where the tables are partitioned and
> > there will
> > > be more inserts and selects and dropping partitions at periodic
> > intervals.
> > > Updates become costly by 3 I/Os per Index with snapshot. So if someone
> > has
> > > more selects than updates+deletes then this index might come handy
> > (ofcourse
> > > not with user-defined functional indices).
> >
> > I think you need to explain why that is better than using the Dead Space
> > Map. We're going to want the DSM anyway, to speed up VACUUMs; enabling
> > index-only-scans just came as an afterthought. While DSM designed just
> > for speeding up vacuums might look slightly different than one used for
> > index-only scans, the infrastructure is roughly the same.
> >
> > What you're proposing sounds a lot more complex, less space-efficient,
> > and slower to update. It requires extra action from the DBA, and it
> > covers exactly the same use case (more selects than updates+deletes, to
> > use your words). It would require changes to all index access methods,
> > while the DSM would automatically work with all of them. In particular,
> > including visibility information in a bitmap index, should we have
> > bitmap indexes in the future, is impossible, while the DSM approach
> > would just work.
> >
> > --
> > Heikki Linnakangas
> > EnterpriseDB http://www.enterprisedb.com
> >
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2007-10-12 06:51:13 Re: Release notes introductory text
Previous Message Simon Riggs 2007-10-12 06:40:25 Re: First steps with 8.3 and autovacuum launcher

Browse pgsql-patches by date

  From Date Subject
Next Message Heikki Linnakangas 2007-10-12 09:03:02 Re: Including Snapshot Info with Indexes
Previous Message Gokulakannan Somasundaram 2007-10-12 06:33:41 Re: Including Snapshot Info with Indexes