Re: Some ideas about Vacuum

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some ideas about Vacuum
Date: 2008-01-16 08:25:36
Message-ID: 9362e74e0801160025s5415caeeq9599d6fbaa7563f8@mail.gmail.com
Lists: pgsql-hackers

Hi,

Please find my answers inline

>
> Do you have evidence of that contention being so worse, that it
> justifies the additional WAL reading from disk? (Assuming no WAL
> archiving).

In a broader sense, the DSM is essentially a bitmap index with some optimizations added to make updates more effective. As you may know, the bitmap index design doesn't scale very well under concurrency: if you pack a lot of information into a small space, I feel it is bound to hurt concurrency. Let us discuss it in detail.
The DSM, I believe, plans to achieve the following objectives:
a) to find the blocks which need to be vacuumed
b) to find the blocks where freezing is required
c) to find the blocks which are visible to everyone

The DSM might get split into multiple maps, like the visibility map (already
proposed by Heikki), a vacuum map and a freeze map. When inserts happen, the
map has to be extended, and the block has to be locked in order to extend it.
Say one DSM block corresponds to some 60K data blocks: any updates/deletes
happening over those blocks then have to wait for that lock. This is just an
example I can think of off-hand; maybe the people implementing it can throw
more light on the synchronization points.
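To make the contention point concrete, here is a rough standalone sketch in C.
It is purely my own illustration, not the proposed DSM code; the names and
sizes are invented. If one map page carries one bit per heap block, an 8 KB
map page covers about 64K heap blocks, and every backend that wants to flip a
bit for any of those blocks has to take the same lock:

/*
 * Rough standalone sketch (my own illustration, NOT the proposed DSM code):
 * one map page carries one bit per heap block, so an 8 KB map page covers
 * 64K heap blocks.  Every backend that wants to flip a bit for any of
 * those blocks has to take the same lock, which is the contention point
 * described above.
 */
#include <pthread.h>
#include <stdint.h>

#define MAP_PAGE_SIZE        8192
#define BLOCKS_PER_MAP_PAGE  (MAP_PAGE_SIZE * 8)   /* 65536 heap blocks */

typedef struct MapPage
{
    pthread_mutex_t lock;                /* initialize with pthread_mutex_init() */
    uint8_t         bits[MAP_PAGE_SIZE];
} MapPage;

/* mark heap block 'blkoff' (offset within this map page) as needing vacuum */
static void
map_mark_block(MapPage *page, uint32_t blkoff)
{
    pthread_mutex_lock(&page->lock);     /* every writer of these 64K blocks waits here */
    page->bits[blkoff / 8] |= (uint8_t) (1 << (blkoff % 8));
    pthread_mutex_unlock(&page->lock);
}

Slicing the map finer would reduce how often two backends collide on the same
lock, but it also increases how many map pages the vacuum side has to read.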

>
> IMO we can get about any granularity we want for DSM update locking,
> depending on how we arrange the DSM bits.

I don't quite understand this.

> > Since Vacuum process is going to
> > have much more information on what has happened in the database,
>
> Why should that be? IMO, collecting the information at transaction time
> can give you exactly the same information, if not more or better
> information.

My argument is: if we have already collected that information in the WAL, why
should we collect it again and again?
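To illustrate, here is a small conceptual sketch in C. The record layout and
names are my own invention for the example, not the real WAL format; the point
is only that a vacuum-side reader could derive the list of blocks to clean
from records it has to read anyway:

/*
 * Conceptual sketch only (my own illustration; the record layout and names
 * are invented, this is not the real WAL format): a vacuum-side reader
 * walks the WAL stream and notes which heap blocks were touched by updates
 * and deletes, so the backends do not have to maintain a second structure
 * at transaction time.
 */
#include <stdint.h>

typedef enum { WAL_HEAP_INSERT, WAL_HEAP_UPDATE, WAL_HEAP_DELETE } WalRecType;

typedef struct WalRecord
{
    WalRecType  type;       /* which heap operation this record logs */
    uint32_t    relid;      /* relation the record applies to */
    uint32_t    blkno;      /* heap block the record touched */
} WalRecord;

/* caller-provided callback that remembers a block as a vacuum target */
typedef void (*mark_block_fn) (uint32_t relid, uint32_t blkno);

static void
collect_vacuum_targets(const WalRecord *recs, int nrecs, mark_block_fn mark)
{
    for (int i = 0; i < nrecs; i++)
    {
        /* only updates and deletes leave dead versions behind */
        if (recs[i].type == WAL_HEAP_UPDATE || recs[i].type == WAL_HEAP_DELETE)
            mark(recs[i].relid, recs[i].blkno);
    }
}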

> > it is
> > possible for some new structures. For example i have been thinking of
> > changing our current index structure in such a way, it won't hold any
> > duplicate tuples for different versions of data. Whenever there is a
> > update, only the indexes relevant to the columns changed will get
> > updated. The Vacuum has to play the role of changing the tid, the index
> > tuple points to, whenever it vacuums a older version.
>
> Huh? The index would then point to the old tuple only, until a VACUUM
> comes by, right. How are following transactions expected to find the new
> tuple before that VACUUMing?

You are right. We have already discussed this. In the vacuum approach, we
travel forward in time: we start from the oldest version and reach the newer
ones by following the ctid in the old tuple. In the undo-log approach it is
the reverse: we go to the latest version and travel back in time. It's
interesting to see how the theory of relativity has found its way into
database science, right?

So say we have 'n' versions of the same row in the index. Right now we have
'n' index tuples pointing to 'n' blocks in the heap; we would read all 'n'
index tuples and visit all the versions of the data in the table. If this
changes, there will be one index tuple, which points to the oldest heap tuple,
and from there we navigate to all the newer versions. The advantage is
obvious: the index will be smaller, and updates will not touch an index unless
the data it covers has actually changed.
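Here is a small sketch in C of the traversal I have in mind. It is only my own
illustration; the visibility rule is simplified and in-memory pointers stand
in for ctids:

/*
 * Sketch of the traversal (my own illustration: a made-up visibility rule
 * and in-memory pointers standing in for ctids).  The single index tuple
 * points at the oldest heap version; a scan walks forward along the chain
 * until it finds the version visible to its snapshot.  VACUUM would later
 * re-point the index tuple at the oldest surviving version.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct HeapVersion
{
    uint32_t            xmin;    /* creating transaction */
    uint32_t            xmax;    /* deleting/updating transaction, 0 if none */
    struct HeapVersion *next;    /* plays the role of t_ctid: next newer version */
} HeapVersion;

/* simplistic visibility rule, just for the example */
static bool
visible_to(const HeapVersion *v, uint32_t snapshot_xid)
{
    return v->xmin <= snapshot_xid && (v->xmax == 0 || v->xmax > snapshot_xid);
}

/* start from the version the index tuple points at (the oldest) and
 * travel forward in time until a visible version is found */
static const HeapVersion *
fetch_visible_version(const HeapVersion *oldest, uint32_t snapshot_xid)
{
    for (const HeapVersion *v = oldest; v != NULL; v = v->next)
    {
        if (visible_to(v, snapshot_xid))
            return v;
    }
    return NULL;    /* no version of this row is visible to the snapshot */
}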

Hope I was clear. Please write back in case I am not.

Thanks,
Gokul.
