Re: Some ideas about Vacuum

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: "Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some ideas about Vacuum
Date: 2008-01-11 21:10:17
Message-ID: 9362e74e0801111310k6266093au9fc53f2b1256dda5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry Greg , I missed to read this part before.

On Jan 9, 2008 8:40 PM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:

>
> "Markus Schiltknecht" <markus(at)bluegap(dot)ch> writes:
>
> > Hi,
> >
> > Gokulakannan Somasundaram wrote:
> >> If we can ask the Vacuum process to scan the WAL log, it can get all
> the
> >> relevant details on where it needs to go.
>
> That's an interesting thought. I think your caveats are right but with
> some
> more work it might be possible to work it out. For example if a background
> process processed the WAL and accumulated an array of possibly-dead tuples
> to
> process in batch. It would wait whenever it sees an xid which isn't yet
> past
> globalxmin, and keep accumulating until it has enough to make it
> worthwhile
> doing a pass.
>
> I think a bigger issue with this approach is that it ties all your tables
> together. You can't process one table frequently while some other table
> has
> some long-lived deleted tuples.
>

I am not able to clearly understand what you are saying here. It ties all
the tables yes. There are two options here
a) Do we really need to do Vacuum table by table? Say we read 'n' WAL
segments and accumulate the data. We should try to sort the result with
Relation name, Block num and we can go ahead with the Vacuum. In this way,
Vacuum will only work at the database level. Why do we need to process one
table frequently?
b) We can create DSMs for each table separately and Vacuum will use the WAL
information to update it. In this way, we can Vacuum table wise.

>
>
> I'm also not sure it really buys us anything over having a second
> dead-space-map data structure. The WAL is much larger and serves other
> purposes which would limit what we can do with it.

Ok. One obvious advantage is that it saves the contention over DSM for the
DML operations and Vacuum process. Since Vacuum process is going to have
much more information on what has happened in the database, it is possible
for some new structures. For example i have been thinking of changing our
current index structure in such a way, it won't hold any duplicate tuples
for different versions of data. Whenever there is a update, only the indexes
relevant to the columns changed will get updated. The Vacuum has to play the
role of changing the tid, the index tuple points to, whenever it vacuums a
older version.
It would be possible to create such structures, which can be synched
asynchronously. Another example would be Asynchronous Materialized views.

But pushing those future plans aside, don't you think this would reduce the
contention, which otherwise would be faced by the DML operations?

Thanks,
Gokul.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Browne 2008-01-11 22:53:21 Re: Transaction Snapshot Cloning
Previous Message Simon Riggs 2008-01-11 20:42:42 Re: Transaction Snapshot Cloning