Re: [PATCHES] Including Snapshot Info with Indexes

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Hannu Krosing" <hannu(at)skype(dot)net>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Including Snapshot Info with Indexes
Date: 2007-10-23 13:52:12
Message-ID: 9362e74e0710230652v2c84b070t9dfbf9bec0cdcfd1@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 10/23/07, Hannu Krosing <hannu(at)skype(dot)net> wrote:
>
> One fine day, Tue, 2007-10-23 at 18:36, Gokulakannan
> Somasundaram wrote:
>
> >
> > There are several advantages to keeping a separate visibility
> > heap:
> >
> > 1) it is usually highly compressible: at least you can throw away
> > cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.
> >
> > 2) faster access, more tightly packed data pages.
> >
> > 3) index-only scans
> >
> > 4) superfast VACUUM FREEZE
> >
> > 5) makes VACUUM faster even for worst cases (interleaving live and
> > dead tuples)
> >
> > 6) any index scan will be faster due to fetching only visible rows
> > from main heap.
> >
> > If you have to store the visibility fields of all the tuples of each
> > table, then you may not be able to accommodate them in the cache. Say a
> > table has 1 million rows: we would need 22 MB of visibility
> > space (since visibility info takes 16 bytes, and I think we would have
> > to link it with, say, the tuple-id (6 bytes).
>
> You can keep the visibility info small by first dropping cmin/cmax and
> then FREEZE'ing the tuples (setting xmin to a special value); after that
> you can replace a lot of visibility info tuples with a single RLE-encoded
> tuple, which simply states that tuples N:A to M:B are visible.

I think I am missing something here. Say we have a tuple as per your
definition. Initially it has cmin/cmax and then you drop them. Is that an
in-place update? If it is, how will you reclaim that space?
If we record that N:A to M:B are visible and some tuple in between is then
deleted, we need to rewrite the info in a different format. Until that
whole update completes, a lot of transactions will be waiting to acquire the
lock on the same visibility info block. I feel that may again lead to the
same concurrency issues.
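
To make the rewrite concern concrete, here is a minimal sketch, not code from
any patch in this thread. It assumes tuple positions are flattened to plain
integers instead of PAGE:NR ctids, and the name mark_deleted is made up for
illustration. It shows how one run-length entry has to be replaced by up to
two entries when a tuple in the middle is deleted; the deleted tuple itself
would then need its own per-tuple visibility entry, which is not modelled.

```python
from typing import List, Tuple

# One run-length entry: every position in [start, end] is visible to all.
# Positions are plain integers here (an assumption made for brevity).
Run = Tuple[int, int]


def mark_deleted(runs: List[Run], victim: int) -> List[Run]:
    """Return a new run list in which `victim` is no longer covered."""
    out: List[Run] = []
    for start, end in runs:
        if start <= victim <= end:
            # The run containing the victim is split around it.
            if start < victim:
                out.append((start, victim - 1))
            if victim < end:
                out.append((victim + 1, end))
        else:
            out.append((start, end))
    return out


# A fully visible 1-million-row table is a single entry ...
runs = [(0, 999_999)]
# ... and deleting one row in the middle turns it into two.
runs = mark_deleted(runs, 500_000)
```

In this sketch the whole entry is rewritten on every delete, so any writer
touching a tuple inside the same run contends for the same entry, which is
the locking question raised above.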

> If that 1 million row table is mostly static, the static parts will soon
> have (a lot) less than 1 bit in visibility heap.
>
> For example, after vacuum there will be just one visibility info which
> says that the whole table is visible.
>
> I envision HOT-like on-the-fly VACUUM FREEZE manipulations of visibility
> info so it won't grow very big at all.

If the tables are static, then DSM becomes the best solution; maybe if you
are storing one bit per table, then yours becomes the best solution.

> > I think we may need to link it with indexes with one more id. I am not
> > counting that now).
>
> why ?

If we are going to store something like "a range of tuples is visible", how
will we reach that particular info? Don't we need a pointer to reach that
memory block?

> We will keep visibility info for ctids (PAGE:NR), and if we need to see
> whether a ctid pointer from an index points to a visible tuple, we check it
> based on that ctid.

Oh, then you occupy space proportional to the number of tuples. It would be
like a hash map, mapping ctids to the information. So if we have
information like "M:A to N:B are visible", should we be placing it
against each ctid?
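
For comparison, one alternative to a per-ctid map would be a sorted list of
ranges searched by binary search. This is only a sketch under my own
assumptions: neither the layout nor the name is_visible_to_all comes from the
proposal, and ctids are modelled as (page, nr) pairs that compare
lexicographically.

```python
import bisect
from typing import List, Tuple

Ctid = Tuple[int, int]      # (page, nr), compared lexicographically
Run = Tuple[Ctid, Ctid]     # inclusive range "M:A to N:B are visible"


def is_visible_to_all(runs: List[Run], ctid: Ctid) -> bool:
    """Check whether `ctid` falls inside one of the sorted, disjoint runs."""
    starts = [start for start, _ in runs]
    # Index of the last run whose start is <= ctid, or -1 if there is none.
    i = bisect.bisect_right(starts, ctid) - 1
    return i >= 0 and runs[i][0] <= ctid <= runs[i][1]


# Two ranges cover many ctids with only two entries.
runs = [((0, 1), (3, 40)), ((7, 1), (9, 12))]
inside = is_visible_to_all(runs, (2, 17))
outside = is_visible_to_all(runs, (5, 3))
```

Here the space used grows with the number of ranges, not the number of
tuples, but a lookup costs a search over the ranges instead of a direct
probe, and the list still has to be kept sorted under concurrent updates.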

> > If we have 10 tables, then we will have 220 MB. Keeping them pinned
> > in memory may not be advisable in some circumstances.
>
> no no! no pinning, the "mostly in cache" will happen automatically (and
> I mean mostly in the processor's _internal_ L1 or L2 cache, not just in RAM)

L1 and L2 data caches, hmm. I think the basic problem with visibility info is
that if you make it too small, then updates get costly in terms of concurrency.
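
For reference, the uncompressed size estimate quoted earlier in this message
works out as below. The 16-byte and 6-byte figures are the ones given in the
thread; the function name and the use of decimal megabytes are my own
assumptions, and the possible extra id for linking with indexes is not
counted.

```python
VISIBILITY_BYTES = 16   # per-tuple visibility info, figure from the thread
TUPLE_ID_BYTES = 6      # tuple id used to link the entry to its tuple


def visibility_space_mb(rows: int, tables: int = 1) -> float:
    """Uncompressed visibility space in decimal megabytes (10**6 bytes)."""
    per_tuple = VISIBILITY_BYTES + TUPLE_ID_BYTES
    return rows * tables * per_tuple / 1_000_000


one_table = visibility_space_mb(1_000_000)
ten_tables = visibility_space_mb(1_000_000, tables=10)
print(one_table, ten_tables)
```

These are the worst-case numbers, before any of the freezing or run-length
encoding discussed above is applied.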

> > If it is not going to be in memory, then that is no different from
> > referring to a table. But I accept that is a concept worth trying out. I
> > think the advantage with thick indexes comes with the fact, that it is
> > optional. If we can make this also as optional, that would be better.
> > But if we are going to suggest it as a replacement of DSM, then it
> > loses the advantage of being small.
>
> I agree that a single-purpose DSM can be made smaller than multi-purpose
> visibility heap.

--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)
