Re: [PATCHES] Including Snapshot Info with Indexes

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Including Snapshot Info with Indexes
Date: 2007-10-23 13:25:47
Message-ID: 1193145947.17735.33.camel@hannu-laptop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Ühel kenal päeval, T, 2007-10-23 kell 18:36, kirjutas Gokulakannan
Somasundaram:

>
> There are several advantages to keeping a separate visibility
> heap:
>
> 1) it is usually higly compressible, at least you can throw
> away
> cmin/cmax quite soon, usually also FREEZE and RLE encode the
> rest.
>
> 2) faster access, more tightly packed data pages.
>
> 3) index-only scans
>
> 4) superfast VACUUM FREEZE
>
> 5) makes VACUUM faster even for worst cases (interleaving live
> and dead
> tuples)
>
> 6) any index scan will be faster due to fetching only visible
> rows from
> main heap.
>
> if you have to store the visibility fields of all the tuples of each
> table, then you may not be able to accomodate in the cache. Say if a
> table is of 1 million rows, we would need 22 MB of visibility
> space(since visibility info takes 16 bytes. I think if we have to link
> it with say tuple-id(6 Bytes).

You can keep the visibility info small, by first dropping cmin/cmax and
then FREEZ'ing the tuples (setting xmin to special value), after that
you can replace a lot of visibility info tuples with single RLE encoded
tuple, which simply states, that tuples N:A to M:B are visible.

If that 1 million row table is mostly static, the static parts will soon
have (al lot) less than 1 bit in visibility heap.

For example, after vacuum there will be just one visibility info which
say that whole table is visible.

I envision HOT-like on-the-fly VACUUM FREEZE manipulations of visibility
info so it won't grow very big at all.

> I think we may need to link it with indexes with one more id. i am not
> counting that now).

why ?

we will keep visibility info for ctids (PAGE:NR) and if we need to see,
if any ctid pointe from index points to a visible tuple we check it
based on that ctid.

> If we have 10 tables, then we will have 220 MB. Keeping them pinned
> in memory may not be advisable in some circumstances.

no no! no pinning, the "mostly in cache" will happen automatically (and
I mean mostly in processors _internal_ L1 or L2 cache, not just in RAM)

> If it is not going to be in memory, then that is no different from
> referring a table. But i accept that is a concept worth trying out. I
> think the advantage with thick indexes comes with the fact, that it is
> optional. If we can make this also as optional, that would be better.
> But if we are going to suggest it as a replacement of DSM, then it
> loses the advantage of being small.

I agree that a single-purpose DSM can be made smaller than multi-purpose
visibility heap.

------------------
Hannu

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-10-23 13:29:58 Re: Feature Freeze date for 8.4
Previous Message Heikki Linnakangas 2007-10-23 13:16:58 Re: [PATCHES] Including Snapshot Info with Indexes

Browse pgsql-patches by date

  From Date Subject
Next Message Hannu Krosing 2007-10-23 13:50:22 Re: [PATCHES] Including Snapshot Info with Indexes
Previous Message Heikki Linnakangas 2007-10-23 13:16:58 Re: [PATCHES] Including Snapshot Info with Indexes