Re: really lazy vacuums?

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: really lazy vacuums?
Date: 2011-03-17 20:02:58
Message-ID: 4D8268F2.7010303@krogh.cc
Lists: pgsql-hackers

On 2011-03-17 15:02, Robert Haas wrote:
> On Thu, Mar 17, 2011 at 4:17 AM, Jesper Krogh<jesper(at)krogh(dot)cc> wrote:
>> Is it obvious that the visibillity map bits should track complete
>> pages and not individual tuples? If the visibillity map tracks at
>> page-level the benefit would fall on "slim tables" where you squeeze
>> 200 tuples into each page and having an update rate of 1% would
>> lower the likelyhood even more. (it may be that for slim tables the
>> index-only-scans are not as benefitial as to wide tables).
> I'm not sure exactly what MaxHeapTuplesPerPage works out to be, but
> say it's 200. If you track visibility info per tuple rather than per
> page, then the size of the visibility map is going to expand by a
> factor of 200. That might decrease contention, but otherwise it's a
> bad thing - the whole point of having the visibility map in the first
> place is that it's much, much smaller than the heap. If it were the
> same size as the heap, we could just read the heap. What the map
> attempts to accomplish is to allow us, by reading a small number of
> pages, to check whether the tuples we're thinking of reading are
> likely to be all-visible without actually looking at them.
Yes, that was sort of the math I was trying to make, although I believe
you have a much better feel for it than I do. But according to this:
http://wiki.postgresql.org/wiki/FAQ#How_much_database_disk_space_is_required_to_store_data_from_a_typical_text_file.3F

the bulk per-row overhead is around 24 bytes, which with 1 bit per row
gives a size reduction of 1:(24x8) ~ 1:192, worst case. That gives at
best 341 tuples/page (where each tuple contains no data at all). With
that ratio, the visibility map of a 10GB relation would fill about 53MB
on disk (still worst case), and that by itself would by all means be
awesome. (With tuples that small, a 10GB relation would hold around
450 million tuples.)
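As a sanity check on those numbers, here is a back-of-the-envelope sketch (assuming the default 8KB block size and the ~24-byte per-row overhead figure from the FAQ; it ignores the page header and item pointers, so it is only an upper bound on tuples per page):

```python
# Worst-case sizing of a per-TUPLE visibility map for a 10GB heap,
# assuming zero-width tuples (24-byte overhead only) and 8KB pages.
BLOCK_SIZE = 8192          # bytes per heap page (PostgreSQL default)
TUPLE_OVERHEAD = 24        # per-row overhead, figure taken from the wiki FAQ
HEAP_BYTES = 10 * 1024**3  # 10GB relation

tuples_per_page = BLOCK_SIZE // TUPLE_OVERHEAD   # ~341
pages = HEAP_BYTES // BLOCK_SIZE                 # ~1.3 million pages
tuples = pages * tuples_per_page                 # ~450 million tuples
per_tuple_map = tuples // 8                      # one bit per tuple, in bytes

print(tuples_per_page)          # 341
print(per_tuple_map / 1024**2)  # ~53 (MB)
```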

With 1 bit per page, the "best case" would be 341 times better than the
above, reducing the visibility map on a 10GB table to around 160KB, which
is extremely small (and thus also awesome). But the consequence of a single
update would be losing the visibility-map benefit for 341 tuples in
one shot.

Worst-case situations are where we approach 4 tuples per page, just before
we hit TOAST. There the space-reduction ratio of 1 bit per tuple would be
1:(2048x8) ~ 1:16384, and 1 bit per page is 4 times better.
With 1 bit per tuple, the visibility map of a 10GB relation would be
around 640KB; 1 bit per page would then drop it to around 160KB.
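Continuing the same back-of-the-envelope math for the wide-tuple end (same 8KB pages; ~2KB rows taken as roughly the widest non-TOASTed case, i.e. 4 tuples per page):

```python
# Per-PAGE visibility map vs per-TUPLE map for wide (~2KB) tuples on a
# 10GB heap with 8KB pages; 4 tuples/page approximates the widest a row
# gets before TOAST kicks in.
BLOCK_SIZE = 8192
HEAP_BYTES = 10 * 1024**3

pages = HEAP_BYTES // BLOCK_SIZE
per_page_map = pages // 8              # one bit per page, in bytes
wide_tuples = pages * 4                # 4 tuples per 8KB page
per_tuple_map_wide = wide_tuples // 8  # one bit per tuple, in bytes

print(per_page_map // 1024)        # 160 (KB), independent of tuple width
print(per_tuple_map_wide // 1024)  # 640 (KB)
```

Note that the per-page map size depends only on the number of pages, which is why it stays ~160KB regardless of how wide the tuples are.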

Can we drag out some average-case numbers on row-size in the heap
from some real production systems?

I may have gotten something hugely wrong in the above calculations and/or
missed some important points.

--
Jesper
