Re: crash-safe visibility map, take four

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: crash-safe visibility map, take four
Date: 2011-03-23 06:16:12
Message-ID: 4D89902C.7000004@krogh.cc
Lists: pgsql-hackers

On 2011-03-22 21:43, Robert Haas wrote:
> I took a crack at implementing the first approach described above,
> which seems to be by far the simplest idea we've come up with to date.
> Patch attached. It doesn't seem to be that complicated, which could
> mean either that it's not that complicated or that I'm missing
> something. Feel free to point and snicker in the latter case.

Looks simple, but there is no benefit on the usage side in this patch yet,
so it isn't really "testable"? I would love to spend some time testing
once that's doable (even with rough corners).

I'm still a bit puzzled about how this would end up working with a
page-level visibility map bit for index scans. There is a clear drop-off
in usability as the change rate of the table goes up, which may or may not
be relevant, but I cannot really judge, since I haven't even got a
ballpark figure for how much table churn would disable, say, 50% of the
usage.
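For a ballpark, one could assume a very rough model (my assumption, not
anything from the patch): if updates land on uniformly random pages, then
after u updates on a p-page table about exp(-u/p) of the pages are still
all-visible, so roughly ln(2) ≈ 0.69 updates per page would already clear
half the bits. Checkable in psql:

SELECT round(exp(-0.69), 2) AS fraction_of_bits_still_set;  -- ~0.50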

= Really naive suggestion approaching =
Another layout might be to simply drag t_xmin and t_xmax out per row
(8 bytes) into a table by itself. This table would be way bigger than the
one-bit-per-page map, but it could be WAL-logged like any other change in
the system?

It would, by definition, make the visibility testing work (way faster
than today), no matter how fast the underlying table changes.
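To make that concrete, a user-level analogue of the layout might look
roughly like this (purely a hypothetical sketch; the real thing would
presumably live beside the heap as a separate fork, and the table and
column names here are made up for illustration):

-- Hypothetical sketch: one 8-byte visibility entry per heap tuple.
CREATE TABLE seq_visibility (
    blkno  integer NOT NULL,  -- heap block number of the tuple
    offno  integer NOT NULL,  -- line pointer offset within that block
    t_xmin xid     NOT NULL,  -- inserting transaction id
    t_xmax xid     NOT NULL   -- deleting transaction id, "0" if none
);

An index scan could then probe this (much smaller, WAL-logged) structure
instead of the heap page when it only needs visibility.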

The state of things today (PG 8.4) is that a query like this:
testdb=# select count(id) from testdb.seq where fts @@ to_tsquery('tag');
 count
-------
 69753
(1 row)

Time: 5863.600 ms

testdb=# select count(id) from testdb.seq where fts @@ to_tsquery('tag');
 count
-------
 69753
(1 row)

Time: 659.832 ms

testdb=# select count(id) from testdb.seq where fts @@ to_tsquery('tag');
 count
-------
 69753
(1 row)

Time: 1005.765 ms

That is somewhere around 15 µs/tuple (not bad at all); the first run was
probably only "half warm".
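As a sanity check, the per-tuple figure can be recomputed straight from
the warm runs above:

-- Warm-run timing divided by matching tuples:
SELECT 1005.765 / 69753 AS ms_per_tuple;  -- ~0.0144 ms, i.e. ~14 µs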

The "average" rows per tuple is somewhere between 4 and 8 for this
table, assuming
8 and that the 69K are randomly distributed among the 16M other tuples
(fair assumption
in this case). The 600-1000ms for the fresh cache run are the timing to
drag:
69753*8192 (page size) = 571MB into memory for visibillity testing
alone, on warm cache
all pages being in main memory. Packing 16M tuples with 8 bytes / tuple
in a map would be
around 128MB.

Given 8 bytes/row and a random distribution of data, that would require us
to read all 128 MB, so roughly a 4x speedup on this example; and it would
let us count the entire table in about the same time.
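Those back-of-the-envelope numbers can be checked directly in psql (the
16M row count is the one quoted above):

-- Heap bytes touched vs. size of a hypothetical 8-byte-per-tuple map:
SELECT 69753 * 8192 AS heap_bytes,   -- ~571 MB, one page per match
       16000000 * 8 AS map_bytes,    -- ~128 MB
       round((69753 * 8192) / (16000000 * 8.0), 1) AS speedup;  -- ~4.5x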

With regard to disk vs. memory hotness: those 128 MB compare to a table
size of 32 GB (with a 64 GB TOAST table next to it), though the TOAST
table shouldn't be touched by the above query.

The per-tuple number (today) on a "thin" table in my system approaches
1 µs/tuple.

If the page-level bits would get set again "quite fast" on a fairly busy
system anyway, then the above is just noise, but I currently have no feel
for it, and there is some math in there that I have trouble putting
reliable ballpark numbers on.

By any of these approaches, there is room for significant improvement in
visibility testing for a huge range of installations.

Can I drag numbers on the "frozenness of tuples" out of my current systems
to fill in the discussion? (How?)
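For what it's worth, one crude way to pull such numbers (my assumption:
nothing beyond the ordinary system columns is needed; on 8.4 frozen tuples
show up with xmin = 2) might be:

-- Distribution of inserting XIDs; frozen tuples appear as xmin = 2:
SELECT xmin, count(*)
  FROM testdb.seq
 GROUP BY xmin
 ORDER BY count(*) DESC
 LIMIT 20;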

Jesper
--
Jesper
