Visibillity testing - some numbers on current performance.

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Visibillity testing - some numbers on current performance.
Date: 2011-04-05 19:04:18
Message-ID: 4D9B67B2.6020007@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

I initially set out to put some numbers on "why" the visibillity
map was important for "select count(*)", primarily to give some
feedback to Simon Riggs stating:
"Your tests and discussion remind me that I haven't yet seen any tests
that show that index-only scans would be useful for performance."

I created at small testscript that (on my lousy desktop) created
a bunch of different tables, all with 2.000.000 rows and different
width. The tests are created so that the tuple-with of around 1500bytes
ends around the size of memory of my system, so we eventually have
some number for when we're going to "disk". My desktop is only a
single SATA 7200 rpm drive, 3GB of memory and no battery backed
write cache (who would ever run a REAL database on such a system anyway).
(script and output-data, graphs in links at the end of the email).

The on this system, visibillity testing take from:

0.2s to 0.5s in total for 2.000.000 tuples

dependent on tuple-size (header+14 bytes to header
+ 504 bytes), this is all in situations where we can assume that
hit-bits are
set and all thing have kicked in and the complete dataset
is memory cached.

This made me conclude that, this is just awesome, if I could get
somewhere near these numbers in my production
system then I would have been haunting perfomance in totally different
areas.

One the first select after load, still in situations where we're fully
memory recident the range is instead:

0.6s to 35s in total for 2.000.000 tuples
(First run - second run for +10 bytes and +500 bytes respectively).

This degradation can mostly be attributed to concurrent writeout of
hit-bits on a single SATA-drive (one-spindle), who would sanely run a
db on such a system anyway, this number is probably the least robust
one in the test, but the huge range should lead to some concern anyway.

The last range is the range where we're hitting disk, and that is
fairly uninteresting as is can more or less be seen as a speed of
the underlying disk-system, where the one in this one is in the "low
range". But is seen from a sequential throughput perspective which
this one tests is still does about 80MB/s and an "expensive one"
will not buy an order of magnitude in this area.

The range is in this case:
1.5s to 42s in total for 2.000.000 tuples.
the first one for a +10 bytes tuple, the last one for a +1500 bytes.

Of the really non-interesting information is that when I add +2500 bytes
to the tuple it goes down to 2.1s, which is due to toast kicking in and
the fact that the data I load are highly compressable so it ends up
filling next to nothing.

Conclusion:
Visibillity testing of 2.000.000 tuples takes between 0.2s and 42s, where
your system fits into that range hugely depends on your tuple-size, the
amount of bloat on your tables, the amount of memory for caching
compared to the total database size and if you have sufficient activity
on your system for keeping your "active" dataset in memory in favor
of background activities.

If your active dataset approaches cache-size of the system, you're
very likely to hit in the last part of that range.

So Simon Riggs question is really good, since the answer would be
"it depends". It seems plausible that it is really hard to beat the
0.2s-0.5s for 2m tuples we currently meet for any kind of memory
resident dataset.

The approach currently being pursues, splitting of the PD_ALL_VISIBLE
bit and using that for visibillity testing, would improve the situation
enourmously making all the diskbound cases to be in the order of

primary-key-index-size+vm-map-size/disk-throughput

instead of

main-table-size/disk-throughput

Which for "slim" tables wouldn't be that much, but for fat tables it
can/would be substantial. But it would be crusial to have the bit set
at all, and the likelihood would fall with the amount of churn in the
table. The worst-case situation would be where the bit is not set
anyway and there the speed would be "a primary index" worse than
currently and if the best-case would be better at all. But I'm fairly
sure that the "average case" would be quite a lot lower than it is today,
just by the likelyhood of indexes and vm-maps being in-memory .
(at least for databases hitting disk occationally).

Getting below 0.2s for 2.000.000 tuples would somewhat be nice
but gettting the worst-case numbers an order of magnitue down
would be an enourmous benefit to "large" or infrequently used
databases.

My conclusion is somewhere along the line of:
Gettting the visibillity map crash-safe and updated is not the
primary goal, but getting the "visibillity testing" separated
from the varying size of tuples seems to be the key point here
and doing it by moving the PD_ALL_VISIBLE bit out is definately
one way of doing it.

Another approach could be to "way more aggressively" push to TOAST,
this would effectively push the worst-case behaviours down.
(I'll try to do some sane benchmarking around that).

A third approach could be to do a "slim table", with only the relevant
bits from the page/tuple-header sufficient to perform visibillity testing
giving a "fixed" cost of visibillity testing.

Other approaches ... I'm not that much into PG-internals.

spreadsheet with numbers and a couple of graphs:
http://shrek.krogh.cc/~jesper/visibillitytesting.pdf
Testscript:
http://shrek.krogh.cc/~jesper/visibillity-testing.pl

Tests done on git-head today.

.. comments and suggestions for other related
tests are welcome.

--
Jesper

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-04-05 19:05:14 Re: WIP: Allow SQL-language functions to reference parameters by parameter name
Previous Message Dave Page 2011-04-05 18:59:47 Re: [HACKERS] Uppercase SGML entity declarations