Re: plan time of MASSIVE partitioning ...

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Leonardo Francalanci <m_lists(at)yahoo(dot)it>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: plan time of MASSIVE partitioning ...
Date:	2010-11-05 02:59:10
Message-ID:	10754.1288925950@sss.pgh.pa.us
Lists:	pgsql-hackers

[ for the archives' sake ]

I wrote:
> I had a thought about how to make get_tabstat_entry() faster without
> adding overhead: what if we just plain remove the search, and always
> assume that a new entry has to be added to the tabstat array?

I spent some time looking into this idea. It doesn't really work,
because there are places that will break if a transaction has more than
one tabstat entry for the same relation. The one that seems most
difficult to fix is that pgstat_recv_tabstat() clamps its n_live_tuples
and n_dead_tuples values to be nonnegative after adding in each delta
received from a backend. That is a good idea because it prevents insane
results if some messages get lost --- but if a transaction's updates get
randomly spread into several tabstat items, the intermediate counts
might get clamped, resulting in a wrong final answer even though nothing
was lost.
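
[ A minimal sketch of the clamping problem described above. This is not the
pgstat.c code: the type DemoTableStats and the functions demo_apply_delta()
and demo_apply_all() are hypothetical names invented for illustration. The
only assumption is the behavior stated in the message, namely that the
counter is clamped to be nonnegative after each delta is added. ]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the collector's per-table live-tuple counter. */
typedef struct
{
    int64_t n_live_tuples;
} DemoTableStats;

/* Add one delta, then clamp to nonnegative, as the message describes. */
static void
demo_apply_delta(DemoTableStats *st, int64_t delta)
{
    st->n_live_tuples += delta;
    if (st->n_live_tuples < 0)
        st->n_live_tuples = 0;
}

/* Apply a sequence of deltas in order, clamping after each one. */
static int64_t
demo_apply_all(int64_t start, const int64_t *deltas, int n)
{
    DemoTableStats st = { start };

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        demo_apply_delta(&st, deltas[i]);
    return st.n_live_tuples;
}
```

With a starting count of 3, a single merged delta of +5 gives 8. If the same
transaction's changes arrive as two items, -10 and then +15, the first one
is clamped from -7 to 0 and the final count is 15, even though no message
was lost.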

I also added some instrumentation printouts and found that in our
regression tests:
* about 10% of get_tabstat_entry() calls find an existing entry
  for the relation OID. This seems to happen only when a
  relcache entry gets flushed mid-transaction, but that does
  happen, and not so infrequently either.
* about half of the transactions use as many as 20 tabstats,
  and 10% use 50 or more; but it drops off fast after that.
  Almost no transactions use as many as 100 tabstats.
It's not clear that these numbers are representative of typical
database applications, but they're something to start with anyway.

I also did some testing to compare the cost of get_tabstat_entry's
linear search against a dynahash.c table with OID key. As I suspected,
a hash table would make this code a *lot* slower for small numbers of
tabstat entries: about a factor of 10 slower. You need upwards of 100
tabstats touched in a transaction before the hash table begins to pay
for itself. This is largely because dynahash doesn't have any cheap way
to reset a hashtable to empty, so you have to initialize and destroy the
table for each transaction. By the time you've eaten that overhead,
you've already expended as many cycles as the linear search takes to
handle several dozen entries.

I conclude that if we wanted to do something about this, the most
practical solution would be to keep doing linear searches until a
transaction reaches 100+ tabstat entries, and then build a hashtable
for subsequent searches. However, it's exceedingly unclear that it
will ever be worth the effort or code space to do that.
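
[ A rough sketch of the hybrid approach described above, under stated
assumptions. It does not use dynahash.c or the real tabstat structures:
DemoTabStatSet, demo_get_entry(), demo_reset() and the DEMO_* constants are
all hypothetical, and the open-addressing table is a stand-in chosen only to
keep the example self-contained. The threshold of 100 is taken from the
measurements in the message. ]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LINEAR_LIMIT 100   /* switch to hashing past this many entries */
#define DEMO_MAX_ENTRIES  1024  /* arbitrary cap for this sketch */
#define DEMO_HASH_BITS    12
#define DEMO_HASH_SLOTS   (1 << DEMO_HASH_BITS)

typedef uint32_t DemoOid;

typedef struct
{
    DemoOid relid;
    long    tuples_inserted;
} DemoTabStat;

typedef struct
{
    DemoTabStat entries[DEMO_MAX_ENTRIES];
    int         nentries;
    int         hash_built;
    int16_t     slots[DEMO_HASH_SLOTS];  /* index into entries, or -1 */
} DemoTabStatSet;

/* Per-transaction reset is O(1); slots are only cleared if a hash is built. */
static void
demo_reset(DemoTabStatSet *set)
{
    set->nentries = 0;
    set->hash_built = 0;
}

/* Multiplicative hash of the OID, keeping the top DEMO_HASH_BITS bits. */
static unsigned
demo_slot(DemoOid relid)
{
    return (uint32_t) (relid * 2654435761u) >> (32 - DEMO_HASH_BITS);
}

/* Insert entries[idx] into the slot table using linear probing. */
static void
demo_hash_insert(DemoTabStatSet *set, int idx)
{
    unsigned s = demo_slot(set->entries[idx].relid);

    while (set->slots[s] >= 0)
        s = (s + 1) & (DEMO_HASH_SLOTS - 1);
    set->slots[s] = (int16_t) idx;
}

/* Find the entry for relid, or append a new zeroed one. */
static DemoTabStat *
demo_get_entry(DemoTabStatSet *set, DemoOid relid)
{
    int idx;

    if (!set->hash_built)
    {
        /* Few entries so far: plain linear search. */
        for (int i = 0; i < set->nentries; i++)
            if (set->entries[i].relid == relid)
                return &set->entries[i];
    }
    else
    {
        /* Many entries: probe the slot table. */
        unsigned s = demo_slot(relid);

        while (set->slots[s] >= 0)
        {
            if (set->entries[set->slots[s]].relid == relid)
                return &set->entries[set->slots[s]];
            s = (s + 1) & (DEMO_HASH_SLOTS - 1);
        }
    }

    /* Not found: append a new entry. */
    assert(set->nentries < DEMO_MAX_ENTRIES);
    idx = set->nentries++;
    set->entries[idx].relid = relid;
    set->entries[idx].tuples_inserted = 0;

    if (set->hash_built)
        demo_hash_insert(set, idx);
    else if (set->nentries > DEMO_LINEAR_LIMIT)
    {
        /* Crossed the threshold: build the table from all entries so far. */
        memset(set->slots, 0xFF, sizeof(set->slots));   /* every slot = -1 */
        for (int i = 0; i < set->nentries; i++)
            demo_hash_insert(set, i);
        set->hash_built = 1;
    }
    return &set->entries[idx];
}
```

In this sketch a transaction that stays under the threshold never touches
the slot table at all, so the common case pays only for the linear search;
the table setup cost is incurred only by transactions that go past 100
entries.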

regards, tom lane

In response to

Re: plan time of MASSIVE partitioning ... at 2010-11-01 14:18:27 from Tom Lane