On Tue, Jan 24, 2012 at 11:24:08AM -0500, Jaime Casanova wrote:
> On Mon, Jan 23, 2012 at 7:18 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > If someone feels like
> > doing it, +1 for making pgstattuple() count non-leaf free space.
> actually i agreed that non-leaf pages are irrelevant... i just
> confirmed that in a production system with 300GB none of the indexes
> in an 84M rows table nor in a heavily updated one has more than 1 root
> page, all the rest are deleted, half_dead or leaf. so the posibility
> of bloat coming from non-leaf pages seems very odd
FWIW, the number to look at is internal_pages from pgstatindex():
[local] test=# create table t4 (c) as select * from generate_series(1,1000000);
[local] test=# alter table t4 add primary key(c);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "t4_pkey" for table "t4"
[local] test=# select * from pgstatindex('t4_pkey');
-[ RECORD 1 ]------+---------
version | 2
tree_level | 2
index_size | 22478848
root_block_no | 290
internal_pages | 10
leaf_pages | 2733
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 90.06
leaf_fragmentation | 0
So, 0.4% of this index. They appear in proportion to the logarithm of the
total index size. I agree that bloat centered on them is unlikely. Counting
them would be justified, but that is a question of formal accuracy rather than
> but the possibility of bloat coming from the meta page doesn't exist,
> AFAIUI at least
> we need the most accurate value about usable free space, because the
> idea is to add a sampler mode to the function so we don't scan the
> whole relation. that's why we still need the function.
I doubt we'd add this function solely on the basis that a future improvement
will make it useful. For the patch to go in now, it needs to be useful now.
(This is not a universal principle, but it mostly holds for low-complexity
patches like this one.)
All my comments below would also apply to such a broader patch.
> btw... pgstattuple also has the problem that it's not using a ring buffer
> attached are two patches:
> - v5: is the same original patch but only track space in leaf, deleted
> and half_dead pages
> - v5.1: adds the same for all kind of indexes (problem is that this is
> inconsistent with the fact that pageinspect only manages btree indexes
> for everything else)
Let's take a step back. Again, what you're proposing is essentially a faster
implementation of "SELECT free_percent FROM pgstattuple(rel)". If this code
belongs in core at all, it belongs in the pgstattuple module. Share as much
infrastructure as is reasonable between the user-visible functions of that
module. For example, I'm suspecting that the pgstat_index() call tree should
be shared, with pgstat_index_page() checking a flag to decide whether to
gather per-tuple stats.
Next, compare the bits of code that differ between pgstattuple() and
relation_free_space(), convincing yourself that the differences are justified.
Each difference will yield one of the following conclusions:
1. Your code contains an innovation that would apply to both functions. Where
not too difficult, merge these improvements into pgstattuple(). In order for
a demonstration of your new code's better performance to be interesting, we
must fix the same low-hanging fruit in its competitor. One example is the use
of the bulk read strategy. Another is the support for SP-GiST.
2. Your code is missing an essential behavior of pgstattuple(). Add it to
your code. One example is the presence of CHECK_FOR_INTERRUPTS() calls.
3. Your code behaves differently from pgstattuple() due to a fundamental
difference in their tasks. These are the only functional differences that
ought to remain in your finished patch; please point them out in comments.
For example, pgstat_heap() visits every tuple in the heap. You'll have no
reason to do that; pgstattuple() only needs it to calculate statistics other
In particular, I call your attention to the fact that pgstattuple() takes
shared buffer content locks before examining pages. Your proposed patch does
not do so. I do not know with certainty whether that falls under #1 or #2.
The broad convention is to take such locks, because we elsewhere want an exact
answer. These functions are already inexact; they make no effort to observe a
consistent snapshot of the table. If you convince yourself that the error
arising from not locking buffers is reasonably bounded, we can lose the locks
(in both functions -- conclusion #1). Comments would then be in order.
With all that done, run some quick benchmarks: see how "SELECT free_percent
FROM pgstattuple(rel)" fares compared to "SELECT relation_free_space(rel)" for
a large heap and for a large B-tree index. If the timing difference is too
small to be interesting to you, remove relation_free_space() and submit your
pgstattuple() improvements alone. Otherwise, submit as written.
I suppose I didn't make all of this clear enough earlier; sorry for that.
In response to
pgsql-hackers by date
|Next:||From: Noah Misch||Date: 2012-01-26 02:51:13|
|Subject: Re: Second thoughts on CheckIndexCompatible() vs.operator families|
|Previous:||From: Robert Haas||Date: 2012-01-26 02:25:33|
|Subject: Re: Should I implement DROP INDEX CONCURRENTLY?|