Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

From: John Lumby <johnlumby(at)hotmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql general <pgsql-general(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR
Date: 2019-07-08 16:23:15
Message-ID: DM6PR06MB5562839D0229D832E1AE4742A3F60@DM6PR06MB5562.namprd06.prod.outlook.com
Lists: pgsql-general


> From: Peter Geoghegan <pg(at)bowt(dot)ie>
> Sent: June 26, 2019 12:09 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR
>
> > >
> > > I suspect that you might find that the enhancements to B-Tree indexes
> > > that went into Postgres 12 would help with this workload, especially
> > > if you notice that this happens with indexes that have a lot of duplicates
> > >
>
> >
> > I had not noticed that, thanks for pointing it out. Yes , in my workload most of the indexes in question
> > are non-unique and some have very low key card. I will try out the pg-12 beta when I get a chance.
> >
>
>
> It's easy to show problems with very low cardinality indexes in the
> old code. You'll definitely notice a difference there.
>

I've run a comparison of pg-12beta2 against an older release, pg-9.4.6, running the same intensive delete-insert workload mentioned earlier in this thread, and would appreciate your comments on the results, especially whether they are roughly in line with your expectations.
I also have one question about a new-in-pg-11 config parm.

Briefly, the workload repeats a loop against a single table which has 8 indexes, 2 unique and 6 non-unique.
In each iteration, about 4300 records are deleted, and for each deleted record a corresponding record is inserted in which one or more of the non-unique key values are changed to values not present in the relevant index at that point. In other words, across all the indexes, 4300 key-TIDs are deleted and then 4300 new key-TIDs are inserted. At the end of each loop there is zero net change in the counts of records and keys, but possibly some increase in the number of pages, which is what the test is interested in, along with the overall throughput rate.
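One iteration of that loop might look roughly like the following. This is only an illustrative sketch: the column names, the selection predicate, and the key-shifting expression are hypothetical placeholders, since the actual schema and driver program are not shown in this thread.

```sql
-- Illustrative only: placeholder columns and predicates, not the real test harness.
BEGIN;

-- Capture the ~4300 rows to be replaced, then delete them.
CREATE TEMP TABLE victims ON COMMIT DROP AS
SELECT id, field_id, longnumber, ts
FROM   metadata_value
WHERE  ts < now() - interval '30 days'
LIMIT  4300;

DELETE FROM metadata_value
WHERE  id IN (SELECT id FROM victims);

-- Re-insert each row, shifting a non-unique key to a value
-- that is not currently present in its index.
INSERT INTO metadata_value (id, field_id, longnumber, ts)
SELECT id, field_id, longnumber + 10000000, now()
FROM   victims;

COMMIT;
```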

For this test, I did not modify the index fill factors, which therefore remained at the default of 90%, in order to evaluate what happens when no explicit fillfactor is set. In each case the indexes were either freshly loaded or else reindexed so as to have the same starting density. Here are counts and sizes after 768 iterations:

| tbpages | tbtuples | ixname | isuniq | livetuples | deadtuples | avg_leaf_density | ixpages
+---------+--------------+------------------------------+--------+------------+------------+------------------+---------

pg-9.4.6
---------------------

| 32160 | 2.55548e+06 | metadata_value_boolean | f | 2932852 | 0 | 46.39 | 13535
| 32160 | 2.55548e+06 | metadata_value_field_id | f | 2932852 | 0 | 48.58 | 12916
| 32160 | 2.55548e+06 | metadata_value_floatnumber | f | 2932852 | 0 | 45.97 | 13658
| 32160 | 2.55548e+06 | metadata_value_longnumber | f | 2932852 | 0 | 48.26 | 13009
| 32160 | 2.55548e+06 | metadata_value_owner_field_u | t | 2932852 | 0 | 58.69 | 14990
| 32160 | 2.55548e+06 | metadata_value_owner_id | f | 2932852 | 0 | 53.06 | 11817
| 32160 | 2.55548e+06 | metadata_value_pkey | t | 2932852 | 0 | 57.83 | 10842
| 32160 | 2.55548e+06 | metadata_value_timestamp | f | 2932852 | 0 | 45.96 | 13663

pg-12beta2
---------------------

| 41814 | 2.519268e+06 | metadata_value_boolean | f | 2519268 | 6766 | 63.17 | 10768
| 41814 | 2.519268e+06 | metadata_value_field_id | f | 2519268 | 6766 | 68.7 | 12031
| 41814 | 2.519268e+06 | metadata_value_floatnumber | f | 2519268 | 6766 | 61.48 | 11225
| 41814 | 2.519268e+06 | metadata_value_longnumber | f | 2519268 | 6766 | 58.34 | 12397
| 41814 | 2.519268e+06 | metadata_value_owner_field_u | t | 2519268 | 6766 | 61.69 | 14780
| 41814 | 2.519268e+06 | metadata_value_owner_id | f | 2519268 | 6766 | 48.86 | 12947
| 41814 | 2.519268e+06 | metadata_value_pkey | t | 2519268 | 6766 | 59.71 | 11076
| 41814 | 2.519268e+06 | metadata_value_timestamp | f | 2519268 | 6766 | 57.81 | 12295
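For reference, per-index figures such as avg_leaf_density and ixpages above can be gathered with the pgstattuple extension (I assume something along these lines was used here; the table name is from the output above):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Leaf density and size for every index on the table.
SELECT c.relname          AS ixname,
       i.indisunique      AS isuniq,
       s.avg_leaf_density,
       c.relpages         AS ixpages
FROM   pg_index i
JOIN   pg_class c ON c.oid = i.indexrelid
CROSS  JOIN LATERAL pgstatindex(i.indexrelid) s
WHERE  i.indrelid = 'metadata_value'::regclass
ORDER  BY c.relname;
```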

Overall, pg-12beta2 yielded a 6.7% reduction in the total size (pages) of the indexes, which was most noticeable with the 6 non-unique ones.
In fact the primary-key index was larger with pg-12. Would you have expected better than 6.7%? Although a welcome improvement, I don't think it is enough to justify abandoning an explicit lower FILLFACTOR, which brings me back to thinking there is a case for the subject of this thread: an automatic way to preserve density.
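In the absence of such a parameter, the manual equivalent is to read off the observed steady-state leaf density and set a matching fillfactor by hand before reindexing. A sketch, where the index name is taken from the tables above and 70 is just an illustrative value chosen near the observed densities:

```sql
-- Manual equivalent of the proposed behaviour: pick a fillfactor
-- close to the observed steady-state avg_leaf_density (70 is an example).
ALTER INDEX metadata_value_longnumber SET (fillfactor = 70);
REINDEX INDEX metadata_value_longnumber;
```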

Secondary points:

I did not expect the number of table pages to grow so much more for pg-12 than for pg-9.4; the counts were almost identical at the start of each run. However, table size was not the focus of the test.

Also, although not shown in those tables, pg-12 was around 4.5 times faster in completing those 768 iterations, an enormous improvement.

And one question:
I notice that pg-11 introduced a new config parameter,
vacuum_cleanup_index_scale_factor,
which specifies the fraction of the total number of heap tuples counted in the previous statistics collection that can be inserted without triggering an index scan at the VACUUM cleanup stage.

I have not researched this at all, nor did I set it for my pg-12beta2 run, but it sounds as though it could be relevant to this kind of workload. Is that so?
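If it does turn out to matter, it can be set either globally or per table as a storage parameter; a sketch (0.05 is an arbitrary illustrative value, and the table name is the one from this test):

```sql
-- Globally, in postgresql.conf or via:
ALTER SYSTEM SET vacuum_cleanup_index_scale_factor = 0.05;
SELECT pg_reload_conf();

-- Or per table, as a storage parameter:
ALTER TABLE metadata_value
    SET (vacuum_cleanup_index_scale_factor = 0.05);
```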

Cheers John Lumby
