Quick Links

Re: index fragmentation on insert-only table with non-unique column

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject:	Re: index fragmentation on insert-only table with non-unique column
Date:	2016-08-13 18:54:48
Message-ID:	20160813185448.GP1179@telsasoft.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Regarding this earlier thread:
https://www.postgresql.org/message-id/flat/20160524173914(dot)GA11880%40telsasoft(dot)com#20160524173914(dot)GA11880(at)telsasoft(dot)com

On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> Summary: Non-unique btree indices are returning CTIDs for rows with same
> value of indexed column not in logical order, imposing a high performance
> penalty.

I have to point out that by "logical" I clearly meant "physical", hopefully
nobody was too misled..

On Sun, Jun 05, 2016 at 12:28:47PM -0700, Jeff Janes wrote:
> On Sun, Jun 5, 2016 at 9:03 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
> >> So correlated index scans look extra favourable vs bitmap index scans
> >> because bitmap heap scans consider random page costs sans correlation
> >> effects (even though correlation applies to bitmap heap scans as
> >> well).
> >
> > Really? How? The index ordering has nothing to do with the order in
> > which heap tuples will be visited.
>
> It is not the order itself, but the density.
>
> If the index is read in a range scan (as opposed to =ANY scan), and
> the index lead column is correlated with the table ordering, then the
> parts of the table that need to be visited will be much denser than if
> there were no correlation. But Claudio is saying that this is not
> being accounted for.

I didn't completely understand Claudio/Jeff here, and not sure if we're on the
same page. For queries on these tables, the index scan was very slow, due to
fragmented index on non-unique column, and seq scan would have been (was)
faster (even if it means reading 70GB and filtering out 6 of 7 days' data).
That was resolved by added a nightly reindex job (.. which sometimes competes
with other maintenance and has trouble running every table every night).

But I did find that someone else had previously reported this problem (in a
strikingly similar context and message, perhaps clearer than mine):
https://www.postgresql.org/message-id/flat/520D6610(dot)8040907%40emulex(dot)com#520D6610(dot)8040907(at)emulex(dot)com

I also found this older thread:
https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax(dot)com#n6cmpug13b9rk1srebjvhphg0lm8dou1kn(at)4ax(dot)com

There was mention of a TODO item:

* Compute index correlation on CREATE INDEX and ANALYZE, use it for index
* scan cost estimation

.. but perhaps I misunderstand and that's long since resolved ?

Justin

In response to

Re: index fragmentation on insert-only table with non-unique column at 2016-06-05 19:28:47 from Jeff Janes

Responses

Re: index fragmentation on insert-only table with non-unique column at 2016-08-15 17:42:03 from Claudio Freire
estimate correlation of index separately from table (Re: index fragmentation on insert-only table with non-unique column) at 2017-07-07 23:41:19 from Justin Pryzby

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Claudio Freire	2016-08-15 17:42:03	Re: index fragmentation on insert-only table with non-unique column
Previous Message	Vladimir Borodin	2016-08-11 13:55:49	Re: Planner do seq scan on empty master partitioned table