Skip all-visible pages during second HeapScan of CIC

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Skip all-visible pages during second HeapScan of CIC
Date: 2017-02-28 13:42:03
Message-ID: CABOikdO+=3=rK_Y=8o-xd5oPiNSPsoORYThJUCNE8kWm1pWOow@mail.gmail.com
Lists: pgsql-hackers

Hello All,

During the second heap scan of CREATE INDEX CONCURRENTLY, we're only
interested in the tuples which were inserted after the first scan was
started. All such tuples can only exist in pages which have their VM bit
unset. So I propose the attached patch, which consults the VM during the
second scan and skips all-visible pages. We use the same trick of skipping
pages only if a certain threshold of pages can be skipped, to ensure the
OS's read-ahead is not disturbed.
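
To make the skipping rule concrete, here is a minimal standalone sketch of
the idea; it is not code from the attached patch. The function name, the
bool array standing in for the visibility map, and the threshold value are
all assumptions made for illustration (the value 32 mirrors lazy VACUUM's
SKIP_PAGES_THRESHOLD; the message does not say what the patch uses).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value: minimum run of all-visible pages worth skipping. */
#define SKIP_PAGES_THRESHOLD 32

/*
 * Hypothetical stand-in for the second heap scan.  all_visible[i] is true
 * when page i has its VM bit set.  Returns the number of pages the scan
 * would actually read.
 */
static size_t
second_scan_pages_read(const bool *all_visible, size_t nblocks)
{
    size_t      blkno = 0;
    size_t      pages_read = 0;

    assert(all_visible != NULL || nblocks == 0);

    while (blkno < nblocks)
    {
        /* Measure the run of consecutive all-visible pages starting here. */
        size_t      run_end = blkno;

        while (run_end < nblocks && all_visible[run_end])
            run_end++;

        if (run_end - blkno >= SKIP_PAGES_THRESHOLD)
        {
            /* Long run: skip it entirely, nothing is read. */
            blkno = run_end;
            continue;
        }

        /* Short run: read it anyway so the I/O pattern stays sequential. */
        pages_read += run_end - blkno;
        blkno = run_end;

        /* Read the page that ended the run; its VM bit is unset. */
        if (blkno < nblocks)
        {
            pages_read++;
            blkno++;
        }
    }

    return pages_read;
}
```

In this sketch, a 100-page table where only page 40 has its VM bit unset
costs a single page read, while a table where only the first 10 pages are
all-visible is read in full, because a run of 10 is below the threshold.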

The patch shows a significant reduction in the time taken to build an index
concurrently on very large tables which are not being updated frequently
and which were vacuumed recently (so that the VM bits are set). I can post
performance numbers if there is interest. For tables that are being updated
heavily, the threshold-based skipping was indeed useful; without it we saw a
slight regression.

Since VM bits are only set during VACUUM, which conflicts with CIC on the
relation lock (both take SHARE UPDATE EXCLUSIVE, which conflicts with
itself), I don't see any risk of incorrectly skipping pages that the second
scan should have scanned.

Comments?

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment                       Content-Type               Size
cic_skip_all_visible_v3.patch    application/octet-stream   10.0 KB
