Re: Vacuuming big btree indexes without pages with deleted items

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Vladimir Borodin <root(at)simply(dot)name>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuuming big btree indexes without pages with deleted items
Date: 2015-03-30 22:23:46
Message-ID: 5519CCF2.9080006@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/27/15 5:15 AM, Vladimir Borodin wrote:
> Hi all.
>
> I have described [0] a problem with delaying replicas after vacuuming a
> relation with big btree index. It stucks in replaying WAL record of
> type XLOG_BTREE_VACUUM like that (with lastBlockVacuumed 0):
>
> rmgr: Btree len (rec/tot): 20/ 52, tx: 0, lsn:
> 4115/56126DC0, prev 4115/56126D90, bkp: 0000, desc: vacuum: rel
> 1663/16420/16796; blk 31222118, lastBlockVacuumed 0
>
> Master writes this record to xlog in btvacuumscan [1] function after
> vacuuming of all index pages. And in case of no pages with deleted items
> xlog record would contain lastBlockVacuumed 0.
>
> In btree_xlog_vacuum [2] replica reads all blocks from lastBlockVacuumed
> to last block of the index while applying this record because there is
> no api in the buffer manager to understand if the page is unpinned.
>
> So if the index is quite big (200+ GB in described case) it takes much
> time to do it. So the questions are:
>
> 1. Aren’t there still any api in buffer manager to understand that the
> page is not in shared_buffers without reading it?
I don't know offhand, but since XLogReadBufferExtended already has a
mode argument it wouldn't be too hard to add it there.

> 2. Is it possible not to write to xlog record with lastBlockVacuumed 0
> in some cases? For example, in case of not deleting any pages.

Possibly, but that's much higher risk. Without studying it, if we wanted
to mess around with that it might actually make more sense to XLOG a set
of blkno's that got vacuumed, but I suspect that wouldn't be a win.

> Or maybe there are some better ways of improving this situation?
>
> [0]
> http://www.postgresql.org/message-id/FE82A9A7-0D52-41B5-A9ED-967F6927CB8A@simply.name
> [1]
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtree.c;hb=refs/heads/REL9_4_STABLE#l813
> [2]
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtxlog.c;hb=refs/heads/REL9_4_STABLE#l482
>
> --
> May the force be with you…
> https://simply.name
>

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-03-30 22:29:07 Re: Exposing PG_VERSION_NUM in pg_config
Previous Message Tom Lane 2015-03-30 20:50:14 Re: Ignoring entries generated by autoconf in code tree