From: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Subject: | Re: Index Skip Scan |
Date: | 2020-01-21 16:09:42 |
Message-ID: | 20200121160942.eu4bvbu46nt223ug@localhost |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Mon, Jan 20, 2020 at 01:19:30PM -0800, Peter Geoghegan wrote:
Thanks for the commentaries. I'm trying to clarify your conclusions for
myself, so couple of questions.
> > > - nbtsearch.c in general
> > > Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1] I've been trying to come up with cases in which this may break the patch, but I haven't able to produce such a scenario - so it may be fine.
>
> Try making _bt_findinsertloc() call _bt_vacuum_one_page() whenever the
> page is P_HAS_GARBAGE(), regardless of whether or not the page is
> about to split. That will still be correct, while having a much better
> chance of breaking the patch during stress-testing.
>
> Relying on a buffer pin to prevent the B-Tree structure itself from
> changing in any important way seems likely to be broken already. Even
> if it isn't, it sounds fragile.
Except for checking low/high key (which should be done with a lock), I
believe the current implementation follows the same pattern I see quite
often, namely
* get a lock on a page of interest and test it's values (if we can find
next distinct value right on the next one without goind down the tree).
* if not, unlock the current page, search within the tree with
_bt_search (which locks a resuling new page) and examine values on a
new page, when necessary do _bt_steppage
Is there an obvious problem with this approach, when it comes to the
page structure modification?
> A leaf page doesn't really have anything called a low key. It usually
> has a current first "data item"/non-pivot tuple, which is an
> inherently unstable thing.
Would this inherent instability be resolved for this particular case by
having a lock on a page while checking a first data item, or there is
something else I need to take into account?
> > There is a BT_READ lock in place when finding the correct leaf page, or
> > searching within the leaf page itself. _bt_vacuum_one_page deletes only
> > LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do
> > you have some feedback for this ?
>
> It sounds like the design of the patch relies on doing something other
> than stopping a scan "between" pages, in the sense that is outlined in
> the commit message of commit 09cb5c0e. If so, then that's a serious
> flaw in its design.
Could you please elaborate why does it sound like that? If I understand
correctly, to stop a scan only "between" pages one need to use only
_bt_readpage/_bt_steppage? Other than that there is no magic with scan
position in the patch, so I'm not sure if I'm missing something here.
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2020-01-21 16:20:17 | Re: We're getting close to the end of 2020-01 CF |
Previous Message | Alvaro Herrera | 2020-01-21 15:25:14 | Re: We're getting close to the end of 2020-01 CF |