From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>, Shubham Barai <shubhambaraiss(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
> On 9 Apr 2018, at 23:04, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 09/04/18 18:21, Andrey Borodin wrote:
>>> On 9 Apr 2018, at 19:50, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>>>> 3. Why do we *not* lock the entry leaf page, if there is no
>>>> match? We still need a lock to remember that we probed for that
>>>> value and there was no match, so that we conflict with a tuple
>>>> that might be inserted later. At least #3 is a bug. The attached
>>>> patch adds an isolation test that demonstrates it. #1 and #2 are
>>>> weird, and cause unnecessary locking, so I think we should fix
>>>> those too, even if they don't lead to incorrect results.
>>> I can't find a hole here. Agree.
>> Please correct me if I'm wrong. Let's say we have posting trees for
>> words A and B, and we are looking for a document that contains both.
>> We will read through the whole posting tree of A, but only through
>> some segments of B. If we do not find anything in B, we only have to
>> lock the segments we actually searched, not the whole posting tree
>> of B.
> True, that works. It was not clear from the code or comments that that was intended. I'm not sure if that's worthwhile, compared to locking just the posting tree root block.
From the text search POV that is rather coarse granularity: if you have frequent words like "the", "a", or "in", conflicts are inevitable.
I'm not sure we have the means to pick the optimal granularity: should it be ranges of postings, ranges of posting-tree pages, entries, pages of entries, or the whole index?
Technically, [time for locking] should be less than [time of transaction retry] * [probability of conflict]. Under that constraint, we should minimize [time for locking] + [time of transaction retry] * [probability of conflict].
I suspect that [time for locking] is some orders of magnitude less than the time of a transaction, so efforts should be skewed towards smaller granularity to reduce [probability of conflict].
But all this is not real math and has no scientific rigor.
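The trade-off above can be sketched as a toy model: for each candidate lock granularity, the expected per-transaction overhead is lock cost plus retry cost weighted by conflict probability. All numbers below are made-up illustrative values, not PostgreSQL measurements, and the granularity names are just labels for the options discussed above.

```python
# Toy model of the granularity trade-off: minimize
#   [time for locking] + [time of transaction retry] * [probability of conflict].
# All values are hypothetical, for illustration only.

def expected_cost(lock_time, retry_time, p_conflict):
    """Expected per-transaction overhead for one lock granularity."""
    return lock_time + retry_time * p_conflict

# Finer-grained locks cost more to take but conflict less often.
granularities = {
    "whole index":        (0.001, 0.50),   # (lock_time, p_conflict)
    "posting tree root":  (0.002, 0.10),
    "posting tree pages": (0.010, 0.01),
}

# A retried transaction is assumed orders of magnitude more expensive
# than taking a lock, as argued above.
retry_time = 100.0

for name, (lock_time, p) in granularities.items():
    cost = expected_cost(lock_time, retry_time, p)
    print(f"{name:20s} expected cost = {cost:.3f}")
```

With these (invented) numbers, the finest granularity wins, which matches the intuition that effort should be skewed towards reducing the probability of conflict.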
> I'll let Teodor decide..
+1. I believe this is very close to the optimal solution :)
Best regards, Andrey Borodin.