Re: The plan for FDW-based sharding

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The plan for FDW-based sharding
Date: 2016-03-01 17:07:24
Message-ID: 56D5CC4C.4080508@postgrespro.ru
Lists: pgsql-hackers

Thank you very much for your comments.

On 01.03.2016 18:19, Robert Haas wrote:
> On Sat, Feb 27, 2016 at 2:29 AM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>> How do you prevent clock skew from causing serialization anomalies?
>> If a node receives a message from the "future", it just needs to wait
>> until this future arrives.
>> In practice we just "adjust" the system time in this case, moving it
>> forward (the system time is not actually changed; we just set a
>> correction value which needs to be added to the system time).
>> This approach was discussed in the article:
>> http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
>> I hope the algorithm is explained much better in this article than I
>> can do here.
> Hmm, the approach in that article is very interesting, but it sounds
> different than what you are describing - they do not, AFAICT, have
> anything like a "correction value"

In the article they use the notion of "wait":

    if T.SnapshotTime > GetClockTime()
    then wait until T.SnapshotTime < GetClockTime()

Originally we really did sleep here, but then we realized that instead
of sleeping we can just adjust the local time.
Sorry, I do not have a formal proof that it is equivalent, but at least
we have not encountered any inconsistencies after this change, and
performance is improved.
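To make the idea concrete, here is a minimal sketch of the "correction
value" approach described above. All names (GetAdjustedClockTime,
ObserveRemoteTimestamp, the microsecond units) are invented for the
sketch, not taken from pg_tsdtm:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: instead of sleeping until a timestamp from the "future"
 * arrives, remember the largest skew seen so far and add it to every
 * local clock reading. */

static int64_t clock_correction = 0;    /* delta added to local time */
static int64_t raw_system_time = 1000;  /* stand-in for the real clock */

static int64_t
GetAdjustedClockTime(void)
{
    return raw_system_time + clock_correction;
}

/* Called when a message carries a timestamp ahead of our local clock:
 * bump the correction so local time catches up immediately, with no
 * sleep and without touching the actual system clock. */
static void
ObserveRemoteTimestamp(int64_t remote_ts)
{
    int64_t now = GetAdjustedClockTime();

    if (remote_ts > now)
        clock_correction += remote_ts - now;
}
```

The wait from the paper and this adjustment both ensure the local
clock reading is never behind a timestamp the node has already seen.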
>
>> There are well known limitations of pg_tsdtm which we will try to
>> address in the future.
> How well known are those limitations? Are they documented somewhere?
> Or are they only well-known to you?
Sorry, well known to us.
But they are described at the DTM wiki page.
Right now pg_tsdtm does not support correct distributed deadlock
detection (it does not build a global lock graph) and detects
distributed deadlocks based only on timeouts.
It does not support explicit locks, but "select for update" will work
correctly.

>> What we want is to include the XTM API in PostgreSQL so that we can
>> continue our experiments with different transaction managers and
>> implement multimaster on top of it (our first practical goal) without
>> affecting the PostgreSQL core.
>>
>> If the XTM patch is included in 9.6, then we can offer our multimaster
>> as a PostgreSQL extension and everybody can use it.
>> Otherwise we have to maintain our own fork of Postgres, which
>> significantly complicates using and maintaining it.
> Well I still think what I said before is valid. If the code is good,
> let it be a core submission. If it's not ready yet, submit it to core
> when it is. If it can't be made good, forget it.

I have nothing against committing the DTM code to core. But the best
way of integrating it is still an OO-like approach.
So we still need an API. Inserting if-s or switches into existing code
is IMHO an ugly idea.
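By "OO-like approach" I mean roughly the following: core code calls
through a table of function pointers, and an extension swaps the whole
table without any if-s in the callers. This is only an illustration of
the shape of the idea; the struct and function names are invented here
and are not the actual XTM API:

```c
#include <assert.h>
#include <stdint.h>

/* A table of transaction-manager operations. */
typedef struct TransactionManagerAPI
{
    int64_t (*GetTimestamp)(void);
    int     (*IsVisible)(int64_t xid, int64_t snapshot_ts);
} TransactionManagerAPI;

/* Default (local) implementation... */
static int64_t LocalGetTimestamp(void) { return 42; }
static int LocalIsVisible(int64_t xid, int64_t snap) { return xid <= snap; }

static const TransactionManagerAPI LocalTM =
{
    LocalGetTimestamp,
    LocalIsVisible
};

/* ...which a DTM extension could replace at load time. */
static const TransactionManagerAPI *CurrentTM = &LocalTM;

static int64_t
GetCurrentTimestampViaTM(void)
{
    /* Core code stays identical no matter which TM is installed. */
    return CurrentTM->GetTimestamp();
}
```

The callers never branch on which transaction manager is active; only
the pointer assignment changes when an extension installs its own table.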

Also it is not enough for the DTM code to be just "good". It should
provide the expected functionality.
But which functionality is expected? From my experience of developing
different cluster solutions I can say that
different customers have very different requirements. It is very hard,
if at all possible, to satisfy them all.

Right now I do not feel that I can predict all possible requirements
for a DTM.
This is why we want to provide an API, propose some implementations of
this API, receive feedback, and get a better understanding of which
functionality is actually needed by customers.

>
>>> This seems rather defeatist. If the code is good and reliable, why
>>> should it not be committed to core?
>> Two reasons:
>> 1. There is no ideal implementation of DTM which will fit all possible needs
>> and be efficient for all clusters.
> Hmm, what is the reasoning behind that statement? I mean, it is
> certainly true that there are some places where we have decided that
> one-size-fits-all is not the right approach. Indexing, for example.
> But there are many other places where we have not chosen to make
> things pluggable, and that I don't think it should be taken for
> granted that plugability is always an advantage.
>
> I fear that building a DTM that is fully reliable and also
> well-performing is going to be really hard, and I think it would be
> far better to have one such DTM that is 100% reliable than two or more
> implementations each of which are 99% reliable.

The question is not about its reliability, but mostly about its
functionality and flexibility.

>> 2. Even if such an implementation exists, the right way to integrate
>> it is still for Postgres to use some kind of TM API.
> Sure, APIs are generally good, but that doesn't mean *this* API is good.

Well, I do not want to say "better than nothing", but I find this API
to be a reasonable compromise between flexibility and minimizing
changes to the PostgreSQL core. If you have suggestions on how to
improve it, I will be glad to receive them.

>
>> I hope that everybody will agree that doing it in this way:
>>
>> #ifdef PGXC
>> /* In Postgres-XC, stop timestamp has to follow the timeline of GTM
>> */
>> xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
>> #else
>> xlrec.xact_time = xactStopTimestamp;
>> #endif
> PGXC chose that style in order to simplify merging. I wouldn't have
> picked the same thing, but I don't know why it deserves scorn.
>
>> or in this way:
>>
>> xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp
>> : xactStopTimestamp;
>>
>> is very very bad idea.
> I don't know why that is such a bad idea. It's a heck of a lot faster
> than insisting on calling some out-of-line function. It might be a
> bad idea, but I think we need to decide that, not assume it.
>
It violates modularity, complicates the code, and makes it more error
prone.
I still prefer to extract all DTM code into a separate module.
It does not necessarily have to be an extension.
But on the other hand, it is not required to put it in core, at least
at this stage. As I already wrote, not just because the code is not
good enough or not reliable enough,
but because I am not sure that it fits all (or just most) use cases.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
