From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Shawn Debnath <sdn(at)amazon(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Refactoring the checkpointer's fsync request queue |
Date: | 2019-02-16 02:45:02 |
Message-ID: | 20190216024502.iluiqfniwa7eolb4@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2019-02-13 18:40:05 +1300, Thomas Munro wrote:
> Thanks! And sorry for not replying sooner -- I got distracted by
> FOSDEM (and the associated 20 thousand miles of travel). On that trip
> I had a chance to discuss this patch with Andres Freund in person, and
> he opined that it might be better for the fsync request queue to work
> in terms of pathnames. Instead of the approach in this patch, where a
> backend sends an fsync request for { reflfilenode, segno } inside
> mdwrite(), and then the checkpointer processes the request by calling
> smgrdimmedsyncrel(), he speculated that it'd be better to have
> mdwrite() send an fsync request for a pathname, and then the
> checkpointer would just open that file by name and fsync() it. That
> is, the checkpointer wouldn't call back into smgr.
>
> One of the advantages of that approach is that there are probably
> other files that need to be fsync'd for each checkpoint that could
> benefit from being offloaded to the checkpointer. Another is that you
> break the strange cycle mentioned above.
The other issue is that I think your approach moves the segmentation
logic basically out of md into smgr. I think that's wrong. We shouldn't
presume that every type of storage is going to have segmentation that's
representable in a uniform way imo.
> Another consideration if we do that is that the existing scheme has a
> kind of hierarchy that allows fsync requests to be cancelled in bulk
> when you drop relations and databases. That is, the checkpointer
> knows about the internal hierarchy of tablespace, db, rel, seg. If we
> get rid of that and have just paths, it seems like a bad idea to teach
> the checkpointer about the internal structure of the paths (even
> though we know they contain the same elements encoded somehow). You'd
> have to send an explicit cancel for every key; that is, if you're
> dropping a relation, you need to generate a cancel message for every
> segment, and if you're dropping a database, you need to generate a
> cancel message for every segment of every relation.
I can't see that being a problem - compared to the overhead of dropping
a relation, that doesn't seem to be a meaningfully large cost?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2019-02-16 03:01:50 | Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk |
Previous Message | Justin Pryzby | 2019-02-16 02:38:54 | REL_11_STABLE: dsm.c - cannot unpin a segment that is not pinned |