Re: Cache relation sizes?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Cache relation sizes?
Date: 2020-12-30 04:39:52
Message-ID: CA+hUKGKGEQLqfKitL=PT5Ch_5md499jHJv+O8OP2kq5_PEcnqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 28, 2020 at 5:24 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> lseek(..., SEEK_END) = 9670656
> write(...) = 8192
> lseek(..., SEEK_END) = 9678848
> fsync(...) = -1
> lseek(..., SEEK_END) = 9670656
>
> I got 2 information from above. a) before the fsync, the lseek(fd, 0, SEEK_END)
> returns a correct value, however after the fsync, it returns a wrong value. b).
> the fsync failed(return values is -1) in the above case. I'm more confused
> because of point a). Since the fsync can't correct some wrong value, what
> is the point to call fsync in this patch (I agree that it is good to fix this
> reliability problem within this patch, but I'm doubtful if fsync can help in
> this case). Am I missing something obviously?

The point of the fsync() call isn't to correct the value, it's to find
out if it is safe to drop the SR object from shared memory because the
operating system has promised that it won't forget about the new size.
If that fails, we'll report an ERROR or PANIC (depending on
data_sync_retry), but not drop the SR.

By the way, you don't need fsync(fd) for the size to change, as shown
by the other experiment in that other thread that just did sleep(60).
It might also happen asynchronously. fsync(fd) forces it, but also
tells you about errors.

> I read your patch with details some days ago and feel it is in pretty good shape.
> and I think you are clear about the existing issues very well. like a). smgr_alloc_sr use a
> FIFO design. b). SR_PARTITION_LOCK doesn't use a partition lock really. c). and

Yeah, it probably should have its own bank of locks (I wish they were
easier to add, via lwlocknames.txt or similar).

> looks like you have some concern about the concurrency issue, which I didn't find
> anything wrong. d). how to handle the DIRTY sr during evict. I planned to recheck

So yeah, mostly this discussion has been about not trusting lseek()
and using our own tracking of relation size instead. But there is a
separate question about how "fresh" the value needs to be. The
question is: under what circumstances could the unlocked read of
nblocks from shared memory give you a value that isn't fresh enough
for your snapshot/scan? This involves thinking about implicit memory
barriers.

The idea of RECENTLY_DIRTIED is similar to what we do for buffers:
while you're trying to "clean" it (in this case by calling fsync())
someone else can extend the relation again, and in that case we don't
know whether the new extension was included in the fsync() or not, so
we can't clear the DIRTY flag when it completes.

> item c) before this reply, but looks like the time is not permitted. May I know what
> Is your plan about this patch?

Aside from details, bugs etc discussed in this thread, one major piece
remains incomplete: something in the background to "clean" SRs so that
foreground processes rarely have to block.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-12-30 04:52:20 Re: Re: Re: Cache relation sizes?
Previous Message Justin Pryzby 2020-12-30 04:39:30 Re: pglz compression performance, take two