Re: Cache relation sizes?

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Cache relation sizes?
Date: 2020-12-28 04:24:35
Message-ID: CAKU4AWpRELqgWnZ1mEU9as9Pb8C9OGwHKek+OQt=1w+VNQsbpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 24, 2020 at 6:59 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> On Thu, Dec 17, 2020 at 10:22 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
> wrote:
> > Let me try to understand your point. Suppose process 1 extends a file to
> > 2 blocks from 1 block, and fsync is not called, then a). the lseek *may*
> still
> > return 1 based on the comments in the ReadBuffer_common ("because
> > of buggy Linux kernels that sometimes return an seek SEEK_END result
> > that doesn't account for a recent write"). b). When this patch is
> introduced,
> > we would get 2 blocks for sure if we can get the size from cache. c).
> After
> > user reads the 2 blocks from cache and then the cache is evicted, and
> user
> > reads the blocks again, it is possible to get 1 again.
> >
> > Do we need to consider case a) in this patch? and Is case c). the
> situation you
> > want to avoid and do we have to introduce fsync to archive this?
> Basically I
> > think case a) should not happen by practice so we don't need to handle
> case
> > c) and finally we don't need the fsync/SR_SYNCING/SR_JUST_DIRTIED flag.
> > I'm not opposed to adding them, I am just interested in the background
> in case
> > you are also willing to discuss it.
>
> Sorry for the lack of background -- there was some discussion on the
> thread "Optimize dropping of relation buffers using dlist", but it's
> long and mixed up with other topics. I'll try to summarise here.
>
> It's easy to reproduce files shrinking asynchronously on network
> filesystems that reserve space lazily[1],

Thanks for the explaintain. After I read that thread, I am more confused
now.
In [1], we have the below test result.

$ cat magic_shrinking_file.c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
int fd;
char buffer[8192] = {0};

fd = open("/mnt/test_loopback_remote/dir/file", O_RDWR | O_APPEND);
if (fd < 0) {
perror("open");
return EXIT_FAILURE;
}
printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));
printf("write(...) = %zd\n", write(fd, buffer, sizeof(buffer)));
printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));
printf("fsync(...) = %d\n", fsync(fd));
printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));

return EXIT_SUCCESS;
}
$ cc magic_shrinking_file.c
$ ./a.out
lseek(..., SEEK_END) = 9670656
write(...) = 8192
lseek(..., SEEK_END) = 9678848
fsync(...) = -1
lseek(..., SEEK_END) = 9670656

I got 2 information from above. a) before the fsync, the lseek(fd, 0,
SEEK_END)
returns a correct value, however after the fsync, it returns a wrong
value. b).
the fsync failed(return values is -1) in the above case. I'm more confused
because of point a). Since the fsync can't correct some wrong value, what
is the point to call fsync in this patch (I agree that it is good to fix
this
reliability problem within this patch, but I'm doubtful if fsync can help
in
this case). Am I missing something obviously?

I read your patch with details some days ago and feel it is in pretty good
shape.
and I think you are clear about the existing issues very well. like a).
smgr_alloc_sr use a
FIFO design. b). SR_PARTITION_LOCK doesn't use a partition lock really.
c). and
looks like you have some concern about the concurrency issue, which I
didn't find
anything wrong. d). how to handle the DIRTY sr during evict. I planned
to recheck
item c) before this reply, but looks like the time is not permitted. May I
know what
Is your plan about this patch?

[1]
https://www.postgresql.org/message-id/CA%2BhUKGLZTfKuXir9U4JQkY%3Dk3Tb6M_3toGrGOK_fa2d4MPQQ_w%40mail.gmail.com

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2020-12-28 04:31:48 Tying an object's ownership to datdba
Previous Message Bruce Momjian 2020-12-28 02:43:23 Re: pgsql: Add pg_alterckey utility to change the cluster key