回复:Re: Re: Cache relation sizes?

From: 陈佳昕(步真) <buzhen(dot)cjx(at)alibaba-inc(dot)com>
To: "Thomas Munro" <thomas(dot)munro(at)gmail(dot)com>
Cc: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "Konstantin Knizhnik" <k(dot)knizhnik(at)postgrespro(dot)ru>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: 回复:Re: Re: Cache relation sizes?
Date: 2020-12-29 15:13:12
Message-ID: 3d5f151d-217d-46ba-9432-8c40a8dc1a96.buzhen.cjx@alibaba-inc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Thomas
I found some other problems which I want to share my change with you to make you confirm.
<1> I changed the code in smgr_alloc_sr to avoid dead lock.

LWLockAcquire(mapping_lock, LW_EXCLUSIVE);
flags = smgr_lock_sr(sr);
Assert(flags & SR_SYNCING(forknum));
+ flags &= ~SR_SYNCING(forknum);
if (flags & SR_JUST_DIRTIED(forknum))
{
/*
* Someone else dirtied it while we were syncing, so we can't mark
* it clean. Let's give up on this SR and go around again.
*/
smgr_unlock_sr(sr, flags);
LWLockRelease(mapping_lock);
goto retry;
}
/* This fork is clean! */
- flags &= ~SR_SYNCING(forknum);
flags &= ~SR_DIRTY(forknum);
}

In smgr_drop_sr, if the sr is SR_SYNCING, it will retry until the sr is not SR_SYNCING. But in smgr_alloc_sr, if the sr is SR_JUST_DIRTIED, it will retry to get another sr, although it has been synced by smgrimmedsync, the flag SR_SYNCING doesn't changed. This might cause dead lock. So I changed your patch as above.

<2> I changed the code in smgr_drop_sr to avoid some corner cases
/* Mark it invalid and drop the mapping. */
smgr_unlock_sr(sr, ~SR_VALID);
+ for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
+ sr->nblocks[forknum] = InvalidBlockNumber;
hash_search_with_hash_value(sr_mapping_table,
&reln->smgr_rnode,
hash,
HASH_REMOVE,
NULL);

smgr_drop_sr just removes the hash entry from sr_mapping_table, but doesn't remove the sr_pool. But in smgrnblocks_fast, it just get sr from sr_pool, so I add some codes as above to avoid some corner cases get an unexpected result from smgrnblocks_fast. Is it necessary, I also want some advice from you.

Thanks a lot.
Buzhen

------------------原始邮件 ------------------
发件人:Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
发送时间:Tue Dec 29 22:52:59 2020
收件人:陈佳昕(步真) <buzhen(dot)cjx(at)alibaba-inc(dot)com>
抄送:Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
主题:Re: Re: Cache relation sizes?
On Wed, Dec 23, 2020 at 1:31 AM 陈佳昕(步真) <buzhen(dot)cjx(at)alibaba-inc(dot)com> wrote:
> I studied your patch these days and found there might be a problem.
> When execute 'drop database', the smgr shared pool will not be removed because of no call 'smgr_drop_sr'. Function 'dropdb' in dbcommands.c remove the buffer from bufferpool and unlink the real files by 'rmtree', It doesn't call smgrdounlinkall, so the smgr shared cache will not be dropped although the table has been removed. This will cause some errors when smgr_alloc_str -> smgropen、smgrimmedsync. Table file has been removed, so smgropen and smgrimmedsync will get a unexpected result.

Hi Buzhen,

Thanks, you're right -- it needs to scan the pool of SRs and forget
everything from the database you're dropping.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-12-29 15:20:43 Re: Weird failure in explain.out with OpenBSD
Previous Message Tom Lane 2020-12-29 14:36:21 Re: doc review for v14