Re: pgsql: Avoid creation of the free space map for small heap relations, t

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc: Amit Kapila <akapila(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Avoid creation of the free space map for small heap relations, t
Date: 2019-02-26 09:28:22
Message-ID: CAA4eK1JtJp-DE6QWMHDdZoU5OJEJssheJ0ZDZSim0MzQv6tYpg@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Mon, Feb 25, 2019 at 10:32 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Amit Kapila <akapila(at)postgresql(dot)org> writes:
> > Avoid creation of the free space map for small heap relations, take 2.
>
> I think this patch still has some issues. Note the following two
> recent buildfarm failures:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2019-02-20%2004%3A20%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2019-02-24%2022%3A48%3A02
>
> Both of them failed in an assertion added by this patch:
>
> TRAP: FailedAssertion("!((rel->rd_rel->relkind == 'r' || rel->rd_rel->relkind == 't') && fsm_local_map.map[oldPage] == 0x01)", File: "/home/bf/build/buildfarm-petalura/HEAD/pgsql.build/../pgsql/src/backend/storage/freespace/freespace.c", Line: 229)
> ...
> 2019-02-20 06:23:26.660 CET [5c6ce440.3bd2:4] LOG: server process (PID 17981) was terminated by signal 6: Aborted
> 2019-02-20 06:23:26.660 CET [5c6ce440.3bd2:5] DETAIL: Failed process was running: INSERT INTO T VALUES ( $1 )
>
> I can even offer you a theory as to why we're managing to get through all
> the core regression tests and then failing in ecpg: the failing test is
> ecpg/test/thread/prep.pgc, and I think that may be the only place in our
> tests where we stress concurrent insertions into a single small table.
> So I think you've got race-condition issues here.
>

Right, I think I have a theory about what is going on here. There seems
to be a narrow window during which this problem can occur (with
concurrent insertions): we try to use the local map for a block which
doesn't exist in that map.

During insertion (say by session-1), once we have exhausted all the
pages in the local map without finding available space, we try to
extend the relation. Note that by this time we haven't cleared the
local map. Now, say we don't get the conditional relation extension
lock (in function RelationGetBufferForTuple, line 561), and before we
acquire the relation extension lock, another backend (say session-2)
acquires it, extends the relation in bulk (via RelationAddExtraBlocks),
and updates the FSM (as the total number of blocks now exceeds
HEAP_FSM_CREATION_THRESHOLD). After that, session-1 gets the lock and
finds a page in the FSM which has some free space. Then, before it
validates whether that page has enough space to insert the new tuple,
session-2 inserts a few more rows such that the page session-1 got
from the FSM no longer has free space. After that, session-1 tries to
record the free space for this page and get a new page from the FSM;
however, as its local map is still not cleared, it hits the assertion
shown above.
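
To make the ordering easier to see, here is a toy, single-process model
of the failure. This is not PostgreSQL source: the map size, the
FSM_LOCAL_AVAIL value and the helper names are only stand-ins for the
real fsm_local_map machinery in freespace.c, and the two sessions are
simulated by plain sequential calls.

#include <assert.h>
#include <stdio.h>

#define LOCAL_MAP_SIZE  4       /* stand-in for HEAP_FSM_CREATION_THRESHOLD */
#define FSM_LOCAL_AVAIL 0x01

static unsigned char local_map[LOCAL_MAP_SIZE];
static int           local_map_nblocks;

/* Session-1 builds its local map for a small relation. */
static void build_local_map(void)
{
    for (int i = 0; i < LOCAL_MAP_SIZE; i++)
        local_map[i] = FSM_LOCAL_AVAIL;
    local_map_nblocks = LOCAL_MAP_SIZE;
}

/* Rough analogue of RecordAndGetPageWithFreeSpace(): while the local map
 * is populated, the "old" page is expected to be one of the mapped blocks. */
static void record_old_page(int old_page)
{
    if (local_map_nblocks > 0)
        assert(old_page < LOCAL_MAP_SIZE &&
               local_map[old_page] == FSM_LOCAL_AVAIL);  /* the failing check */
}

int main(void)
{
    build_local_map();          /* session-1: small relation, map built      */
    /* session-1 exhausts the map and waits for the extension lock ...       */
    /* meanwhile session-2 extends in bulk, so a real FSM now exists         */
    int fsm_block = 7;          /* block session-1 then gets from that FSM   */
    /* session-2 fills that block too, so session-1 must record it:          */
    record_old_page(fsm_block); /* fires: block was never in the local map   */
    printf("not reached\n");
    return 0;
}

The point is only the ordering: the local map is still populated when a
block it has never covered comes back from the relation's real FSM.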

Based on the above theory, I have tried to reproduce this problem via
the debugger; here are the steps I followed.

Session-1
-----------------
1. Create table fsm_1(c1 int, c2 char(1500));
2. Attach the debugger and set a breakpoint at line 535 in hio.c (aka
targetBlock = RecordAndGetPageWithFreeSpace(relation, ...)).
3. insert into fsm_1 values(generate_series(1,10),'aaaaa');
4. Now, when you debug the code at line 535, you should get
InvalidBlockNumber, and you can check that the local map still exists
(the value of fsm_local_map.nblocks must be greater than 0).
5. Step through until line 561 (else if
(!ConditionalLockRelationForExtension(relation, ExclusiveLock))).

Session-2
------------------
1. Attach the debugger and set a breakpoint at line 599 in hio.c (aka
buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);).
2. insert into fsm_1 values(generate_series(11,37),'aaaaa');
3. Now leave the debugger of this session at this stage; this ensures
that Session-2 holds the relation extension lock.

Session-1
----------------
6. Step over line 561.
7. Before acquiring the relation extension lock at line 564
(LockRelationForExtension(relation, ExclusiveLock);), allow Session-2
to complete.

Session-2
---------------
4. continue
5. vacuum fsm_1; -- this will create the FSM for this table.

Session-1
----------------
8. Step through the debugger; you will get a block with free space
from the FSM via GetPageWithFreeSpace.
9. Now, it will retry that block (via the goto loop).
10. Leave the debugger in this state and allow Session-2 to execute
some statements so that the space in this block is used up.

Session-2
----------------
6. insert into fsm_1 values(generate_series(41,50),'aaaaa');

Session-1
----------------
11. Step through the debugger and it should hit the assertion in
RecordAndGetPageWithFreeSpace (line 227, freespace.c).

I am looking at the code as of commit
2ab23445bc6af517bddb40d8429146d8ff8d7ff4. I have also tried to
reproduce this via the ecpg test; my colleague Kuntal Ghosh has helped
me run that test by varying THREADS and REPEATS in
ecpg/test/thread/prep.pgc on different machines, but no luck so far.

To fix this symptom, we can ensure that once we fail to get any block
from the local map, we clear it. See the attached patch (and the
sketch after this paragraph). I will evaluate this code path further
to see if there is any similar race condition and will also try to
come up with a reproducer. I don't have any great idea for writing a
reproducer for this issue apart from trying something similar to
ecpg/test/thread/prep.pgc; if you can think of one, I am all ears.
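
In terms of the toy model above, the shape of the fix is simply to wipe
the local map as soon as it fails to hand out a block, before we fall
back to relation extension or the regular FSM. The snippet below is
illustrative only (a hypothetical helper for the toy model, not the
patched PostgreSQL function); the attached fix_loc_map_clear_1.patch is
the actual change.

/* Forget the local map once it has been exhausted, so that later calls
 * take the regular FSM path instead of tripping the local-map assertion. */
static void clear_local_map(void)
{
    for (int i = 0; i < LOCAL_MAP_SIZE; i++)
        local_map[i] = 0;
    local_map_nblocks = 0;
}

With local_map_nblocks back at zero, record_old_page() in the earlier
sketch no longer insists that the block belong to the map, which is the
behaviour we want once the relation has outgrown the local map.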

John, others, can you review my findings and patch?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
fix_loc_map_clear_1.patch application/octet-stream 537 bytes
