Re: WIP: Avoid creation of the free space map for small tables

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Avoid creation of the free space map for small tables
Date: 2019-01-28 11:10:13
Message-ID: CAA4eK1L=qWp_bJ5aTc9+fy4Ewx2LPaLWY-RbR4a60g_rupCKnQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 28, 2019 at 10:03 AM John Naylor
<john(dot)naylor(at)2ndquadrant(dot)com> wrote:
>
> On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > There are a few buildfarm failures due to this commit, see my email on
> > pgsql-committers. If you have time, you can also once look into
> > those.
>
> I didn't see anything in common with the configs of the failed
> members. None have a non-default BLCKSZ that I can see.
>

I have analyzed the different failures on the buildfarm.

1.
@@ -26,7 +26,7 @@
pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
heap_size | fsm_size
-----------+----------
- 24576 | 0
+ 32768 | 0
(1 row)

-- Extend table with enough blocks to exceed the FSM threshold
@@ -56,7 +56,7 @@
SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
fsm_size
----------
- 16384
+ 24576
(1 row)

As discussed on another thread, this seems to be because a parallel
auto-analyze doesn't allow vacuum to remove dead-row versions. To fix
this, I think we should avoid depending on vacuum to remove dead
rows.

2.
@@ -15,13 +15,9 @@
SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
ERROR: block number 100 is out of range for relation "test_rel_forks"
SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
- fsm_0
--------
- 8192
-(1 row)
-
+ERROR: could not open file "base/50769/50798_fsm": No such file or directory
SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
-ERROR: block number 10 is out of range for relation "test_rel_forks"
+ERROR: could not open file "base/50769/50798_fsm": No such file or directory

This indicates that even though VACUUM is executed, the FSM doesn't
get created. This could be due to a different BLCKSZ, but the failed
machines don't seem to have a non-default value for it. I am not sure
why this happens; maybe we need to check the size of the relation in
the failed regression database?

3. Failure on 'mantid'
2019-01-28 00:13:55.191 EST [123979] 001_pgbench_with_server.pl LOG:
statement: CREATE UNLOGGED TABLE insert_tbl (id
serial primary key);
2019-01-28 00:13:55.218 EST [123982] 001_pgbench_with_server.pl LOG:
execute P0_0: INSERT INTO insert_tbl SELECT
FROM generate_series(1,1000);
2019-01-28 00:13:55.219 EST [123983] 001_pgbench_with_server.pl LOG:
execute P0_0: INSERT INTO insert_tbl SELECT
FROM generate_series(1,1000);
2019-01-28 00:13:55.220 EST [123984] 001_pgbench_with_server.pl LOG:
execute P0_0: INSERT INTO insert_tbl SELECT
FROM generate_series(1,1000);
..
..
TRAP: FailedAssertion("!((rel->rd_rel->relkind == 'r' ||
rel->rd_rel->relkind == 't') && fsm_local_map.map[oldPage] == 0x01)",
File: "freespace.c", Line: 223)

I think this can happen if we forget to clear the local map after we
get a block with space in RelationGetBufferForTuple(). I see a race
condition in the code where that can happen. Say we tried all the
blocks in the local map and then tried to extend the relation, but we
didn't get ConditionalLockRelationForExtension; in the meantime,
another backend has extended the relation and updated the FSM (via
RelationAddExtraBlocks). Now the backend that didn't get the
extension lock will get the target block from the FSM, which will be
greater than HEAP_FSM_CREATION_THRESHOLD. Next, it will find that the
block can be used to insert a new row and will return the buffer, but
it won't clear the local map due to the below condition in the code:

@@ -377,20 +383,9 @@ RelationGetBufferForTuple(Relation relation, Size len,
+
+ /*
+ * In case we used an in-memory map of available blocks, reset it
+ * for next use.
+ */
+ if (targetBlock < HEAP_FSM_CREATION_THRESHOLD)
+ FSMClearLocalMap();
+

I think here you need to clear the map if it exists, or clear it
unconditionally; the former would be better. A rough sketch of what I
mean is below.
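
Something like this (just a sketch, not the exact code;
FSMLocalMapExists() is a hypothetical helper, the real way to test
whether a local map is in use may need to be exposed from freespace.c
differently):

    /*
     * In case we used an in-memory map of available blocks, reset it
     * for the next use, irrespective of how we arrived at the target
     * block.
     */
    if (FSMLocalMapExists())
        FSMClearLocalMap();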

This test gets executed concurrently by 5 clients, so it can hit the
above race condition.

4. Failure on jacana:
--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/box.out
2018-09-26
17:53:33 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/results/box.out
2019-01-27 23:14:35
-0500
@@ -252,332 +252,7 @@
('(0,100)(0,infinity)'),
('(-infinity,0)(0,infinity)'),
('(-infinity,-infinity)(infinity,infinity)');
-SET enable_seqscan = false;
-SELECT * FROM box_temp WHERE f1 << '(10,20),(30,40)';
..
..
TRAP: FailedAssertion("!(!(fsm_local_map.nblocks > 0))", File:
"c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/storage/freespace/freespace.c",
Line:
1118)
..
2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:4] LOG: server process
(PID 14388) exited with exit code 3
2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:5] DETAIL: Failed process
was running: INSERT INTO box_temp
VALUES (NULL),

I think the reason for this failure is the same as the previous one
(point 3), but it can happen in a different way. Say we have searched
the local map and then try to extend a relation 'X', and in the
meantime another backend has extended it such that the FSM gets
created. Now we will reuse that page and won't clear the local map.
Next, say we try to insert into relation 'Y', which doesn't have an
FSM. It will try to set up the local map and will find that one
already exists, so it will fail. Now, the question is how this can
happen in the box.sql test. I guess it is happening for some system
table which is being populated by the CREATE INDEX statement executed
just before the failing INSERT.
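
For reference, the assertion being hit here is effectively
Assert(fsm_local_map.nblocks == 0), reconstructed from the TRAP
output above; the surrounding function is from the patch, so the
sketch below only illustrates the failure mode, not the exact code:

    /* In the code path that builds the local map for a small relation: */
    Assert(fsm_local_map.nblocks == 0); /* no stale map from a previous relation */

    /*
     * If an earlier insert into relation 'X' skipped FSMClearLocalMap()
     * (the scenario above), this fires as soon as we try to build a map
     * for relation 'Y'.
     */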

I think both 3 and 4 are timing issues, which is why we didn't hit
them in our local regression runs.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
