Re: WIP: Avoid creation of the free space map for small tables

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Avoid creation of the free space map for small tables
Date: 2019-01-26 13:14:30
Message-ID: CAA4eK1K5J6yDTfwn6K=qa=Y_2NzAjZsCSFzsxAqdQ+_sAL2=zw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 26, 2019 at 5:05 AM John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> wrote:
>
> On Thu, Jan 24, 2019 at 5:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> Performance testing is probably a good idea anyway, but I went ahead
> and implemented your next idea:
>
> > The other alternative is we can fetch pg_class.relpages and rely on
> > that to take this decision, but again if that is not updated, we might
> > take the wrong decision.
>
> We can think of it this way: Which is worse,
> 1. Transferring a FSM we don't need, or
> 2. Skipping a FSM we need
>
> I'd say #2 is worse.
>

Agreed.

> So, in v19 we check pg_class.relpages and if it's
> a heap and less than or equal the threshold we call stat on the 0th
> segment to verify.
>

Okay, but the way logic is implemented appears clumsy to me.

@@ -234,16 +243,40 @@ transfer_relfile(FileNameMap *map, const char
*type_suffix, bool vm_must_add_fro
{
/* File does not exist? That's OK, just return */
if (errno == ENOENT)
- return;
+ return first_seg_size;
else
- pg_fatal("error while checking for file existence \"%s.%s\" (\"%s\"
to \"%s\"): %s\n",
- map->nspname, map->relname, old_file, new_file,
- strerror(errno));
+ goto fatal;
}

/* If file is empty, just return */
if (statbuf.st_size == 0)
- return;
+ return first_seg_size;
+ }
+
+ /* Save size of the first segment of the main fork. */
+
+ else if (map->relpages <= HEAP_FSM_CREATION_THRESHOLD &&
+ (map->relkind == RELKIND_RELATION ||
+ map->relkind == RELKIND_TOASTVALUE))
+ {
+ /*
+ * In this case, if pg_class.relpages is wrong, it's possible
+ * that a FSM will be skipped when we actually need it. To guard
+ * against this, we verify the size of the first segment.
+ */
+ if (stat(old_file, &statbuf) != 0)
+ goto fatal;
+ else
+ first_seg_size = statbuf.st_size;
+ }
+ else
+ {
+ /*
+ * For indexes etc., we don't care if pg_class.relpages is wrong,
+ * since we always transfer their FSMs. For heaps, we might
+ * transfer a FSM when we don't need to, but this is harmless.
+ */
+ first_seg_size = Min(map->relpages, RELSEG_SIZE) * BLCKSZ;
}

The function transfer_relfile has no clue about skipping of FSM stuff,
but it contains comments about it. The check "if (map->relpages <=
HEAP_FSM_CREATION_THRESHOLD ..." will needlessly be executed for each
segment. I think there is some value in using the information from
this function to skip fsm files, but the code doesn't appear to fit
well, how about moving this check to new function
new_cluster_needs_fsm()?

> In the common case, the cost of the stat call is
> offset by not linking the FSM.
>

Agreed.

> Despite needing another pg_class field,
> I think this code is actually easier to read than my earlier versions.
>

Yeah, the code appears cleaner from the last version, but I think we
can do more in that regards.

One more minor comment:
snprintf(query + strlen(query), sizeof(query) - strlen(query),
"SELECT all_rels.*, n.nspname, c.relname, "
- " c.relfilenode, c.reltablespace, %s "
+ " c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s "
"FROM (SELECT * FROM regular_heap "
" UNION ALL "
" SELECT * FROM toast_heap "
@@ -525,6 +530,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
i_relname = PQfnumber(res, "relname");
i_relfilenode = PQfnumber(res, "relfilenode");
i_reltablespace = PQfnumber(res, "reltablespace");
+ i_relpages = PQfnumber(res, "relpages");
+ i_relkind = PQfnumber(res, "relkind");
i_spclocation = PQfnumber(res, "spclocation");

The order in which relkind and relpages is used in the above code is
different from the order in which it is mentioned in the query, it
won't matter, but keeping in order will make look code consistent. I
have made this and some more minor code adjustments in the attached
patch. If you like those, you can include them in the next version of
your patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v20-0003-During-pg_upgrade-conditionally-skip-transfer-of-FSM.patch application/octet-stream 8.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-26 14:57:10 Re: Thread-unsafe coding in ecpg
Previous Message Magnus Hagander 2019-01-26 12:45:46 Re: pg_basebackup, walreceiver and wal_sender_timeout