Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-08-20 02:30:27
Message-ID: 50F6468F-5A5E-412A-86EC-DF24B549D733@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Aug 16, 2020, at 9:37 PM, Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> In addition to this, I found a few more things while reading v13 patch are as
> below:
>
> Patch v13-0001:
>
> -
> +#include "amcheck.h"
>
> Not in correct order.

Fixed.

> +typedef struct BtreeCheckContext
> +{
> + TupleDesc tupdesc;
> + Tuplestorestate *tupstore;
> + bool is_corrupt;
> + bool on_error_stop;
> +} BtreeCheckContext;
>
> Unnecessary spaces/tabs between } and BtreeCheckContext.

This refers to a change in verify_nbtree.c that has been removed. Per discussions with Peter and Robert, I have simply withdrawn that portion of the patch.

> static void bt_index_check_internal(Oid indrelid, bool parentcheck,
> - bool heapallindexed, bool rootdescend);
> + bool heapallindexed, bool rootdescend,
> + BtreeCheckContext * ctx);
>
> Unnecessary space between * and ctx. The same changes needed for other places as
> well.

Same as above. The changes to verify_nbtree.c have been withdrawn.

> ---
>
> Patch v13-0002:
>
> +-- partitioned tables (the parent ones) don't have visibility maps
> +create table test_partitioned (a int, b text default repeat('x', 5000))
> + partition by list (a);
> +-- these should all fail
> +select * from verify_heapam('test_partitioned',
> + on_error_stop := false,
> + skip := NULL,
> + startblock := NULL,
> + endblock := NULL);
> +ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
> +create table test_partition partition of test_partitioned for values in (1);
> +create index test_index on test_partition (a);
>
> Can't we make it work? If the input is partitioned, I think we could
> collect all its leaf partitions and process them one by one. Thoughts?

I was following the example from pg_visibility. I haven't thought about your proposal enough to have much opinion as yet, except that if we do this for pg_amcheck we should do likewise to pg_visibility, for consistency of the user interface.

> + ctx->chunkno++;
>
> Instead of incrementing in check_toast_tuple(), I think incrementing should
> happen at the caller -- just after check_toast_tuple() call.

I agree.

> ---
>
> Patch v13-0003:
>
> + resetPQExpBuffer(query);
> + destroyPQExpBuffer(query);
>
> resetPQExpBuffer() will be unnecessary if the next call is destroyPQExpBuffer().

Thanks. I removed it in cases where destroyPQExpBuffer is obviously the very next call.

> + appendPQExpBuffer(query,
> + "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
> + "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
> + "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
> + "skip := %s, startblock := %s, endblock := %s) v, "
> + "pg_class c"
> + "\nWHERE c.oid = %u",
> + tbloid, stop, skip, settings.startblock,
> + settings.endblock, tbloid
>
> pg_class should be schema-qualified like elsewhere.

Agreed, and changed.

> IIUC, pg_class is meant to
> get relname only, instead, we could use '%u'::pg_catalog.regclass in the target
> list for the relname. Thoughts?

get_table_check_list() creates the list of all tables to be checked, which check_tables() then iterates over, calling check_table() for each one. I think some verification that the table still exists is in order. Using '%u'::pg_catalog.regclass for a table that has since been dropped would pass in the old table Oid and draw an error of the 'ERROR: could not open relation with OID 36311' variety, whereas the current coding will just skip the dropped table.

> Also I think we should skip '\n' from the query string (see appendPQExpBuffer()
> in pg_dump.c)

I'm not sure I understand. pg_dump.c uses "\n" in query strings it passes to appendPQExpBuffer(), in a manner very similar to what this patch does.

> + appendPQExpBuffer(query,
> + "SELECT i.indexrelid"
> + "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
> + "\nWHERE i.indexrelid = c.oid"
> + "\n AND c.relam = %u"
> + "\n AND i.indrelid = %u",
> + BTREE_AM_OID, tbloid);
> +
> + ExecuteSqlStatement("RESET search_path");
> + res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
> + PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
>
> I don't think we need the search_path query. The main query doesn't have any
> dependencies on it. Same is in check_indexes(), check_index (),
> expand_table_name_patterns() & get_table_check_list().
> Correct me if I am missing something.

Right.

> + output = PageOutput(lines + 2, NULL);
> + for (lineno = 0; usage_text[lineno]; lineno++)
> + fprintf(output, "%s\n", usage_text[lineno]);
> + fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
> + fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
>
> I am not sure why we want PageOutput() if the second argument is always going to
> be NULL? Can't we directly use printf() instead of PageOutput() + fprintf() ?
> e.g. usage() function in pg_basebackup.c.

Done.

Please find attached the next version of the patch. In addition to your review comments (above), I have made changes in response to Peter and Robert's review comments upthread.

Attachment Content-Type Size
v14-0001-Adding-function-verify_heapam-to-amcheck-module.patch application/octet-stream 78.9 KB
v14-0002-Adding-contrib-module-pg_amcheck.patch application/octet-stream 82.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 胡常齐 2020-08-20 02:49:46 The number of logical replication slot or wal sender recommend
Previous Message Kyotaro Horiguchi 2020-08-20 02:29:46 Re: SyncRepLock acquired exclusively in default configuration