| From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Safer hash table initialization macro |
| Date: | 2025-12-01 13:45:00 |
| Message-ID: | aS2b3LoUypW1/Gdz@ip-10-97-1-34.eu-west-3.compute.internal |
| Lists: | pgsql-hackers |
Hi hackers,
Currently to create a hash table we do things like:
A) create a struct, say:
typedef struct SeenRelsEntry
{
    Oid rel_id;
    int list_index;
} SeenRelsEntry;
where the first member is the hash key, and then later:
B)
ctl.keysize = sizeof(Oid);
ctl.entrysize = sizeof(SeenRelsEntry);
ctl.hcxt = CurrentMemoryContext;
seen_rels = hash_create("find_all_inheritors temporary table",
32, /* start small and extend */
&ctl,
I can see 2 possible issues:
1)
We manually specify the type for keysize, which could be incorrect from the
start or become incorrect if the key member's type changes.
2)
It may be possible to remove the key member without the compiler noticing it.
Take this example and remove the key member:
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 929bb53b620..eb11976afef 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -36,7 +36,6 @@
*/
typedef struct SeenRelsEntry
{
- Oid rel_id; /* relation oid */
int list_index; /* its position in output list(s) */
} SeenRelsEntry;
That would compile without any issues because this rel_id member is not
referenced in the code (for this particular example). That's rare but possible.
But then, on my machine, during make check:
TRAP: failed Assert("!found"), File: "nodeModifyTable.c", Line: 5157, PID: 140430
The reason is that the key member is accessed only through byte-level
operations (within the hash-related macros), never by name. So it's easy to
think that this member is unused (because it is not referenced in the code).
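To illustrate the failure mode, here is a minimal standalone sketch. It is a simplified model and not the dynahash code: the helpers demo_store_key() and demo_key_matches(), the GoodEntry/BrokenEntry structs and the Oid typedef are all hypothetical stand-ins. The assumption is only that the key is copied into, and compared against, the first keysize bytes of the entry.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Entry with the key as first member, as the hash code expects. */
typedef struct GoodEntry
{
	Oid			rel_id;
	int			list_index;
} GoodEntry;

/* Same entry after the "unused" key member was removed. */
typedef struct BrokenEntry
{
	int			list_index;
} BrokenEntry;

/* Hypothetical helper: the key is written as raw bytes at offset 0. */
void
demo_store_key(void *entry, const void *key, size_t keysize)
{
	memcpy(entry, key, keysize);
}

/* Hypothetical helper: the key is compared as raw bytes at offset 0. */
int
demo_key_matches(const void *entry, const void *key, size_t keysize)
{
	return memcmp(entry, key, keysize) == 0;
}

/* With the key member present, setting list_index leaves the key intact. */
int
demo_good_entry_keeps_key(void)
{
	Oid			key = 16384;
	GoodEntry	e;

	demo_store_key(&e, &key, sizeof(Oid));
	e.list_index = 7;
	return demo_key_matches(&e, &key, sizeof(Oid));
}

/* Without it, list_index sits at offset 0 and overwrites the key bytes. */
int
demo_broken_entry_keeps_key(void)
{
	Oid			key = 16384;
	BrokenEntry e;

	demo_store_key(&e, &key, sizeof(Oid));
	e.list_index = 7;
	return demo_key_matches(&e, &key, sizeof(Oid));
}
```

In this model, nothing ever names rel_id, so the compiler has no way to notice that it is gone, yet later lookups of the same key no longer match.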
I'm thinking about what kind of safety we could put in place to better deal with
1) and 2).
What about adding a macro that:
- requests the key member name
- ensures that it is at offset 0
- computes the key size based on the member
Something like:
"
#define HASH_ELEM_INIT(ctl, entrytype, keymember) \
    do { \
        StaticAssertStmt(offsetof(entrytype, keymember) == 0, \
                         #keymember " must be first member in " #entrytype); \
        (ctl).keysize = sizeof(((entrytype *)0)->keymember); \
        (ctl).entrysize = sizeof(entrytype); \
    } while (0)
"
That way:
- The key member is explicitly referenced in the code (preventing "unused"
false positives)
- The key size is automatically computed from the actual member type (preventing
type mismatches)
- We enforce that the key is at offset 0
An additional benefit: it avoids repeating the "keysize =" followed by "entrysize ="
in a lot of places in the code (currently about 100 times).
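For anyone who wants to try the idea outside the tree, here is a standalone sketch. It assumes C11 static_assert in place of StaticAssertStmt, and DemoHashCtl is a hypothetical stand-in that models only the two HASHCTL fields the macro sets. The Oid typedef and demo_init_seen_rels_ctl() are likewise only for illustration.

```c
#include <assert.h>				/* static_assert (C11) */
#include <stddef.h>				/* offsetof, size_t */

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Hypothetical stand-in for the two HASHCTL fields of interest. */
typedef struct DemoHashCtl
{
	size_t		keysize;
	size_t		entrysize;
} DemoHashCtl;

typedef struct SeenRelsEntry
{
	Oid			rel_id;			/* hash key, must stay first */
	int			list_index;
} SeenRelsEntry;

/*
 * Same shape as the proposed macro, with static_assert standing in for
 * StaticAssertStmt.  Compilation fails if keymember does not exist in
 * entrytype or is not at offset 0.
 */
#define HASH_ELEM_INIT(ctl, entrytype, keymember) \
	do { \
		static_assert(offsetof(entrytype, keymember) == 0, \
					  #keymember " must be first member in " #entrytype); \
		(ctl).keysize = sizeof(((entrytype *) 0)->keymember); \
		(ctl).entrysize = sizeof(entrytype); \
	} while (0)

/* Illustrative caller: fills both sizes from the struct definition. */
DemoHashCtl
demo_init_seen_rels_ctl(void)
{
	DemoHashCtl ctl = {0, 0};

	HASH_ELEM_INIT(ctl, SeenRelsEntry, rel_id);
	return ctl;
}
```

With this sketch, removing rel_id from SeenRelsEntry, or moving it after list_index, turns the earlier runtime failure into a compile-time error at the HASH_ELEM_INIT call site.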
If that sounds like a good idea, I could work on a patch doing so.
Thoughts?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com