| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | David Rowley <dgrowleyml(at)gmail(dot)com> | 
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Reducing the chunk header sizes on all memory context types | 
| Date: | 2022-10-08 15:52:25 | 
| Message-ID: | 3578387.1665244345@sss.pgh.pa.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
So I pushed that, but I don't feel that we're out of the woods yet.
As I mentioned at [1], while testing this stuff I hit a case where
aset.c will try to wipe_mem practically the entire address space after
being asked to pfree() an invalid pointer.  The specific reproducer
isn't too interesting because it depends on the pre-80ef92675 state of
the code, but what I take away from this is that aset.c is still far
too fragile as it stands.
One problem is that aset.c generally isn't too careful about whether
a putative chunk is actually one of its chunks.  That was okay before
c6e0fe1f2 because we would never get to AllocSetFree etc unless the
word before the chunk data pointed at a moderately-sane AllocSet.
Now, we can arrive at that code on the strength of three random bits,
so it's got to be more careful.  In the attached patch, I make
AllocSetIsValid do an IsA() test, and make sure to apply it to the
aset we think we have found from the chunk header.  This is not in
any way a new check: what it is doing is replacing the IsA check done
by the "AssertArg(MemoryContextIsValid(context))" that was performed
by GetMemoryChunkContext in the old code, and that we've completely
lost in the code as it stands.
The other problem, which is what is leading to wipe_mem-the-universe,
is that aset.c figures the size of a non-external chunk essentially
as "1 << MemoryChunkGetValue(chunk)", where the "value" field is 30
bits wide and has undergone exactly zero validation before
AllocSetFree uses the size in memset.  That's far, far too trusting.
In the attached I put in some asserts to verify that the value field
is in the valid range for a freelist index, which should usually
trigger if we have a garbage value, or at the very least constrain
the damage.
What I am mainly wondering about at this point is whether Asserts
are good enough or we should use actual test-and-elog checks for
these things.  The precedent of the old GetMemoryChunkContext
implementation suggests that assertions are good enough for the
AllocSetIsValid tests.  On the other hand, there are existing
cross-checks like
        if (block->freeptr != block->endptr)
            elog(ERROR, "could not find block containing chunk %p", chunk);
so at least in some code paths we've thought it is worth expending
such tests in production builds.  Any opinions?
I imagine generation.c and slab.c need similar bulletproofing
measures, but I didn't look at them yet.
regards, tom lane
[1] https://www.postgresql.org/message-id/3436789.1665187055%40sss.pgh.pa.us
| Attachment | Content-Type | Size | 
|---|---|---|
| harden-aset-some-more-1.patch | text/x-diff | 6.8 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2022-10-08 16:53:50 | Re: START_REPLICATION SLOT causing a crash in an assert build | 
| Previous Message | Robert Haas | 2022-10-08 15:46:50 | Re: use has_privs_of_role() for pg_hba.conf |