Re: SIGSEGV in BRIN autosummarize

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SIGSEGV in BRIN autosummarize
Date: 2017-10-17 16:49:55
Message-ID: 21883.1508258995@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> cur_datname here seems corrupted -- it points halfway into cur_nspname,
> which is also a corrupt value.

Yeah.

> And I think that's because we're not
> checking that the namespace OID is a valid value before calling
> get_namespace_name on it.

No, because get_namespace_name should return NULL if it doesn't find
any pg_namespace entry. That would work just as well for InvalidOid
as for any other nonexistent schema OID.

The part of your patch that adds a check on avw_database is clearly
correct and necessary. I'm thinking the change you propose in
perform_work_item is just overcomplicating code that's okay as it
stands. We don't need to optimize for the schema-went-away case.

What I'm suspicious of as the actual bug cause is the comment in
perform_work_item about how we need to be sure that we're allocating these
strings in a long-lived context. If, in fact, they were allocated in some
context that could get reset during the PG_TRY (particularly in the
error-cleanup path, which I bet we don't test), then the observed symptom
that the pointers seem to be pointing at garbage could be explained.

So what I'm thinking is that you need an error during perform_work_item,
and/or more than one work_item picked up in the calling loop, to make this
bug manifest. You would need to enter perform_work_item in a
non-long-lived context, so the first time through the loop is probably
safe anyway.

BTW, it seems like the "Perform operations on collected tables." loop in
do_autovacuum has probably got similar latent bugs. We take care to enter
it in AutovacMemCxt initially, but it looks to me like subsequent
iterations probably start out in some transaction context, because the
PG_TRY around autovacuum_do_vac_analyze doesn't do anything about
switching back to AutovacMemCxt. There needs to be a bit more clarity
throughout this code about what CurrentMemoryContext ought to be at each
point. Appropriate fixes might involve switching back to AutovacMemCxt
after each of those PG_TRY blocks.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-10-17 16:52:11 Re: [COMMITTERS] pgsql: Implement table partitioning.
Previous Message Alvaro Herrera 2017-10-17 16:32:03 Re: [COMMITTERS] pgsql: Implement table partitioning.