Re: Proposal: two new role attributes and/or capabilities?

From: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: 'Pg Hackers' <pgsql-hackers(at)postgresql(dot)org>, David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Subject: Re: Proposal: two new role attributes and/or capabilities?
Date: 2014-12-23 19:44:34
Message-ID: 5499C622.5040105@adv-solutions.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/23/2014 07:52 PM, Stephen Frost wrote:
> [snip]
> Manually performing VACUUM / VACUUM ANALYZE on the (few) affected
> tables every 12h or so fixes the performance problem for the
> particular queries without impacting the other users too much ---
> the tables and indexes in question have been moved to a separate
> tablespace/disk volume of their own.
> Autovacuum can certainly run vacuum/analyze on a few tables every 12
> hours, so I'm not really following where you see autovacuum being unable
> to cope. I agree that there *are* such cases, but getting more
> information about those cases and exactly what solution *does* work
> would really help us improve autovacuum to address those use-cases.

I'll try to. I don't have direct access, and the use case is quite edgy
to be fair.
Plus, the configuration and hardware leaves quite a bit to be desired...

...but it's a real use case and the solution (even if only treating the
symptoms) is quite straight-forward and easy.
>> In short, this addresses situations where some tables have a much
>> higher update rate than the rest of the database so that performance
>> degrades with time --- the application became unusable after about 6
>> days' worth of updates until the manual vacuums were setup
> This really looks like a configuration issue with autovacuum.. Perhaps
> you need to make it more aggressive than the default and have it run
> more threads?

Yes to both. Up to something which actually affected performance a bit.
But basically only a few tables exhibited this behaviour among several
hundreds in this particular situation.

> Have you turned the autovacuum logging up all the way?
> Is autovacuum giving up due to locking?
Not one of my systems, and I don't have access to it anymore, but I
don't think this was the reason.

However, having some hundred million deleted rows piling every few hours
quite increases the load. For the record, the (closed-source)
application did issue the DELETEs on the table, so partitioning +
TRUNCATE child_part was not applicable.

In any case, I was aiming at making this kind of operations possible and
easier --- regardless of whether they are solving the right problem or
not, or whether there exists an optimal solution --- since I have seen
some real life solutions that could benefit from it.
I agree that routine index maintenance is a better match for this
feature, though :)
>>>> REINDEX CONCURRENTLY (but not the blocking, regular, one)
>>>> REFRESH MATERIALIZED VIEW CONCURRENTLY (but not the blocking one)
>>> These are interesting, but would these make sense at the role level?
>>> Both of these commands explicitly take specific relations to operate
>>> against, after all.
>> Yup. Let's imagine a cron job invoking psql in order to perform
>> maintenance routine.
> If they make sense at a relation level then they should be
> relation-level GRANT'd permissions, not role-level attributes.

Same as before.
Let's imagine this coupled with REINDEX SCHEMA CONCURRENTLY ... or
simply when constructing the list of tables dynamically and there is no
other use for such a grant.
Arguably, this isn't that much of a problem if there exists a way to
easily revoke all such permissions from all objects in one go (just like
recently discussed in another thread)

>> The particular command(s) can be generated on-the-fly by querying
>> the catalog and then send them in one go to be run sequentially by
>> the one backend.... as a crude form of rate
>> limiting/quality-of-service of sorts ("renice -p" or even "ionice
>> -p" seems quite inadequate).
> This sounds like it's something that we might want an autovacuum-like
> background process to handle.. Some kind of auto-reindex-concurrently.
> There are already plans to deal with updating of materialized views, as
> I understand it.

While I can definitively see it for materialized views (they *are*
views, after all), this pattern potentially gets us adding everything
but the kitchen sink inside the database.
FWIW, it's only a matter of providing a mechanism for maintenance
routines to use very unprivileged users to perform their duties on the
whole cluster without having to explicitly grant permissions and/or
include these into another, regular, role.
Please keep in mind that these roles [having only LOGIN and
MAINTENANCE] would NOT be able to perform any DML or DDL whatsoever, nor
any queries (unless explicitly granted permission for SELECTs).

> [snip]
>> Yes. That's the reason for the question marks :-\
>> Some "dump to csv then load somewhere else" kind of jobs might
>> benefit from this feature, but I'm not sure the convenience is worth
>> the risk.
> I've run into quite a few processes which would really benefit from
> this, and would even be safe to use (the processes running the COPY
> commands don't have any rights on the directories except through PG),
> but it's not clear if that use-case is sufficiently broad for the
> feature to be worthwhile.. At least, some feel it isn't. Can you
> describe your use-case more and perhaps the needle will move on that
> point?

I don't have any concrete example for this right now, but a "psql -f
query_file.sql | processing_script | load_into_somewhere" scheduled to
run nightly (every few hours, even) is definitively quite common.
The COPY argument (which actually reads and outputs information)
actually falsifies the above assumptions, I'm afraid. So this task would
need explicit permissions anyway.

You seem to have better arguments in favor of this last feature, though.

> [snip]
> I suppose, but I think you're over-simplying your use-case. Consider
> this- do you want this login role to be a member of superuser roles?
> What about DBA or other privilege-user roles?

As pointed out by David Johnston, impersonating superusers shall be
explicitly forbidden.
The "noimpersonate" attribute (actually, we might need a better name to
avoid the implication that it means "can not impersonate others" but
"can not be impersonated") could easily block this loophole.

This feature is definitely orthogonal to the proposed "magical audit
role" proposal, which I heartily agree with.
We might need to tweak logging a bit under this circumstances to reveal
the login user vs the authorization user (quite like UID vs EUID for
processess), though.

Since the changes to pg_authid seem to be about to be reverted, I'll
wait a bit for a WIP patch --- this is definitively a lot more effort
without the bitmask and helper funcs :(
There existed another suggestion ---in the form of text-based
"capabilities" IIRC--- which might be used for this purpose, however.

Thanks,

J.L.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-12-23 19:52:29 Re: [COMMITTERS] pgsql: Use a bitmask to represent role attributes
Previous Message Stephen Frost 2014-12-23 19:36:33 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}