Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

From: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
To: Tom Lane PostgreSQL <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
Cc: pgsql-general list <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Limiting the operations that client-side code can perform upon its database backend's artifacts
Date: 2022-09-30 03:24:59
Message-ID: F0A23614-749D-4A89-84C5-119D4000F9FE@yugabyte.com
Lists: pgsql-general

> tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
>
>> bryn(at)yugabyte(dot)com writes:
>>
>>> rjuju123(at)gmail(dot)com wrote:
>>>
>>> I'm not convinced... that the authorization system can prevent an untrusted user with a direct SQL access from actually hurting you.
>>
>> What do you mean by "untrusted"? Any person who is given the credentials to start a database session is trusted—even a person who can connect as a superuser and do untold harm. So focus on a person who has the credentials to connect as "client" in my example. But imagine a design that exposes functionality to "client" sessions exclusively through a carefully designed and implemented API that's expressed exclusively with user-defined functions and procedures.
>
> Sure. That is called an application server. What we are is a SQL server, and that means that the API is SQL commands, and the authorization model is what the SQL spec says it is. [So]... any session authenticated as user X has the same privileges as any other session authenticated as user X, so there is not a lot of point in user X mounting defenses against user X. So I think the concerns you're expressing here would be better addressed at the next level up the stack.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Julien replied to the same message from me that Tom did thus:

> rjuju123(at)gmail(dot)com wrote:
>
> You mean like if the application takes care of checking that the logged-in user is allowed to insert data based on whatever application defined rules / user profile, while the SQL role can simply insert data and/or call the carefully written functions? Yes the data will be consistent, but if your role just transferred money from an account to another that's not really the problem.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Peter Holzer replied to an earlier message from me in this branching thread thus:

> hjp-pgsql(at)hjp(dot)at wrote:
>
> I think you did [misunderstand Tom]. What he was saying was that a database design which uses only one single role which both owns all the objects and executes all code on behalf of the user can not be secure. At the very least you need two roles: One which owns the objects and one which can only use the objects in a way allowed by the business logic and is not allowed to change that logic (This is the case you demonstrated.)

Yes, indeed it is. That was my aim.

> In many cases this should be even more fine-grained, and at the extreme end every user could actually have several roles, each with only the minimal privileges required.
>
> (I have often observed that we tend to throw away and build permission systems at every layer of the stack: The OS has a system of users and permissions. But the database server runs as a single user (postgres) which has access to all the data files. So it has to implement its own system of roles and permissions. Then an application developer comes along and writes an app which uses a single database role which has access to all the data.

It often is done like this. But it doesn't have to be. This is where user-defined within-RDBMS functions and procedures, together with what the wider discourse calls "run authority", come to the rescue.

> So it again has to implement its own system of users and permissions...)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

*SUMMARY*

I'd better simply attempt to stop further discussion in this thread by saying that no mutual understanding has been reached. I'm sad about this outcome. But it would seem to be unfair to invite anybody to spend any more time on these exchanges.

*MORE... ONLY IF YOU CAN BEAR TO READ IT*

First, lest a really awful misunderstanding linger on, I must stress my reaction to this from Tom:

> any session authenticated as user X has the same privileges as any other session authenticated as user X

well, yes. That statement of outcome is a tautology. I don't think that I wrote anything to suggest that I thought otherwise. As for the notion that Ms. Jones, authorized as X, can kill Ms. Smith's session, also authorized as X, and vice versa... this now seems to be just a red herring. Either the designer wants this or they don't. And if they don't, they can prevent it by a simple revocation of one object privilege. It's been argued that some other evil-doing can't be prevented. Well, fair enough. But this doesn't seem to justify failing to prevent any bad thing that *can* be prevented.
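(To be concrete, the single revocation that I have in mind is of the kind sketched below. The "admin" role is purely illustrative; and I name the functions without argument lists, which PG allows when the name is unambiguous, because the exact signatures vary across versions.)

revoke execute on function pg_catalog.pg_terminate_backend from public;
revoke execute on function pg_catalog.pg_cancel_backend    from public;

-- Hand the ability back only to a role that legitimately needs it:
grant execute on function pg_catalog.pg_terminate_backend to admin;
grant execute on function pg_catalog.pg_cancel_backend    to admin;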

There was an era, now long gone, when an outfit had its own production RDBMS (or a small few of these) in a locked and guarded room in the very same building within which all human end users of such a system worked. In those days, each human user was mapped one-to-one to their own within-RDBMS role. And the application under the end-user's fingertips connected directly to the RDBMS by authenticating as that person-specific RDBMS role. (There'd also be some RDBMS roles for human admins and developers.) Then no two end-user sessions would authorize as the same role, and the native RDBMS security notions were relied upon to provide the kind of protection that's come up for discussion here.

We're no longer in that era. Client-server is dead. Long live multi-tier. I have a few bank accounts. And I also do Internet shopping. I went through the usual steps to set myself up to use each of these. And I have no choice but to put my faith in their authorization and security schemes—notwithstanding periodic troubling revelations of security loopholes. I've also discussed such systems with end-user organizations as part of the duties of my former job. Of course, these end-user organizations were self-selected for me because they all used an RDBMS as the database of record at the bottom of it all.

None of these end-user organizations mapped human end users each to their own RDBMS role. We all know this. And we all know the reasons. The concept of connection pooling, where all connections in the pool authorize as the same RDBMS role, is just one reason among many. I've called this connection-pool role "client"—and I'll continue to do so here.

This preamble sets the stage for the discussion about the overall security proposition when two sessions authorize as the same database role "client" (not necessarily even concurrently) but on behalf of different human end-users. I've not yet had the chance to have even a single face-to-face discussion with someone who knows what approaches are used to bring about the required isolation between human end-users when the database of record is PostgreSQL. And I'd very much like to read some nicely-written essays about this topic. Can anybody recommend any such pieces for me to study?

Meanwhile, I do understand the general principles. The software tiers that stand above the RDBMS tier in the overall stack take care of authorizing a human being, so that each SQL call that the application-code tier passes through the connection-pooling tier on behalf of that human also carries a unique ID for the human. The SQL execution therefore knows whom it's acting for—and can look up salient facts about this principal in table(s) that have been designed and populated for that purpose. The details are unimportant here. We have no end of existence proofs that such schemes are in daily use and are judged to be adequately robust. (For example, Salesforce.com has talked and written publicly about how it does application-level multitenancy within a single RDBMS by striping the critical tables with an ID for the tenant organization. It then models human users within tenant organizations on top of that.)
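(For what it's worth, one widely used way to carry that unique ID is a per-request session setting that the carefully written within-RDBMS code reads back. The "app.human_id" name is invented for the illustration; and whether the setting should be transaction-local or session-local depends on the pooling mode.)

-- Done by the tier above, on the pooled connection, at the start of each request.
-- "true" makes the setting transaction-local, so nothing leaks to the next request:
select set_config('app.human_id', '42', true);

-- Read back inside the within-RDBMS code. The second argument says "return null,
-- rather than raise an error, if the setting hasn't been defined":
select current_setting('app.human_id', true);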

(Yes, Peter, I appreciate that this is exactly what you said! But it's hardly a bad thing. It's just an inevitable fact of life.)

Paraphrasing Peter, the design of the application's RDBMS backend has to implement its own notions of roles and privileges as a new layer on top of whatever the native RDBMS mechanisms provide. Some RDBMSs have native primitives that help the implementation of this next, application-specific, roles-and-privileges regime. I've yet to discover what PG has in this space. But one thing is clear. The SQLs that rely on these home-grown designs for the application-specific privilege regime must be designed, implemented, and tested very carefully by specialists. In my book, these specialists have enough to grapple with without embracing the skills of client-side programming languages, the declarative mechanisms that various frameworks rely on, and UI design. It's nice that PG has the mechanisms to support this separation of duties.

All this background lets me focus on what I started with: how to make use of PG's native security mechanisms, and (within this framework) "security definer" subprograms, to limit the scope of what SQL calls from next-tier code that's authorized as "client" can do. The design spectrum here runs between these two extremes:

—Make no use of the RDBMS's mechanisms: in other words, implement the entire RDBMS application backend using only the single role "client" where client owns, crudely stated, only tables. Of course, there are closely associated artifacts like indexes and constraints—even though (as the myth has it) these cause performance problems and, as is claimed, such rules are better implemented in application-tier code. I kid you not. I've heard people say this. And I expect that you all have too. In this case, the engineers who write the application-tier code (or use schemes that generate it) take all the responsibility for correctness, data integrity, and data security. I call this approach the "bag of tables paradigm". And, yes, the phrase is meant to connote nastiness.

—Make maximal use of the RDBMS's mechanisms: in other words, exploit PG's role-based security mechanisms, and especially "security definer" subprograms, so that application-tier code (acting as it must, by having authorized as "client" to service requests for all human end-users) can most reliably implement whatever scheme is invented for the within-RDBMS component of the application-level notions for roles and privileges. I call this approach the "hard shell paradigm". (I didn't invent that term of art.) The phrase is meant to connote goodness.

You've noticed that I stand at the "hard shell" end of the spectrum. For example, I have a demo application that deals with data held in a master-detail table pair. (Please don't tell me, Julien, that this is such an unrealistic toy that it makes not a single pedagogic point.) The API is exposed to the "client" role as just a set of JSON-in, JSON-out procedures. And there are several roles behind the scenes, where each often owns more than one schema. Sessions authorized as "client" can't detect any facts about these implementation details because they can't query any of the catalog relations. After all, why should they be able to? I see no need for it. And it's easy to prevent. Moreover, and very significantly, it's hugely more difficult to allow catalog access and then to reason, case by case, that what "client" can see (like the source code of a procedure) can't be exploited to do harm.
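(Here's a thumbnail, with illustrative names, of the kind of role, schema, and grant arrangement that brings this about. The real demo has more roles and schemas than this; and I've left out the step that withdraws catalog access from "client".)

create role data   with nologin;   -- will own the "data" schema and its tables
create role code   with nologin;   -- will own the "code" schema and the API subprograms
create role client with login;     -- what the connection pool authorizes as

create schema data authorization data;
create schema code authorization code;

-- "code" may reach the tables that "data" owns; "client" may not.
-- ("data" must also grant the needed table privileges to "code"; not shown.)
grant usage on schema data to code;

-- "client" sees nothing but the API. Each API procedure is created by "code"
-- with "security definer", so it executes with its owner's privileges:
grant usage on schema code to client;
grant execute on procedure code.insert_master_and_details(text, text) to client;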

Here are some examples of API calls with their return values. Each of the "masters" and "details" tables has just a single "payload" column, "v text", with some constraints. The other columns are the master-detail plumbing and don't surface into the business-functionality world.
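(For orientation, and with the real constraints and plumbing guessed at, the two tables look something like this. The "data" schema and the constraint name "masters_v_chk" match the incident report shown further down; "gen_random_uuid()" assumes PG 13 or later.)

create table data.masters(
  mk uuid default gen_random_uuid() primary key,    -- plumbing; never surfaces through the API
  v  text not null
     constraint masters_v_chk check (v = lower(v))  -- stands in for the real rules
);

create table data.details(
  dk uuid default gen_random_uuid() primary key,    -- plumbing
  mk uuid not null references data.masters(mk),     -- plumbing
  v  text not null
     constraint details_v_chk check (v = lower(v))
);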

Firstly, inserting data:

call insert_master_and_details()

{"m": "mary", "ds": ["shampoo", "soap", "toothbrush", "towel"]}

→ {"status": "success"}

{"m": "arthur", "ds": ["scissors", "saucer", "spatula", "spatula", "scissors"]}

→ {"reason": "New master 'arthur' bad duplicate details: 'scissors','spatula',", "status": "user error"}

{"m": "Christopher", "ds": []}

→ {"status": "unexpected error", "ticket": 1}

The "Christopher" error reflects the fact that a rule (like names must be all lower case) is violated. The API contract says that the database will reject bad values—and will not coerce a bad value into a good value. Rather, it's the application-tier's responsibility to present only good values. That's why the status is "unexpected error". The ticket number is the auto-generated surrogate key for a row in the behind-the-scenes "incidents" table. An authorized support staff member can access that table for that ticket and see this:

unit: procedure code.insert_master_and_details(text, text)
returned_sqlstate: 23514
message_text: new row for relation "masters" violates check constraint "masters_v_chk"
pg_exception_detail: Failing row contains (cc93bd34-b68a-4d47-b9e9-0033031cefb7, Christopher).
constraint_name: masters_v_chk
table_name: masters
schema_name: data

pg_exception_context
--------------------
SQL statement "insert into data.masters(v) values(m_and_ds.m) returning mk"
PL/pgSQL function code.insert_master_and_details(text,text) line 17 at SQL statement
SQL statement "call code.insert_master_and_details(j, outcome)"
PL/pgSQL function insert_master_and_details(text,text) line 3 at CALL

This is the full story of what "get stacked diagnostics" delivers in an "others" handler—together with whatever other facts about the context the designer has decided might be useful (exemplified here by the redundant fact "unit").
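(To make that concrete, here is the rough shape of such a procedure and its "others" handler. The body, the columns of the "incidents" table, and the details handling are all guesses; only the names that appear in the report above come from the real demo.)

create procedure code.insert_master_and_details(j text, inout outcome text)
  security definer
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  new_mk uuid;
begin
  -- ... validate "j", then insert the master row and its details rows ...
  insert into data.masters(v)
  values ((j::jsonb)->>'m')
  returning mk into new_mk;
  -- (details handling elided)

  outcome := '{"status": "success"}';
exception
  when others then
    declare
      err_state text;  err_msg    text;  err_detail  text;  err_constraint text;
      err_table text;  err_schema text;  err_context text;  new_ticket     int;
    begin
      get stacked diagnostics
        err_state      = returned_sqlstate,
        err_msg        = message_text,
        err_detail     = pg_exception_detail,
        err_constraint = constraint_name,
        err_table      = table_name,
        err_schema     = schema_name,
        err_context    = pg_exception_context;

      insert into data.incidents
        (unit, returned_sqlstate, message_text, pg_exception_detail,
         constraint_name, table_name, schema_name, pg_exception_context)
      values
        ('procedure code.insert_master_and_details(text, text)',
         err_state, err_msg, err_detail, err_constraint, err_table, err_schema, err_context)
      returning ticket into new_ticket;

      outcome := json_build_object('status', 'unexpected error', 'ticket', new_ticket)::text;
    end;
end;
$body$;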

Secondly, reporting on data:

call do_master_and_details_report()

{"key": "mary"}

→ {"status": "m-and-ds report success",
"m_and_ds": {"m": "mary", "ds": ["shampoo", "soap", "toothbrush", "towel"]}}

{"key": "bill"}

→ {"reason": "The master business key 'bill' doesn't exist", "status": "user error"}

{"ket": "fred"}

→ {"reason": "Bad JSON in: {\"ket\": \"fred\"}", "status": "client code error"}

Obviously, it's the responsibility of the application-tier code to manage the UI flow, to paint panes that allow data entry for new facts and for parameterizing queries, and to present the return values for the good outcomes and the error outcomes sensibly.

For sport, I have a variant of this demo that enforces the famous mandatory 1:M rule. I used an approach that you can find described on the Internet. The concept is a mutual FK relationship between the "masters" and the "details" tables so that one detail is special in that it's the parent of its own parent. This works perfectly well w.r.t. the rule itself—and I don't need to worry about race conditions because the native support for deferred FK constraints looks after this. There is a twist, though. I have to do a "twizzle" when there remain many details for a given master and when somebody wants to delete the currently "special" detail. But the solution is doable by using a trigger to perform the twizzle to appoint a surviving detail to "special" status.
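(For the curious, here are the bones of that arrangement, layered onto the tables sketched earlier. The "special_dk" column name is invented; and it's shown as an "alter table" against still-empty tables only for brevity.)

-- So that the mutual FK can also assert that the "special" detail belongs to
-- this very master, it references (mk, dk) rather than just dk:
alter table data.details
  add constraint details_mk_dk_unq unique (mk, dk);

alter table data.masters
  add column special_dk uuid not null,
  add constraint masters_special_detail_fk
    foreign key (mk, special_dk) references data.details(mk, dk)
    deferrable initially deferred;

-- A new master row can now be inserted before its first detail exists;
-- the deferred FK is checked only at commit time.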

Significantly, the details of the enforcement scheme don't matter to sessions that connect as "client". To prove that point, I implemented an alternative scheme that uses a trigger naïvely to cause an error when it sees that you're about to leave a master row with no details. This scheme does, of course, require "serializable" isolation—and so I prefer the mutual FK scheme.
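(Just to show its shape, and with everything about it invented except the rule that it enforces, the naive scheme is roughly this. It covers only the delete-and-reparent paths, and the full-table scan is part of what makes it naive.)

create function data.no_childless_masters() returns trigger
  language plpgsql
as $body$
begin
  if exists (
    select 1
    from data.masters m
    where not exists (select 1 from data.details d where d.mk = m.mk)
  ) then
    raise exception 'a master row would be left with no details';
  end if;
  return null;
end;
$body$;

create trigger no_childless_masters
  after delete or update of mk on data.details
  for each statement
  execute function data.no_childless_masters();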

My point here is that the design and implementation of the mandatory 1:M rule enforcement, whatever it is, is entirely hidden from "client" sessions and is immune from their tampering attempts. Moreover, you can change the implementation, in an application upgrade exercise, without "client" sessions even knowing that this was done. (I like to dream that, one day, it could be replaced with a single declaration of intent, expressed as a "create assertion" statement.)

I regard this feature of the scheme (changing the implementation without needing to tell clients) as a good thing. It's something that the "bag of tables" paradigm cannot, in general, support. When you describe the paradigm as "hiding all implementations behind an API that articulates the business purpose of each of its members", you see the "hard shell" paradigm for what it is. One of the central pillars of software engineering. And it's as old as the discipline itself.

*FINALLY*

I've discussed these competing paradigms, over many years, with all sorts of people—both face-to-face and in written exchanges. I'm a "hard shell" devotee. And there are lots of "hard shell" devotees out there. But there are also many "bag of tables" devotees out there too. The second camp is probably more populous than the first one. I've come to accept that one's affiliation here is religion. In almost all cases, a devotee of one paradigm simply cannot convince devotees of the other paradigm to change their minds—or even to concede ground on just one tiny detail. I can't fix this. But I know where I stand in the debate.
