Re: Plan to support predicate push-down into subqueries with aggregates?

From: Adam Brusselback <adambrusselback(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Plan to support predicate push-down into subqueries with aggregates?
Date: 2016-03-10 12:09:18
Message-ID: CAMjNa7eWJeVoAnxRDKiO6FoB6_vUe_0nOUne5d8bUH6QBT8m-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Rob,
I understand that if I were to replicate the logic in that view for every
use case I had for those totals, this would not be an issue. But that would
very much complicate some of my queries to the point of absurdity if I
wanted to write them in a way which would push everything down properly.
The issue is, that I need that data to be able to join to that view from
other smaller sets of data and not have it calculate the totals for every
"header" I have in my system, just to throw away 99% of them.

My application is for contract and rebate management, so as David said,
basically accounting. We allow users to set up contracts to sell products
to their customers, and then track and verify the rebates they get back are
correct, and that they're not overpaying.
The equivalent of the header_total view is used in quite a few different
places in my application. In one case, the state of one object in my
system (called a deduction) is derived from 5 different data points, 2 of
which are from my equivalent of the "header total" view. Things like the
state for the deduction object are derived from a query that I encapsulated
inside a view. You can see how this proliferates.

In the end, after I switched to materialized tables with this data stored,
querying things that relied on this aggregated data got much much faster.
In my largest client's database, the query to get the deduction state for a
single deduction went from 5 seconds, down to 2ms or so. Unsurprisingly,
if I wanted the deduction state for every deduction the system, vs only
one, before it was 5 seconds either way, as it had to aggregate all of that
detail level data no matter what, and then throw most of it away if I
wanted just a single deduction.

I would very much rather not have to use the materialized tables to get
good performance, and just use views to get this data instead. I don't
like having to have so many triggers to control the materialized tables, it
leaves too much room for inconsistent data where as a view querying the
underlying data directly leaves no room for inconsistency.

I understand that not everyone has the same use case as I do, but I can see
from my tests that the "dark side" does seem to be able to optimize for
it. This doesn't seem like an uncommon use case to me (think of banking
software, with an account table, and transaction table, and having the
account balance derived in a view), and i'd love to see it supported.

I suppose there are two different ways this could go to improve my
situation: 1) better optimization for this type of query. 2) materialized
views getting more features like refreshing when the underlying data is
changed like detailed here
<https://wiki.postgresql.org/images/8/85/Materialised_Views_-_FOSDEM.pdf> (road
map part).
Either would be great in my book!

The feedback is very appreciated, I was just trying to see with this post
if there was any plan / ideas / anything at all in regards to this type of
use case (or better ways of doing it that I hadn't thought of).

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message david 2016-03-10 13:24:17 How to find configuration data in _PG_init()
Previous Message Giuseppe Sacco 2016-03-10 11:19:33 Re: How to setup Active Directory users in Postgres 9.3.11