Re: Multitenancy optimization

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multitenancy optimization
Date: 2019-03-29 08:42:55
Message-ID: 40646a68-fe57-0a3d-5c05-a2c13d2e676f@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29.03.2019 11:06, Hadi Moshayedi wrote:
> On Thu, Mar 28, 2019 at 5:40 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>
> Certainly it is possible to create multicolumn statistics to notify
> Postgres about columns correlation.
> But unfortunately it is not good and working solution.
>
> First of all we have to create multicolumn statistic for all possible
> combinations of table's attributes including "tenant_id".
> It is very inconvenient and inefficient.
>
> On the inconvenient part: doesn't postgres itself automatically create
> functional dependencies on combinations? i.e. it seems to me if we
> create statistics on (a, b, c), then we don't need to create
> statistics on (a, b) or (a, c) or (b, c), because the pg_statistic_ext
> entry for (a, b, c) already includes enough information.
>
> On the inefficient part, I think there's some areas of improvement
> here. For example, if (product_id) -> seller_id correlation is 1.0,
> then (product_id, product_name) -> seller_id correlation is definitely
> 1.0 and we don't need to store it. So we can reduce the amount of
> information stored in pg_statistic_ext -> stxdependencies, without
> losing any data points.
>
> More generally, if (a) -> b correlation is X, then (a, c) -> b
> correlation is >= X. Maybe we can have a threshold to reduce number of
> entries in pg_statistic_ext -> stxdependencies.
>
> -- Hadi

Yes, Postgres automatically creates functional dependencies on combinations.
But actually you do not need ALL combinations. Table can contain
hundreds of attributes: number of combination in this case will not fit
in bigint.
This is why Postgres doesn't allow to create muticolumn statistic for
more than 8 columns.
So if you have table with hundred attributes and tenant_id, you with
have to manually create statistic for each <tenant_id,att-i> pair.
And it is very inconvenient (and as I already mentioned doesn't
completely solve the problem with join selectivity estimation).

May be there are some other ways of addressing this problem (although I
do not them).
But I think that in any case, if number of distinction values is
explicitly specified for the attribute, then this value should be used
by optimizer instead of dummy DEFAULT_NUM_DISTINCT.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Eric Hanson 2019-03-29 08:45:04 Re: SET LOCAL ROLE NO RESET -- sandbox transactions
Previous Message Jamison, Kirk 2019-03-29 08:34:38 RE: Timeout parameters