From: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> |
---|---|
To: | Hadi Moshayedi <hadi(at)moshayedi(dot)net> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Multitenancy optimization |
Date: | 2019-03-29 08:42:55 |
Message-ID: | 40646a68-fe57-0a3d-5c05-a2c13d2e676f@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 29.03.2019 11:06, Hadi Moshayedi wrote:
> On Thu, Mar 28, 2019 at 5:40 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>
> Certainly it is possible to create multicolumn statistics to notify
> Postgres about columns correlation.
> But unfortunately it is not good and working solution.
>
> First of all we have to create multicolumn statistic for all possible
> combinations of table's attributes including "tenant_id".
> It is very inconvenient and inefficient.
>
> On the inconvenient part: doesn't postgres itself automatically create
> functional dependencies on combinations? i.e. it seems to me if we
> create statistics on (a, b, c), then we don't need to create
> statistics on (a, b) or (a, c) or (b, c), because the pg_statistic_ext
> entry for (a, b, c) already includes enough information.
>
> On the inefficient part, I think there's some areas of improvement
> here. For example, if (product_id) -> seller_id correlation is 1.0,
> then (product_id, product_name) -> seller_id correlation is definitely
> 1.0 and we don't need to store it. So we can reduce the amount of
> information stored in pg_statistic_ext -> stxdependencies, without
> losing any data points.
>
> More generally, if (a) -> b correlation is X, then (a, c) -> b
> correlation is >= X. Maybe we can have a threshold to reduce number of
> entries in pg_statistic_ext -> stxdependencies.
>
> -- Hadi
Yes, Postgres automatically creates functional dependencies on combinations.
But actually you do not need ALL combinations. Table can contain
hundreds of attributes: number of combination in this case will not fit
in bigint.
This is why Postgres doesn't allow to create muticolumn statistic for
more than 8 columns.
So if you have table with hundred attributes and tenant_id, you with
have to manually create statistic for each <tenant_id,att-i> pair.
And it is very inconvenient (and as I already mentioned doesn't
completely solve the problem with join selectivity estimation).
May be there are some other ways of addressing this problem (although I
do not them).
But I think that in any case, if number of distinction values is
explicitly specified for the attribute, then this value should be used
by optimizer instead of dummy DEFAULT_NUM_DISTINCT.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | Eric Hanson | 2019-03-29 08:45:04 | Re: SET LOCAL ROLE NO RESET -- sandbox transactions |
Previous Message | Jamison, Kirk | 2019-03-29 08:34:38 | RE: Timeout parameters |