Re: estimation problems for DISTINCT ON with FDW

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: estimation problems for DISTINCT ON with FDW
Date: 2020-06-30 22:21:32
Message-ID: 542866.1593555692@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> I poked into that and found that the problem is in estimate_num_groups,
> which effectively just disregards any relation that has rel->tuples = 0.
> That is the case for a postgres_fdw foreign table if use_remote_estimate
> is true, because postgres_fdw never bothers to set any other value.
> (On the other hand, if use_remote_estimate is false, it does fill in a
> pretty-bogus value, mainly so it can use set_baserel_size_estimates.
> See postgresGetForeignRelSize.)

> It seems like we could make estimate_num_groups a bit more robust here;
> it could just skip its attempts to clamp based on total size or
> restriction selectivity, but still include the reldistinct value for the
> rel into the total numdistinct. I wonder though if this is the only
> problem caused by failing to fill in any value for rel->tuples ...
> should we make postgres_fdw install some value for that?

Attached are a couple of quick-hack patches along each of those lines.
Either one resolves the crazy number-of-groups estimate for Jeff's
example; neither changes any existing regression test results.

On the whole I'm not sure I like 0001 (ie, changing estimate_num_groups).
Sure, it makes that function "more robust", but it does so at the cost
of believing what might be a default or otherwise pretty insane
reldistinct estimate. We put in the clamping behavior for a reason,
and I'm not sure we should disable it just because reltuples = 0.

0002 seems like a better answer on the whole, but it has a pretty
significant issue as well: it's changing the API for FDW
GetForeignRelSize functions, because now we're expecting them to set
both rows and tuples to something sane, contrary to the existing docs.

What I'm sort of inclined to do is neither of these exactly, but
instead put the

baserel->tuples = Max(baserel->tuples, baserel->rows);

clamping behavior into the core code, immediately after the call to
GetForeignRelSize. This'd still let the FDW set baserel->tuples if
it has a mind to, while not requiring that; and it prevents the
situation where the rows and tuples estimates are inconsistent.

regards, tom lane

Attachment Content-Type Size
0001-fix-estimate_num_groups.patch text/x-diff 743 bytes
0002-fix-postgres_fdw.patch text/x-diff 837 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2020-06-30 22:35:55 Re: Remove Deprecated Exclusive Backup Mode
Previous Message Stephen Frost 2020-06-30 22:13:30 Re: Remove Deprecated Exclusive Backup Mode