Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.

From: James Inform <james(dot)inform(at)pharmapp(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.
Date: 2023-08-20 21:12:35
Message-ID: 48e87ec7-4371-3524-79d0-18439640ddc0@pharmapp.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thank you David for this detailled explanation.

Makes things very clear.

David Rowley schrieb am 20.08.23 um 23:04:
> On Mon, 21 Aug 2023 at 06:26, James Inform <james(dot)inform(at)pharmapp(dot)de> wrote:
>> Just looked at the results again.
>>
>> They look strange. When I execute the query multiple times I get three
>> kinds of results:
>> - NO match on all columns
>> - THE SAME match on all columns
>> - THREE DIFFERENT columns that are repeated for all the 1000 rows.
>>
>> With your explanation there should be randomly assigned rows appearing,
>> but the seems to be a pattern. At least more than those three different
>> ones.
>>
>> Have you executed my example and looked at the results?
>> If not, please give it a try. Nothing looks random there.
> You might expect that the "link_id = 1 + (random()*500)::int8" is
> evaluated as a join qual because you put it in the ON clause, but
> PostgreSQL will distribute these quals to the lowest location that
> they can be evaluated. Since the only column that's mentioned in your
> join expression belongs to the "link" table, then the qual is
> evaluated at the scan level for that relation. You'll notice this if
> you look at the EXPLAIN output.
>
> The reason you sometimes get no matches is simply that on that
> execution of the query, the random number didn't happen to line up
> with any of the particular link_ids on any of the scanned tuples.
>
> The reason you sometimes get the same match is that 1 tuple happened
> to match the random number during the scan, and that tuple was joined
> to 1000 times on your effective clauseless join.
>
> The reason you see three different columns being matches it is that 3
> tuples happened to match your random expression during the scan and
> the clauseless join joined all three, resulting in 3000 rows rather
> than 1000 rows in the final output.
>
> The link table is only scanned once due to the Material node in the
> Nested Loop join. If you did: SET enable_material = off; then the
> scan would be performed once per row in the "source" table. That
> would mean the random() function would be executed 1 million times
> instead of 1 thousand times.
>
> It might take you a while, but if you tried enough times, all the
> planets would align and "link_id = 1 + (random()*500)::int8" would
> happen to match all tuples during the scan. The query would then
> return 1 million rows.
>
> PostgreSQL wouldn't have pushed your ON qual down to the scan level if
> you'd included some column from the "source" table in the expression.
> I'm not sure what good it'd do you, but you'd see different results
> using something like "link_id = source_id * 0 + 1 +
> (random()*5)::int8" (note the multiplication by 0)
>
> David

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2023-08-20 23:32:39 Re: BUG #17928: Standby fails to decode WAL on termination of primary
Previous Message Andres Freund 2023-08-20 21:05:45 Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()