Re: Asynchronous Append on postgres_fdw nodes.

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Asynchronous Append on postgres_fdw nodes.
Date: 2021-05-11 07:24:53
Message-ID: CAPmGK15CyEmstJyzkGsT_1OmArJwD6vQguBEwiz8oYx4zz6ssA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 11, 2021 at 11:58 AM Andrey Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> Your patch fixes the problem. But I found two more problems:
>
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT * FROM (
> (SELECT * FROM f1)
> UNION ALL
> (SELECT * FROM f2)
> UNION ALL
> (SELECT * FROM l3)
> ) q1 LIMIT 6709;
> QUERY PLAN
> --------------------------------------------------------------
> Limit (actual rows=6709 loops=1)
> -> Append (actual rows=6709 loops=1)
> -> Async Foreign Scan on f1 (actual rows=1 loops=1)
> -> Async Foreign Scan on f2 (actual rows=1 loops=1)
> -> Seq Scan on l3 (actual rows=6708 loops=1)
>
> Here we scan 6710 tuples at low level but appended only 6709. Where did
> we lose one tuple?

The extra tuple, which is from f1 or f2, would have been kept in the
Append node's as_asyncresults, not returned from the Append node to
the Limit node. The async Foreign Scan nodes would fetch tuples
before the Append node ask the tuples, so the fetched tuples may or
may not be used.

> 2.
> SELECT * FROM (
> (SELECT * FROM f1)
> UNION ALL
> (SELECT * FROM f2)
> UNION ALL
> (SELECT * FROM f3 WHERE a > 0)
> ) q1 LIMIT 3000;
> QUERY PLAN
> --------------------------------------------------------------
> Limit (actual rows=3000 loops=1)
> -> Append (actual rows=3000 loops=1)
> -> Async Foreign Scan on f1 (actual rows=0 loops=1)
> -> Async Foreign Scan on f2 (actual rows=0 loops=1)
> -> Foreign Scan on f3 (actual rows=3000 loops=1)
>
> Here we give preference to the synchronous scan. Why?

This would be expected behavior, and the reason is avoid performance
degradation; you might think it would be better to execute the async
Foreign Scan nodes more aggressively, but it would require
waiting/polling for file descriptor events many times, which is
expensive and might cause performance degradation. I think there is
room for improvement, though.

Thanks!

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-05-11 07:29:28 Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Previous Message Andrey Lepikhov 2021-05-11 07:06:05 Re: Defer selection of asynchronous subplans until the executor initialization stage