Re: Varying results when using merge joins over postgres_fdw vs hash joins

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: PostgreSQL <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Varying results when using merge joins over postgres_fdw vs hash joins
Date: 2017-09-20 09:06:38
Message-ID: CAFjFpRevO0cocUgA47BV3CCkXdCPZkX5uCV8_BinsLkHWLUNww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 20, 2017 at 5:07 AM, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
> We are having an issue with a query that will return no results when the
> query does a merge join with a foreign table, but (correctly) returns
> results when using a hash join.
>
> Here is the situation on the "remote" database (9.5):
>
>
> # \d+ table_with_en_us_utf8_encoding
> Table "public.table_with_en_us_utf8_encoding"
> Column | Type | Modifiers | Storage | Stats target |
> Description
> --------+------------------------+-----------+----------+--------------+-------------
> id | bigint | | plain | |
> str1 | character varying(255) | | extended | |
> str2 | character varying(255) | | extended | |
> str3 | character varying(255) | | extended | |
> str4 | character varying(3) | | extended | |
>
> analytics=# select encoding, datcollate, datctype from pg_database where
> datname = current_database();
> encoding | datcollate | datctype
> ----------+-------------+-------------
> 6 | en_US.UTF-8 | en_US.UTF-8
>
>
>
>
> And here's what we do on the local side (9.6):
>
> # select encoding, datcollate, datctype from pg_database where datname =
> current_database();
> encoding | datcollate | datctype
> ----------+------------+----------
> 6 | C | C
>
> # import foreign schema public limit to (table_with_en_us_utf8_encoding)
> from server primary_replica into public;
>
> # \d+ table_with_en_us_utf8_encoding
> Foreign table
> "public.table_with_en_us_utf8_encoding"
> Column | Type | Collation | Nullable | Default | FDW
> options | Storage | Stats target | Description
> --------+------------------------+-----------+----------+---------+----------------------+----------+--------------+-------------
> id | bigint | | | |
> (column_name 'id') | plain | |
> str1 | character varying(255) | | | |
> (column_name 'str1') | extended | |
> str2 | character varying(255) | | | |
> (column_name 'str2') | extended | |
> str3 | character varying(255) | | | |
> (column_name 'str3') | extended | |
> str4 | character varying(3) | | | |
> (column_name 'str4') | extended | |
> Server: primary_replica
> FDW options: (schema_name 'public', table_name
> 'table_with_en_us_utf8_encoding')
>

The collation column is empty here, which means that collation for
str* columns is default collation i.e. C. This isn't true, since the
default ncollation on the foreign server is different from the default
collation of local database. AFAIU, import foreign schema should have
set appropriate collation of the foreign table.

> # create temporary table tmp_on_c_collated_foreign_server (str2 text);
>
> # insert into tmp_on_c_collated_foreign_server (str2) values ('576228972');
> # insert into tmp_on_c_collated_foreign_server (str2) values ('576228972');
> # insert into tmp_on_c_collated_foreign_server (str2) values ('576228972');
>
> --
> -- query with merge join, returns zero rows
> --
> # explain (analyze, verbose) select e.str1, e.str2, e.str3 from
> tmp_on_c_collated_foreign_server c left join table_with_en_us_utf8_encoding
> e on c.str2 = e.str2 where e.str4='2' ;
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
> Merge Join (cost=18041.88..22322.92 rows=229221 width=1548) (actual
> time=102.849..102.849 rows=0 loops=1)
> Output: e.str1, e.str2, e.str3
> Merge Cond: ((e.str2)::text = c.str2)
> -> Foreign Scan on public.table_with_en_us_utf8_encoding e
> (cost=17947.50..18705.95 rows=33709 width=93) (actual time=102.815..102.815
> rows=1 loops=1)
> Output: e.id, e.str1, e.str2, e.str3, e.str4
> Remote SQL: SELECT str1, str2, str3 FROM
> public.table_with_en_us_utf8_encoding WHERE ((str4 = '2'::text)) ORDER BY
> str2 ASC NULLS LAST
> -> Sort (cost=94.38..97.78 rows=1360 width=32) (actual
> time=0.028..0.029 rows=7 loops=1)
> Output: c.str2
> Sort Key: c.str2
> Sort Method: quicksort Memory: 25kB
> -> Seq Scan on pg_temp_3.tmp_on_c_collated_foreign_server c
> (cost=0.00..23.60 rows=1360 width=32) (actual time=0.010..0.011 rows=7
> loops=1)
> Output: c.str2
> Planning time: 4.285 ms
> Execution time: 104.458 ms
> (14 rows)
>

Since the results returned by the foreign server are according to the
collation of the foreign server, the order doesn't match with order
expected by the local server and so the merge join reports different
rows.

>
> --
> -- query with hash join, returns rows
>
> --
>
> -- the default for the foreign server is to use remote estimates, so we turn
> that off...
>
> # alter foreign table table_with_en_us_utf8_encoding OPTIONS (ADD
> use_remote_estimate 'false');
> ALTER FOREIGN TABLE
>
> -- and then run the same query again
>
> # explain (analyze, verbose) select e.str1, e.str2, e.str3 from
> tmp_on_c_collated_foreign_server c left join table_with_en_us_utf8_encoding
> e on c.str2 = e.str2 where e.str4='2' ;
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> Hash Join (cost=110.68..139.45 rows=7 width=1548) (actual
> time=154.280..154.286 rows=7 loops=1)
> Output: e.str1, e.str2, e.str3
> Hash Cond: (c.str2 = (e.str2)::text)
> -> Seq Scan on pg_temp_3.tmp_on_c_collated_foreign_server c
> (cost=0.00..23.60 rows=1360 width=32) (actual time=0.006..0.008 rows=7
> loops=1)
> Output: c.str2
> -> Hash (cost=110.67..110.67 rows=1 width=1548) (actual
> time=154.264..154.264 rows=33418 loops=1)
> Output: e.str1, e.str2, e.str3
> Buckets: 65536 (originally 1024) Batches: 1 (originally 1) Memory
> Usage: 4003kB
> -> Foreign Scan on public.table_with_en_us_utf8_encoding e
> (cost=100.00..110.67 rows=1 width=1548) (actual time=8.289..144.210
> rows=33418 loops=1)
> Output: e.str1, e.str2, e.str3
> Remote SQL: SELECT str1, str2, str3 FROM
> public.table_with_en_us_utf8_encoding WHERE ((str4 = '2'::text))
> Planning time: 0.153 ms
> Execution time: 156.557 ms
> (13 rows)
>
>

In this case, both tables use same collation while comparing the rows,
so result is different from the merge join result. Hash join executed
on local server and the same executed on foreign server (by importing
local table to the foreign server) would also differ.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rushabh Lathia 2017-09-20 09:32:46 Re: Parallel tuplesort (for parallel B-Tree index creation)
Previous Message Thomas Munro 2017-09-20 08:55:43 Re: Error: dsa_area could not attach to a segment that has been freed