Re: simple query on why a merge join plan got selected

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: simple query on why a merge join plan got selected
Date: 2018-12-15 22:01:13
Message-ID: d0f8fcfd-70f8-089f-6bb3-89daedda8a5b@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

I'd run each query multiple times -- before and after disabling mergejoin --
to ensure that all the data is safely in RAM.

On 12/15/2018 02:13 PM, Vijaykumar Jain wrote:
> Hey Guys,
>
> I was just playing with exploring joins and plans i came across this
>
> create table t1(a int);
> create table t2(a int);
> insert into t1 select (x % 10) from generate_series(1, 100000) x;
> insert into t2 select (x % 100) from generate_series(1, 100000) x;
>
> pgtesting=> analyze t1;
> ANALYZE
>
> pgtesting=> analyze t2;
> ANALYZE
>
>
> this is reproducible
>
> the below query by default makes use of merge join (which takes way
> longer to return rows as compared to when i explicitly disable merge
> join it returns in half the time taken by merge join) but i am not
> able to figure out why, although i have run analyze on the tables.
>
>
> pgtesting=> explain (analyze, buffers) select * from t1 join t2 using (a);
>
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------
>
> Merge Join (cost=19495.64..1039705.09 rows=97241600 width=4) (actual
> time=124.153..22243.262 rows=100000000 loops=1)
>
> Merge Cond: (t1.a = t2.a)
>
> Buffers: shared hit=886, temp read=320384 written=616
>
> -> Sort (cost=9747.82..9997.82 rows=100000 width=4) (actual
> time=56.442..81.611 rows=100000 loops=1)
>
> Sort Key: t1.a
>
> Sort Method: external merge Disk: 1376kB
>
> Buffers: shared hit=443, temp read=172 written=173
>
> -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=4)
> (actual time=0.030..10.003 rows=100000 loops=1)
>
> Buffers: shared hit=443
>
> -> Sort (cost=9747.82..9997.82 rows=100000 width=4) (actual
> time=67.702..9469.366 rows=100000001 loops=1)
>
> Sort Key: t2.a
>
> Sort Method: external sort Disk: 1768kB
>
> Buffers: shared hit=443, temp read=220222 written=443
>
> -> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=4)
> (actual time=0.013..8.186 rows=100000 loops=1)
>
> Buffers: shared hit=443
>
> Planning time: 0.402 ms
>
> Execution time: 26093.192 ms
>
> (17 rows)
>
>
> pgtesting=> set enable_mergejoin TO FALSE;
>
> SET
>
> pgtesting=> explain (analyze, buffers) select * from t1 join t2 using (a);
>
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------
>
> Hash Join (cost=3084.00..1117491.00 rows=97241600 width=4) (actual
> time=26.893..10229.924 rows=100000000 loops=1)
>
> Hash Cond: (t1.a = t2.a)
>
> Buffers: shared hit=889, temp read=273 written=271
>
> -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=4)
> (actual time=0.028..18.123 rows=100000 loops=1)
>
> Buffers: shared hit=443
>
> -> Hash (cost=1443.00..1443.00 rows=100000 width=4) (actual
> time=26.255..26.255 rows=100000 loops=1)
>
> Buckets: 131072 Batches: 2 Memory Usage: 2713kB
>
> Buffers: shared hit=443, temp written=152
>
> -> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=4)
> (actual time=0.017..9.163 rows=100000 loops=1)
>
> Buffers: shared hit=443
>
> Planning time: 0.099 ms
>
> Execution time: 14095.975 ms
>
> (12 rows)
>
>
> pgtesting=> show work_mem;
>
> work_mem
>
> ----------
>
> 4MB
>
> (1 row)
>
>
> pgtesting=> show shared_buffers;
>
> shared_buffers
>
> ----------------
>
> 1GB
>
> (1 row)
>
>
> pgtesting=> select version();
>
>
> version
>
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
> PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609,
> 64-bit
>
> (1 row)
>
>
> May be i am missing something way obvious :) but my only concern being
> high cardinality joins may not use the statistics correctly?
>
> Regards,
> Vijay
>

--
Angular momentum makes the world go 'round.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ravi Krishna 2018-12-15 23:26:24 date_trunc not immutable
Previous Message Vijaykumar Jain 2018-12-15 20:13:56 simple query on why a merge join plan got selected

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-12-16 00:22:21 Re: simple query on why a merge join plan got selected
Previous Message Tom Lane 2018-12-15 21:20:11 Re: BUG #15548: Unaccent does not remove combining diacritical characters