Re: [v9.5] Custom Plan API

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Subject: Re: [v9.5] Custom Plan API
Date: 2014-05-09 01:22:05
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F9F993@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
>
> I haven't really paid any attention to PGStrom. Perhaps it's just that I
> missed it, but I would find it useful if you could direct me towards a
> benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling compared
> to traditional strategies already implemented in Postgres and other
> systems.
>
The GPU-side implementation of hash join is still under development.

The only use case available right now is an alternative scan path that replaces
a full table scan when a table contains a massive number of records and the
qualifiers are sufficiently complicated.

The EXPLAIN output below shows a sequential scan on a table that contains
80M records (all of them in memory; no disk access during execution).
NVIDIA's GT640 has a clear advantage over a single-threaded Core i5 4570S,
at least.

postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10003175757.67..10003175757.68 rows=1 width=0) (actual time=46648.635..46648.635 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=10000000000.00..10003109091.00 rows=26666667 width=0) (actual time=0.047..46351.567 rows=2513814 loops=1)
         Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Rows Removed by Filter: 77486186
 Planning time: 0.066 ms
 Total runtime: 46648.668 ms
(6 rows)
postgres=# set pg_strom.enabled = on;
SET
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1274424.33..1274424.34 rows=1 width=0) (actual time=1784.729..1784.729 rows=1 loops=1)
   ->  Custom (GpuScan) on t1  (cost=10000.00..1207757.67 rows=26666667 width=0) (actual time=179.748..1567.018 rows=2513699 loops=1)
         Host References:
         Device References: x, y
         Device Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Total time to load: 0.231 ms
         Avg time in send-mq: 0.027 ms
         Max time to build kernel: 1.064 ms
         Avg time of DMA send: 3.050 ms
         Total time of DMA send: 933.318 ms
         Avg time of kernel exec: 5.117 ms
         Total time of kernel exec: 1565.799 ms
         Avg time of DMA recv: 0.086 ms
         Total time of DMA recv: 26.289 ms
         Avg time in recv-mq: 0.011 ms
 Planning time: 0.094 ms
 Total runtime: 1784.793 ms
(17 rows)

> If I wanted to make joins faster, personally, I would look at opportunities
> to optimize our existing hash joins to take better advantage of modern CPU
> characteristics. A lot of the research suggests that it may be useful to
> implement techniques that take better advantage of available memory
> bandwidth through techniques like prefetching and partitioning, perhaps
> even (counter-intuitively) at the expense of compute bandwidth. It's
> possible that it just needs to be explained to me, but, with respect,
> intuitively I have a hard time imagining that offloading joins to the GPU
> will help much in the general case. Every paper on joins from the last decade
> talks a lot about memory bandwidth and memory latency. Are you concerned
> with some specific case that I may have missed? In what scenario might a
> cost-based optimizer reasonably prefer a custom join node implemented by
> PgStrom, over any of the existing join node types? It's entirely possible
> that I simply missed relevant discussions here.
>
If our purpose were to consume 100% of the GPU device's capacity, memory
bandwidth would be troublesome. But I'm not interested in GPU benchmarking.
What I want to do is accelerate complicated query processing beyond what the
existing RDBMS can do, in a way that is cheap and transparent to existing
applications.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
