From: | YANG <stonetable(at)outlook(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Queries runs slow on GPU with PG-Strom |
Date: | 2015-07-22 15:16:08 |
Message-ID: | BLU436-SMTP200807E5D5EABD07576C20C1830@phx.gbl |
Lists: | pgsql-hackers |
Hello,
I ran some tests on pg_strom following the wiki, but the queries seem to run
slower on the GPU than on the CPU. Can someone shed some light on what's wrong
with my settings? My setup is a Quadro K620 + CUDA 7.0 (for Ubuntu 14.10) +
Ubuntu 15.04. The results were:
with pg_strom
=============
explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=190993.70..190993.71 rows=1 width=0) (actual time=18792.236..18792.236 rows=1 loops=1)
-> Custom Scan (GpuPreAgg) (cost=7933.07..184161.18 rows=86 width=108) (actual time=4249.656..18792.074 rows=77 loops=1)
Bulkload: On (density: 100.00%)
Reduction: NoGroup
Device Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
-> Custom Scan (BulkScan) on t0 (cost=6933.07..182660.32 rows=10000060 width=0) (actual time=139.399..18499.246 rows=10000000 loops=1)
Planning time: 0.262 ms
Execution time: 19268.650 ms
(8 rows)
explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=298541.48..298541.81 rows=26 width=12) (actual time=11311.568..11311.572 rows=26 loops=1)
Group Key: t0.cat
-> Custom Scan (GpuPreAgg) (cost=5178.82..250302.07 rows=1088 width=52) (actual time=3304.727..11310.021 rows=2307 loops=1)
Bulkload: On (density: 100.00%)
Reduction: Local + Global
-> Custom Scan (GpuJoin) (cost=4178.82..248541.18 rows=10000060 width=12) (actual time=923.417..2661.113 rows=10000000 loops=1)
Bulkload: On (density: 100.00%)
Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = aid), nrows_ratio: 1.00000000
-> Custom Scan (BulkScan) on t0 (cost=0.00..242858.60 rows=10000060 width=16) (actual time=6.980..871.431 rows=10000000 loops=1)
-> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4) (actual time=0.204..7.309 rows=40000 loops=1)
Planning time: 47.834 ms
Execution time: 11355.103 ms
(12 rows)
without pg_strom
================
test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=426193.03..426193.04 rows=1 width=0) (actual time=3880.379..3880.379 rows=1 loops=1)
-> Seq Scan on t0 (cost=0.00..417859.65 rows=3333353 width=0) (actual time=0.075..3859.200 rows=314063 loops=1)
Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
Rows Removed by Filter: 9685937
Planning time: 0.411 ms
Execution time: 3880.445 ms
(6 rows)
test=# explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=431593.73..431594.05 rows=26 width=12) (actual time=4960.810..4960.812 rows=26 loops=1)
Group Key: t0.cat
-> Hash Join (cost=1234.00..381593.43 rows=10000060 width=12) (actual time=20.859..3367.510 rows=10000000 loops=1)
Hash Cond: (t0.aid = t1.aid)
-> Seq Scan on t0 (cost=0.00..242858.60 rows=10000060 width=16) (actual time=0.021..895.908 rows=10000000 loops=1)
-> Hash (cost=734.00..734.00 rows=40000 width=4) (actual time=20.567..20.567 rows=40000 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 1919kB
-> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4) (actual time=0.017..11.013 rows=40000 loops=1)
Planning time: 0.567 ms
Execution time: 4961.029 ms
(10 rows)
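For a rough sense of scale, the slowdown can be computed from the "Execution time" lines above. This is a minimal sketch, assuming `awk` is available; the `slowdown` helper is a hypothetical name used only for illustration.

```shell
# slowdown GPU_MS CPU_MS: print how many times slower the GPU run was,
# rounded to two decimals. "slowdown" is an illustrative helper name.
slowdown() {
    awk -v gpu="$1" -v cpu="$2" 'BEGIN { printf "%.2f\n", gpu / cpu }'
}

slowdown 19268.650 3880.445   # count(*) query with the sqrt() filter
slowdown 11355.103 4961.029   # join + GROUP BY query
```

With the timings reported here, the GPU plan is roughly 5x slower for the filter query and roughly 2.3x slower for the join query.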
Here are the details of how I installed pg_strom:
1. download postgresql 9.5alpha1 and compile it with
,----
| ./configure --prefix=/export/pg-9.5 --enable-debug --enable-cassert
| make -j8 all
| make install
`----
2. install cuda-7.0 (ubuntu 14.10 package from nvidia website)
3. download and compile pg_strom with pg_config in /export/pg-9.5/bin
,----
| make
| make install
`----
4. initialize a database cluster with --no-locale
,----
| initdb --no-locale 9.5
`----
5. change postgresql.conf
,----
| shared_buffers=1GB
| shared_preload_libraries='pg_strom.so'
| logging_collector = on
| log_filename='postgresql-%d.log'
| pg_strom.enabled=on
`----
6. start postgres
,----
| pg_ctl -D 9.5 start
`----
and got the following output
,----
| LOG: CUDA Runtime version: 7.0.0
| LOG: NVIDIA driver version: 346.59
| LOG: GPU0 Quadro K620 (384 CUDA cores, 1124MHz), L2 2048KB, RAM 2047MB (128bits, 900KHz), capability 5.0
| LOG: NVRTC - CUDA Runtime Compilation vertion 7.0
| LOG: redirecting log output to logging collector process
| HINT: Future log output will appear in directory "pg_log".
`----
7. import testdb
,----
| createdb test
| psql test < ~/devel/pg_strom/test/testdb.sql
| psql test -c 'create extension pg_strom'
`----
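To compare the two cases against the same server, something like the following could be used. It is only a sketch: it assumes that `pg_strom.enabled` (set in step 5) can also be changed per session through `PGOPTIONS`, and `run_explain` is a hypothetical helper name.

```shell
# run_explain on|off 'QUERY': run EXPLAIN ANALYZE on the "test" database
# with pg_strom.enabled forced to the given value for that session only.
# Assumption: pg_strom.enabled is settable per session.
run_explain() {
    PGOPTIONS="-c pg_strom.enabled=$1" psql test -c "EXPLAIN ANALYZE $2"
}
```

For example, `run_explain off 'SELECT count(*) FROM t0'` would run the query with pg_strom disabled, and `run_explain on '...'` with it enabled, without editing postgresql.conf between runs.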