Queries runs slow on GPU with PG-Strom

From: YANG <stonetable(at)outlook(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Queries runs slow on GPU with PG-Strom
Date: 2015-07-22 15:16:08
Message-ID: BLU436-SMTP200807E5D5EABD07576C20C1830@phx.gbl
Lists: pgsql-hackers


Hello,

I've run some tests on pg_strom following the wiki, but queries seem to run
slower on the GPU than on the CPU. Can someone shed some light on what's wrong
with my settings? My setup is a Quadro K620 + CUDA 7.0 (for Ubuntu 14.10) +
Ubuntu 15.04. The results were:

with pg_strom
=============

explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=190993.70..190993.71 rows=1 width=0) (actual time=18792.236..18792.236 rows=1 loops=1)
  -> Custom Scan (GpuPreAgg) (cost=7933.07..184161.18 rows=86 width=108) (actual time=4249.656..18792.074 rows=77 loops=1)
       Bulkload: On (density: 100.00%)
       Reduction: NoGroup
       Device Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
       -> Custom Scan (BulkScan) on t0 (cost=6933.07..182660.32 rows=10000060 width=0) (actual time=139.399..18499.246 rows=10000000 loops=1)
Planning time: 0.262 ms
Execution time: 19268.650 ms
(8 rows)

explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=298541.48..298541.81 rows=26 width=12) (actual time=11311.568..11311.572 rows=26 loops=1)
  Group Key: t0.cat
  -> Custom Scan (GpuPreAgg) (cost=5178.82..250302.07 rows=1088 width=52) (actual time=3304.727..11310.021 rows=2307 loops=1)
       Bulkload: On (density: 100.00%)
       Reduction: Local + Global
       -> Custom Scan (GpuJoin) (cost=4178.82..248541.18 rows=10000060 width=12) (actual time=923.417..2661.113 rows=10000000 loops=1)
            Bulkload: On (density: 100.00%)
            Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = aid), nrows_ratio: 1.00000000
            -> Custom Scan (BulkScan) on t0 (cost=0.00..242858.60 rows=10000060 width=16) (actual time=6.980..871.431 rows=10000000 loops=1)
            -> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4) (actual time=0.204..7.309 rows=40000 loops=1)
Planning time: 47.834 ms
Execution time: 11355.103 ms
(12 rows)

without pg_strom
================

test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=426193.03..426193.04 rows=1 width=0) (actual time=3880.379..3880.379 rows=1 loops=1)
  -> Seq Scan on t0 (cost=0.00..417859.65 rows=3333353 width=0) (actual time=0.075..3859.200 rows=314063 loops=1)
       Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
       Rows Removed by Filter: 9685937
Planning time: 0.411 ms
Execution time: 3880.445 ms
(6 rows)

test=# explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=431593.73..431594.05 rows=26 width=12) (actual time=4960.810..4960.812 rows=26 loops=1)
  Group Key: t0.cat
  -> Hash Join (cost=1234.00..381593.43 rows=10000060 width=12) (actual time=20.859..3367.510 rows=10000000 loops=1)
       Hash Cond: (t0.aid = t1.aid)
       -> Seq Scan on t0 (cost=0.00..242858.60 rows=10000060 width=16) (actual time=0.021..895.908 rows=10000000 loops=1)
       -> Hash (cost=734.00..734.00 rows=40000 width=4) (actual time=20.567..20.567 rows=40000 loops=1)
            Buckets: 65536 Batches: 1 Memory Usage: 1919kB
            -> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4) (actual time=0.017..11.013 rows=40000 loops=1)
Planning time: 0.567 ms
Execution time: 4961.029 ms
(10 rows)
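
To put a number on the gap, the slowdown can be computed from the
"Execution time" lines above. This is only a rough sketch: the `ratio`
helper is a hypothetical name introduced here, and the four timings are
copied by hand from the plans in this message.

```shell
#!/bin/sh
# Ratio of two execution times (in ms), rounded to two decimals.
# "ratio" is a hypothetical helper, not part of pg_strom or PostgreSQL.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'
}

# Execution times taken from the EXPLAIN ANALYZE output above:
# first argument is with pg_strom, second is without.
echo "count(*) query: $(ratio 19268.650 3880.445)x slower with pg_strom"
echo "join query:     $(ratio 11355.103 4961.029)x slower with pg_strom"
```

With the numbers above this gives roughly 4.97x for the count(*) query and
2.29x for the join query.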

Here are the details of how I installed pg_strom:

1. download postgresql 9.5alpha1 and compile it with

,----
| ./configure --prefix=/export/pg-9.5 --enable-debug --enable-cassert
| make -j8 all
| make install
`----

2. install cuda-7.0 (ubuntu 14.10 package from nvidia website)

3. download and compile pg_strom with pg_config in /export/pg-9.5/bin

,----
| make
| make install
`----

4. create a database cluster with --no-locale

,----
| initdb --no-locale 9.5
`----

5. change postgresql.conf

,----
| shared_buffers=1GB
| shared_preload_libraries='pg_strom.so'
| logging_collector = on
| log_filename='postgresql-%d.log'
| pg_strom.enabled=on
`----

6. start postgres

,----
| pg_ctl -D 9.5 start
`----

and got the following output

,----
| LOG: CUDA Runtime version: 7.0.0
| LOG: NVIDIA driver version: 346.59
| LOG: GPU0 Quadro K620 (384 CUDA cores, 1124MHz), L2 2048KB, RAM 2047MB (128bits, 900KHz), capability 5.0
| LOG: NVRTC - CUDA Runtime Compilation vertion 7.0
| LOG: redirecting log output to logging collector process
| HINT: Future log output will appear in directory "pg_log".
`----

7. import testdb

,----
| createdb test
| psql test < ~/devel/pg_strom/test/testdb.sql
| psql test -c 'create extension pg_strom'
`----
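
The two sets of plans above could also be reproduced back to back in a single
psql session by toggling the pg_strom.enabled setting from step 5. The
following is a minimal sketch under two assumptions: that pg_strom.enabled can
be changed with SET at session level, and that the database is the "test" one
created in step 7. `comparison_sql` is a hypothetical helper name, not
something shipped with pg_strom.

```shell
#!/bin/sh
# Emit a SQL script that runs one query twice:
# first with pg_strom enabled, then with it disabled.
# Assumption: pg_strom.enabled may be changed per session with SET.
comparison_sql() {
    query=$1
    printf 'SET pg_strom.enabled = on;\n'
    printf 'EXPLAIN ANALYZE %s;\n' "$query"
    printf 'SET pg_strom.enabled = off;\n'
    printf 'EXPLAIN ANALYZE %s;\n' "$query"
}

# Print the script for the first test query; it is not executed here.
comparison_sql 'SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10'
```

Piping the printed script into `psql test` would run both variants in the same
session, so the two timings share the same buffer cache state.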
