Re: postgres slower than grep?

From: Allan Engelhardt <allane(at)cybaea(dot)com>
To: Spiros Ioannou <sivann(at)image(dot)ece(dot)ntua(dot)gr>
Subject: Re: postgres slower than grep?
Date: 2001-07-09 22:05:34
Message-ID: 3B4A2AAE.87C480BF@cybaea.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I could only be botherd to try a million lines, but my results from 7.1.2 are below.

Basically:

1. I find about 50% database storage overhead in this case. That's not completely silly, considering this is structured data, but seems a little high. I don't know the internal structures well enough to really see what's happening.

2. Why would it be faster than grep? This has to match structured data, in this case varchar, and not just bytes. It has to worry about transactions and logs, not just a stream of data. Besides, in my tests it is not *that* slow (3 sec, compared with 1/2). Dunno what's up with your system.

3. As you said: With an index it rocks, easily beating grep. Use an index - it's your friend :-)

Allan.

[playpen]$ wc -l /tmp/input.txt
1000001 /tmp/input.txt
[playpen]$ ls -lh /tmp/input.txt
-rw-rw-r-- 1 allane allane 124M Jul 9 22:12 input.txt
[playpen]$ psql -f create_table.sql test
CREATE
[playpen]$ time psql -c "COPY clients FROM '/tmp/input.txt';" test

COPY

real 1m26.543s
user 0m0.000s
sys 0m0.010s
[playpen]$ du -sh $PGDATA/base/963549/
179M /var/lib/pgsql/data/base/963549
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
test
clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)

real 0m24.281s
user 0m0.010s
sys 0m0.000s
[playpen]$vacuumdb -a; vacuumdb -a --analyze
...
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)

real 0m3.808s
user 0m0.010s
sys 0m0.000s
[playpen]$ time grep '1234567' /tmp/input.txt
...
real 0m0.505s
user 0m0.150s
sys 0m0.360s
[playpen]$ dropdb test
DROP DATABASE
[playpen]$ createdb -E LATIN1 test
CREATE DATABASE
[playpen]$ psql -f create_table.sql test
CREATE
[playpen]$ time psql -c "COPY clients FROM '/tmp/input.txt';" test
COPY

real 1m23.927s
user 0m0.010s
sys 0m0.000s
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)

real 0m23.934s
user 0m0.010s
sys 0m0.010s
[playpen]$vacuumdb -a; vacuumdb -a --analyze
...
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)

real 0m3.796s
user 0m0.010s
sys 0m0.000s

[playpen]$ psql -c 'CREATE INDEX clients_idx_tel ON clients (tel);' test
CREATE
[playpen]$ vacuumdb --analyze test
VACUUM
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)

real 0m0.189s
user 0m0.010s
sys 0m0.000s

Spiros Ioannou wrote:

> Hello all,
>
> before you start reading, have in mind that this is not post to advertise
> one db over another, I just want to know what happens, that's why I did
> the comparisons with other db.
> to the point:
>
> This is my table:
>
> Table "table1"
> Attribute | Type | Modifier
> -----------+-------------+----------
> ClientID | integer |
> firstname | varchar(5) |
> surname | varchar(22) |
> area | varchar(3) |
> neigh | varchar(27) |
> dimos | varchar(50) |
> tel | varchar(7) |
>
> The Facts:
> ----------
> The table exported to a text file is about 330MB, and contains about 5
> milion lines.
>
> The same table takes 670MB in postgres. Why?
> (I tried mysql, and it uses 340MB).
>
> issuing the following query:
> select * from table1 where tel='7485842';
> takes about 1min and 40 seconds to complete. (query has one result)
>
> explain of the above returns:
> Seq Scan on table1 (cost=0.00..147835.01 rows=23 width=76)
>
> issuing a grep in the text file takes about 25 seconds! *(for the whole grep
> to finish - worse case)*
>
> issuing the same query in mysql takes about 25 seconds.
>
> -to test the disk speed, I doubled the size of the text file by copying
> it twice and the same grep query took 51 seconds (as expected)
>
> when creating an index the query completes of course in no time
>
> yes , i tried vacuum
>
> Postgres is version 7.0
> system is solaris 2.7
> hard disk is 10000rpm, ultraSCSI
> cpu is UltraSparcIIi 333Mhz
> physical memory is 384MB
>
> and now the questions
> ---------------------
> -shouldn't postgres be at least as fast as the grep?
> -shouldn't the table data file be smaller to reduce disk activity? Why is it
> double as the mysql same data file or the text file? I also noticed that
> an index file for column "tel" is about 130MB large and the mysql's one
> was 64MB, is this normal?
>
> Thank you for your time,
>
> Spiros Ioannou
> e-mail:sivann(at)kill-9(dot)gr
> ---------------------------------------
> Image Video & Multimedia Systems Lab.
> Department of Electrical & Computer Eng.
> National Technical University of Athens

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Allan Engelhardt 2001-07-09 22:26:04 Known problem with HASH index?
Previous Message Peter Eisentraut 2001-07-09 21:47:04 Re: Interfacing with MSAccess