Improvement for query planner? (no, not about count(*) again ;-))

From: Tobias Völk <tobias(dot)voelk(at)t-online(dot)de>
To: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Improvement for query planner? (no, not about count(*) again ;-))
Date: 2020-07-19 19:22:03
Message-ID: 1jxEsu-0bGx160@fwd35.t-online.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-general

Hello Postgres-Community,

I’ve got a table games(name1 text, name2 text) with 1.3x10^9 rows consisting of two two text columns for the names of players who’ve played a game, duplicate rows are possible, there’s no primary key since this table was just intended as a temporary storage for my data until further processing.

The length of a name is usually not more than 20 characters, shorter most of the time.
I’ve asked postgres to make an unlogged newtable(name text primary key) consisting of the unqiue names and executed:

Insert into newtable(name) select name1 from games on conflict do nothing;
(and later on intended to do the same for the second column)

However after hours it still wasn’t done, used only 1 cpu core to the max and read with 5 MB/s from my fast SSD.
So I stopped it.
I’ve also tried inserting (select name1 from games union select name2 from games) but it always wanted to do it using sorting.
But either the sorting or the preperations for the sorting were again only done using 1 core to the max and reading with 5 MB/s.

Couldn’t find a fast query for my problem.

So I wrote a java-program which read the whole table at a fetchsize of about 4 million and inserted the names into a HashSet.
And surprisingly after only a few minutes the program was already 25% done o.O

My question is, why isn’t postgres nearly this fast? Why doesn’t it just create a HashSet in RAM and read full speed from the disk?
I even created a hash index but it kept using it’s primary key b-tree and then I read that hash indices somehow don’t support checking for uniqueness.

Best regards, Tobi

--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2020-07-19 23:32:28 Re: psql has some accessibility issues on Windows
Previous Message Andy Fan 2020-07-19 02:25:55 Re: Reported type mismatch improperly

Browse pgsql-general by date

  From Date Subject
Next Message Thorsten Schöning 2020-07-19 19:51:12 Re: How to restore a dump containing CASTs into a database with a new user?
Previous Message David G. Johnston 2020-07-19 19:01:32 Re: Re: PG 9.5.5 cores on AIX 7.1