Re: Very big insert/join performance problem (bacula)

From: SystemManagement <rdsys(at)fs(dot)wettzell(dot)de>
To: Marc Cousin <mcousin(at)sigma(dot)fr>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Very big insert/join performance problem (bacula)
Date: 2009-07-13 14:37:06
Message-ID: 4A5B4692.5070909@fs.wettzell.de
Lists: pgsql-performance

Hi,
just a remark, since the number of entries seems to be very high:
did you ever run Bacula's dbcheck program with option 16?

Regards

Reiner

Marc Cousin wrote:
> Hi,
>
> I'm trying to solve big performance issues with PostgreSQL + Bacula when
> inserting very large sets of records.
>
> I'm sorry, this email will be a long one: I've already spent quite a lot of
> time on the issue, and I don't want to waste your time by having you
> speculate about things I may already have done. The problem is (or seems to
> me) a bit complex. The other problem is that I don't have the explain plans
> to provide with this email right now. I'll try to use this as a way to push
> 8.4 into this setup, so I can dump all these plans with auto_explain (the
> queries run on temporary tables, so they are a bit tricky to get).
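>
> To give an idea, the auto_explain setup I have in mind for 8.4 would be
> roughly the following in postgresql.conf (the threshold is just an example
> value, not something I have tested yet):
>
> shared_preload_libraries = 'auto_explain'
> custom_variable_classes = 'auto_explain'   # pre-9.2, allows the auto_explain.* parameters below
> auto_explain.log_min_duration = 60000      # in ms; only log plans of long statements
> auto_explain.log_analyze = off             # plain EXPLAIN, no per-row instrumentation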
>
> Let me first explain (or recall) how this works. Bacula is a backup
> solution; at the end of each backup it inserts the metadata of the backed-up
> files (file name, directory name, size, etc.).
> For what we are interested in, there are 3 tables:
> - file
> - filename
> - path
>
> file is the one containing most of the records; it holds the real metadata.
> filename and path just contain an id and the actual file or directory name
> (to save some space on redundant names).
>
> Before explaining the issue, here is some information about sizing:
>
> file is 1.1 billion records for 280GB (with indexes).
>
> Column | Type | Modifiers
> ------------+---------+-------------------------------------------------------
> fileid | bigint | not null default nextval('file_fileid_seq'::regclass)
> fileindex | integer | not null default 0
> jobid | integer | not null
> pathid | integer | not null
> filenameid | integer | not null
> markid | integer | not null default 0
> lstat | text | not null
> md5 | text | not null
> Indexes:
> "file_pkey" UNIQUE, btree (fileid)
> "file_fp_idx" btree (filenameid, pathid)
> "file_jpfid_idx" btree (jobid, pathid, filenameid)
>
>
> path is 17 million records for 6 GB
>
> Column | Type | Modifiers
> --------+---------+-------------------------------------------------------
> pathid | integer | not null default nextval('path_pathid_seq'::regclass)
> path | text | not null
> Indexes:
> "path_pkey" PRIMARY KEY, btree (pathid)
> "path_name_idx" UNIQUE, btree (path)
>
> filename is 80 million records for 13 GB
>
> Column | Type | Modifiers
> ------------+---------+---------------------------------------------------------------
> filenameid | integer | not null default nextval('filename_filenameid_seq'::regclass)
> name | text | not null
> Indexes:
> "filename_pkey" PRIMARY KEY, btree (filenameid)
> "filename_name_idx" UNIQUE, btree (name)
>
>
> There are several queries for each job's despooling:
>
> First we fill a temp table with the raw data (filename, pathname, metadata),
> using COPY (no problem here)
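>
> For reference, the temp table looks roughly like this (a sketch: the column
> names are inferred from the final insert further down, the exact DDL is
> Bacula's, not mine):
>
> CREATE TEMPORARY TABLE batch (
>     fileindex integer,
>     jobid     integer,
>     path      text,
>     name      text,
>     lstat     text,
>     md5       text
> );
> COPY batch FROM STDIN;  -- the raw (path, file name, metadata) rows go in here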
>
> Then we insert the missing path names into path and the missing file names
> into filename, with this query (and the analogous one for filename, shown
> just after):
>
> INSERT INTO Path (Path)
> SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a
> WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path)
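>
> And, by symmetry, a sketch of the analogous query for filename:
>
> INSERT INTO Filename (Name)
> SELECT a.Name FROM (SELECT DISTINCT Name FROM batch) AS a
> WHERE NOT EXISTS (SELECT Name FROM Filename WHERE Name = a.Name)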
>
> These do nested loops and work very well (after a sort on batch to get rid
> of the duplicates). They are reasonably fast (what one would expect when
> looping on millions of records... they do their job in a few minutes).
>
> The problem occurs with the final query, which inserts the data into file,
> joining the temp table to both path and filename:
>
> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5)
> SELECT batch.FileIndex,
> batch.JobId,
> Path.PathId,
> Filename.FilenameId,
> batch.LStat,
> batch.MD5
> FROM batch
> JOIN Path ON (batch.Path = Path.Path)
> JOIN Filename ON (batch.Name = Filename.Name)
>
> This one has two split personalities, depending on how many records are in
> batch.
> For small batch tables, it does nested loops.
> For big batch tables (more than around one million records initially), it
> decides to hash join path to batch (still OK, path is reasonably small) and
> then filename to batch before starting. And that's where the problems begin.
> The behaviour seems logical to me, it should switch to hash joins when batch
> gets bigger, but here it seems to do so much too early, considering the size
> of filename.
>
> First of all, performance remains much better with nested loops, except for
> extremely big batches (I'd say over 30 million, extrapolating from the times
> I'm seeing with 10 million records), so if I disable hash/merge joins, I get
> my performance back on these queries (they execute in around the same time
> as the searches in path and filename above). So I found a way to make most
> of my queries do nested loops (I'll come back to this later).
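>
> Concretely, "disabling hash/merge joins" means something like the following
> around the big insert (an illustration, not the exact statements from our
> scripts):
>
> SET enable_hashjoin = off;
> SET enable_mergejoin = off;
> -- run the big INSERT INTO File ... SELECT ... FROM batch JOIN Path JOIN Filename
> SET enable_hashjoin = on;
> SET enable_mergejoin = on;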
>
> Second, if there is more than one of these big sorts running, performance
> degrades drastically (we had two of them this weekend, and they both took 24
> hours to complete). This is probably due to our rather poor disk setup (we
> didn't have a big budget for it). Linux was not swapping.
>
>
> So all of this makes me think there is a cost evaluation problem in this
> setup: with the default values, PostgreSQL seems to underestimate the cost
> of sorting here (the row estimates were good, no problem with that).
> PostgreSQL seems to think that at around 1 million records in batch it
> should go with a hash join on filename and path, so we go on hashing the 17
> million records of path and the 80 million of filename, then joining and
> inserting into file (we're talking about sorting around 15 GB for each of
> these despools, in parallel).
>
> Temporarily I pushed the problem to somewhat larger batch sizes by changing
> random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel like the
> sorcerer's apprentice with this, as I'm telling PostgreSQL that fetching
> rows from disk is much cheaper than it really is. These values are, I think,
> completely abnormal. Doing this, I got the change of plan at around 8
> million records. We had two batches of around 9 million at the same time
> this weekend, and both took 24 hours, while the nested loops before the join
> (for the inserts into path and filename) did their work in minutes...
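>
> For reference, the two settings amount to this (shown here as session-level
> SETs, purely to illustrate the values above):
>
> SET seq_page_cost = 0.01;
> SET random_page_cost = 0.02;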
>
> So, finally, to my questions:
> - Is it normal that PostgreSQL is this far off base on these queries?
> (Sorry, I don't have the plans; if they are required I'll do my best to get
> some, but they really are the two obvious plans for this kind of query.)
> What could make it choose the hash join for batch tables that are too small?
> - Is changing these two costs the way to go?
> - Is there a way to tell PostgreSQL that sorting is more costly than it
> thinks, instead of telling it that fetching data from disk costs almost
> nothing?
>
>
>
>
> Here are the other non-default values from my configuration:
>
> shared_buffers = 2GB
> work_mem = 64MB
> maintenance_work_mem = 256MB
> max_fsm_pages = 15000000 # There are quite big deletes with bacula ...
> effective_cache_size = 800MB
> default_statistics_target = 1000
>
> PostgreSQL is 8.3.5 on Debian Lenny
>
>
>
> I'm sorry for this very long email; I tried to be as precise as I could,
> but don't hesitate to ask me for more.
>
> Thanks for helping.
>
> Marc Cousin
>
>
