Quick Links

Re: identifying duplicates in table with redundancies

From:	"Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
To:	"Tarlika Elisabeth Schmitz" <postgresql2(at)numerixtechnology(dot)de>, <pgsql-sql(at)postgresql(dot)org>
Subject:	Re: identifying duplicates in table with redundancies
Date:	2010-09-29 13:00:26
Message-ID:	547760C5672F498ABD458B4B277DE370@marktestcr.marktest.pt
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

>
> Hallo Andreas,
> I reduced the problem to the innermost query:
>
> 1) SELECT DISTINCT trainer_id, trainer_name FROM student
> This results in a sequential table scan. Execution time 7500ms.
>
> 2) I created an INDEX ON (trainer_id, trainer_name). Then I had an index
> scan instead, which still cost 7000ms.
>
> 3) Next, I changed from DISTINCT to GROUP BY:
> SELECT trainer_id, trainer_name FROM student
> GROUP BY trainer_id, trainer_name
> This resulted in an index scan @ 6750ms
>
> 4) I filtered out NULL trainer_ids
> WHERE trainer_id IS NOT NULL
> Amazingly, this resulted in a sequential table scan, which only took
> 1300ms!!
>
> Please, explain (pun not intended)! How can this be. Only 11000/250000
> rows have a null trainer_id.
>

That's an impressive improvement...
Personally I have no idea what caused it, specially when you say it was
sequential :-|
Warmed caches ?

Best,
Oliver

In response to

Re: identifying duplicates in table with redundancies at 2010-09-29 11:41:07 from Tarlika Elisabeth Schmitz

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Osvaldo Kussama	2010-09-29 13:30:07	Re: Regexp matching
Previous Message	Tarlika Elisabeth Schmitz	2010-09-29 11:41:07	Re: identifying duplicates in table with redundancies