Quick Links

Re: Bad plan on a huge table query

From:	Daniel Cristian Cruz <danielcristian(at)gmail(dot)com>
To:
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Bad plan on a huge table query
Date:	2013-03-21 19:37:43
Message-ID:	CACffM9EOQuUuJf1JGT35ps_OQ-i_uu64CB6nSVB+sv6cGb70qA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Ooops, no query... Now it goes (Jeff, types in each line):

SELECT
aluno_mensal.id_matricula, --integer
aluno_mensal.id_turma, --integer
aluno_mensal.turma, --text
aluno_mensal.id_produto_educacao, --integer
aluno_mensal.produto_educacao, --text
aluno_mensal.unidade, --text
aluno_mensal.unidade_execucao, --text
aluno_mensal.modalidade, --text
aluno_mensal.id_pessoa, --integer
aluno_mensal.nome_pessoa, --text
presenca.id_diario, --integer
aula_confirmacao.inicio::date AS data_aula, --timestamp to date
presenca.justificativa_falta, --text
SUM(aula_confirmacao.termino - aula_confirmacao.inicio) AS carga_faltas,
--interval
mensal.ano AS ano_apuracao, --integer
mensal.mes AS mes_apuracao --integer
FROM indicadores.aluno_mensal
JOIN indicadores.mensal
ON mensal.id_mensal = aluno_mensal.id_mensal
JOIN turma.presenca
ON presenca.id_matricula = aluno_mensal.id_matricula
JOIN turma.aula_confirmacao
ON aula_confirmacao.id_evento = presenca.id_evento
JOIN turma.estudante_periodo
ON
estudante_periodo.id_matricula = presenca.id_matricula AND
estudante_periodo.id_diario = presenca.id_diario AND
aula_confirmacao.inicio::date BETWEEN estudante_periodo.inicio AND
estudante_periodo.termino -- timestamp, date, date
WHERE
presenca.inicio::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND
CURRENT_TIMESTAMP AND -- timestamp
NOT presenca.presente AND --boolean
mensal.ano = EXTRACT(YEAR FROM CURRENT_TIMESTAMP) AND -- integer
aula_confirmacao.inicio::DATE BETWEEN DATE_TRUNC('YEAR',
CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP AND --timestamp to date
aula_confirmacao.confirmada AND -- boolean
aluno_mensal.id_medicao = 7
GROUP BY
aluno_mensal.id_matricula,
aluno_mensal.id_turma,
aluno_mensal.turma,
aluno_mensal.id_produto_educacao,
aluno_mensal.produto_educacao,
aluno_mensal.unidade,
aluno_mensal.unidade_execucao,
aluno_mensal.modalidade,
aluno_mensal.id_pessoa,
aluno_mensal.nome_pessoa,
presenca.id_diario,
aula_confirmacao.inicio::date,
presenca.justificativa_falta,
mensal.ano,
mensal.mes;

2013/3/21 Daniel Cristian Cruz <danielcristian(at)gmail(dot)com>

> 2013/3/21 Alban Hertroys <haramrae(at)gmail(dot)com>
>
>> On Mar 20, 2013, at 22:36, Daniel Cristian Cruz <danielcristian(at)gmail(dot)com>
>> wrote:
>>
>> Hi,
>>
>> I'm trying to figure out why does the planner found 1 row estimate using
>> nested loops over a big table. There is no return from it:
>>
>> http://explain.depesz.com/s/GRs
>>
>>
>> That plan contains no actual statistics, which makes it difficult to say
>> anything about it. And you didn't provide any info on table definitions or
>> indexes whatsoever, we have to put that together from the generated query
>> plans. Not great...
>>
>
> My bad... I guess the plan could do it. And now I figured out that I lost
> the first query... Now the query looks like this:
>
>
>
>
>> It returns if disable nested loops, but the plan still poor:
>>
>> http://explain.depesz.com/s/fMY
>>
>>
>> You could probably gain some here by adding an index on
>> aluno_mensal.id_medicao. In step 14 the lack thereof causes a seqscan over
>> more than a million rows.
>>
>
> There is already an index on id_medicao. It used a hashjoin because I
> disable mergejoin which uses the index, instead there is no return.
>
>
>> What I also find a bit peculiar is that the filter in step 7 appears to
>> apply a function (date_part(text, date)) on every row in that heap. Do you
>> perhaps have a functional index on that table that makes that operation
>> efficient?
>>
>
> Yes, tried to improve performance creating a index on inicio using
> CAST(inicio TO DATE). The real filter here is aula_confirmacao.inicio::date
> BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
> (filtering all rows from year's beginning until now).
>
>
>> Besides, now() results in a timestamp, which in this query needs to be
>> converted to date; it's perhaps better to use CURRENT_DATE there, although
>> the benefits are probably immeasurable since it only needs to be calculated
>> once for all rows it's compared against.
>>
>
> DATE_TRUNC expect a text and a timestamp.
>
>
>>
>> I'm using PostgreSQL 9.2.3, default_statistics_target on 1000.
>>
>> I can't remember what to make PostgreSQL sees a better estimate in the
>> scan of aula_confirmacao and the join with presenca. I got rusty after a
>> long time just doing modeling.
>>
>> Does someone has some idea on that?
>>
>>
>> Are you asking about vacuum? You're definitely not vacuuming enough, your
>> statistics and your actual numbers of rows differ by enormous amounts
>> (that's why depesz colours them red).
>>
>
> autovacuum is running on production and the develop database. This is
> happening at develop database, fresh restore.
>
>
>> Are you using autovacuum? If so, you probably need to tune it more
>> aggressively. For the short term, running an ANALYSE on those tables should
>> at least get you more accurate query plans.
>>
>
> I've done it; with default_statistics_target on 1000, 100 and 200 (left it
> on 200, which was production config too).
>
> Thank you and sorry about the broken english, there was a long time since
> the last time I wrote...
>
> --
> Daniel Cristian Cruz
> クルズクリスチアンダニエル
>

--
Daniel Cristian Cruz
クルズクリスチアンダニエル

In response to

Re: Bad plan on a huge table query at 2013-03-21 19:30:43 from Daniel Cristian Cruz

Browse pgsql-general by date

	From	Date	Subject
Next Message	Rob Sargent	2013-03-21 21:23:46	Re: regexp_replace failing on 9.0.4
Previous Message	Daniel Cristian Cruz	2013-03-21 19:30:43	Re: Bad plan on a huge table query