Re: no mailing list hits in google

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org>
Subject: Re: no mailing list hits in google
Date: 2019-08-30 10:08:28
Message-ID: CABUevEwu4oy3Pg+j6SEFzOc7wn6aJi6hzN9RV-5v2-2VWK+CAw@mail.gmail.com
Lists: pgsql-hackers pgsql-www

On Fri, Aug 30, 2019 at 11:40 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:

> > On 29 Aug 2019, at 16:55, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2019-08-29 09:32:35 -0400, Alvaro Herrera wrote:
> >> On 2019-Aug-29, Magnus Hagander wrote:
> >>
> >>> Maybe Google used to load the pages under /list/ and crawl them for links
> >>> but just not include the actual pages in the index or something
> >>>
> >>> I wonder if we can inject these into Google using a sitemap. I think that
> >>> should work -- will need some investigation on exactly how to do it, as
> >>> sitemaps also have individual restrictions on the number of urls per file,
> >>> and we do have quite a few messages.
> >>>
> >>>> Why is that /list/ exclusion there in the first place?
> >>>
> >>> Because there are basically infinite number of pages in that space, due to
> >>> the fact that you can pick an arbitrary point in time to view from.
> >>
> >> Maybe we can create a new page that's specifically to be used by
> >> crawlers, that lists all emails, each only once. Say (unimaginatively)
> >> /list_crawlers/2019-08/ containing links to all emails of all public
> >> lists occurring during August 2019.
> >
> > Hm. Weren't there occasionally downranking rules for pages that were
> > clearly aimed just at search engines?
>
> I think that’s mainly been for pages which are clearly keyword spamming, I
> doubt our content would get caught there. The sitemap, as proposed upthread,
> is the solution to this however and is also the recommended way from Google
> for sites with lots of content.
>
> Google does however explicitly downrank duplicated/similar content, or content
> which can be reached via multiple URLs and which doesn’t list a canonical URL
> in the page. A single message and the whole-thread link does contain the same
> content, and neither are canonical so we might be incurring penalties from
> that. Also, the postgr.es/m/ shortener makes content available via two URLs,
> without a canonical URL specified.
>

But robots.txt blocks the whole-thread view (and this is the reason for it).
And postgr.es/m/ does not actually make the content available there, it
redirects.

So I don't think those should actually have an effect?
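
A minimal sketch of how the robots.txt point can be checked locally, using Python's standard urllib.robotparser. The rules below are an assumption modelled only on the paths named in this thread (/list/, /flat/, /raw/); the real robots.txt on the archives site may differ, and the /message-id/ path and the helper name allowed_paths are hypothetical, used for illustration only.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: illustrative rules built from the paths mentioned in this
# thread, not a copy of the real robots.txt.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /list/
Disallow: /flat/
Disallow: /raw/
"""


def allowed_paths(paths, robots_txt=ASSUMED_ROBOTS_TXT, agent="Googlebot"):
    """Return {path: True/False} saying whether `agent` may fetch each path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


if __name__ == "__main__":
    # Hypothetical example paths.
    sample = ["/message-id/example", "/flat/example", "/raw/example",
              "/list/pgsql-hackers/"]
    for path, ok in allowed_paths(sample).items():
        print(f"{path}: {'allowed' if ok else 'blocked'}")
```

Under these assumed rules a blocked whole-thread URL is never fetched by a compliant crawler, so it cannot be compared against the single-message page as duplicate content.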

> That being said, since we haven’t changed anything, and DuckDuckGo happily
> indexes the mailinglist posts, this smells a lot more like a policy change
> than a technical change if my experience with Google SEO is anything to go by.
> The Webmaster Tools Search Console can quite often give insights as to why a
> page is missing, that’s probably a better place to start than second guessing
> Google SEO. AFAICR, using that requires proving that one owns the site/domain,
> but doesn’t require adding any google trackers or similar things.
>

I've tried, but failed to get any relevant data out of it. It does clearly
show a large number of URLs blocked because they are in /flat/ or /raw/, but
nothing at all about the regular messages.
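
For the sitemap idea discussed upthread, here is a rough sketch of how a large list of message URLs could be split to respect the per-file limit. The sitemaps.org protocol caps a single sitemap at 50,000 URLs (and 50MB uncompressed) and lets a sitemap index file point at the individual sitemaps. The function names and the example.org URLs are hypothetical; this is not how the archives site generates anything today.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split the URLs into consecutive lists of at most `size` entries."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def render_sitemap(urls):
    """Render one <urlset> document listing the given page URLs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    # escape() turns &, < and > into entities so the XML stays well-formed.
    lines += [f"  <url><loc>{escape(url)}</loc></url>" for url in urls]
    lines.append("</urlset>")
    return "\n".join(lines)


def render_index(sitemap_urls):
    """Render a <sitemapindex> document pointing at each sitemap file."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<sitemapindex xmlns="{SITEMAP_NS}">']
    lines += [f"  <sitemap><loc>{escape(url)}</loc></sitemap>"
              for url in sitemap_urls]
    lines.append("</sitemapindex>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical message URLs; a tiny chunk size keeps the demo readable.
    messages = [f"https://example.org/message-id/{i}" for i in range(5)]
    chunks = chunk_urls(messages, size=2)
    print(render_index(f"https://example.org/sitemap-{n}.xml"
                       for n in range(len(chunks))))
    print(render_sitemap(chunks[0]))
```

Chunking by month or by list would work just as well as chunking by count, as long as each file stays under the limit and each message is listed only once.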

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
