Re: no mailing list hits in google

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org>
Subject: Re: no mailing list hits in google
Date: 2019-08-30 09:40:16
Message-ID: 6591AA04-CBEA-47DA-8EF6-BFED464A44A8@yesql.se
Lists: pgsql-hackers pgsql-www

> On 29 Aug 2019, at 16:55, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2019-08-29 09:32:35 -0400, Alvaro Herrera wrote:
>> On 2019-Aug-29, Magnus Hagander wrote:
>>
>>> Maybe Google used to load the pages under /list/ and crawl them for links
>>> but just not include the actual pages in the index or something
>>>
>>> I wonder if we can inject these into Google using a sitemap. I think that
>>> should work -- will need some investigation on exactly how to do it, as
>>> sitemaps also have individual restrictions on the number of urls per file,
>>> and we do have quite a few messages.
>>>
>>>> Why is that /list/ exclusion there in the first place?
>>>
>>> Because there are basically infinite number of pages in that space, due to
>>> the fact that you can pick an arbitrary point in time to view from.
>>
>> Maybe we can create a new page that's specifically to be used by
>> crawlers, that lists all emails, each only once. Say (unimaginatively)
>> /list_crawlers/2019-08/ containing links to all emails of all public
>> lists occurring during August 2019.
>
> Hm. Weren't there occasionally downranking rules for pages that were
> clearly aimed just at search engines?

I think that’s mainly been for pages which are clearly keyword spamming; I
doubt our content would get caught there. The sitemap proposed upthread is the
solution here though, and it is also what Google recommends for sites with lots
of content.
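
To make the per-file restriction mentioned upthread concrete, here is a rough
sketch of splitting a long list of message URLs into several sitemap files plus
a sitemap index. The 50,000 URLs-per-file limit is from the sitemaps.org
protocol; the file names, the base URL and the build_all() helper are made up
for illustration and are not how the archives code works today.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_sitemap(urls):
    """Return one <urlset> document listing the given URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document pointing at each sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_all(message_urls, base_url, max_urls=MAX_URLS_PER_FILE):
    """Hypothetical helper: return ({filename: xml}, index_xml)."""
    files = {}
    for n, part in enumerate(chunk(message_urls, max_urls), start=1):
        # File naming scheme is an assumption for this sketch.
        files[f"sitemap-messages-{n}.xml"] = build_sitemap(part)
    index = build_sitemap_index([f"{base_url}/{name}" for name in files])
    return files, index
```

Each message would be listed exactly once, which sidesteps the "infinite
number of pages" problem under /list/ without needing a crawler-only page.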

Google does however explicitly downrank duplicated/similar content, or content
which can be reached via multiple URLs and which doesn’t list a canonical URL
in the page. A single message and the whole-thread page contain the same
content, and neither is canonical, so we might be incurring penalties from
that. Also, the postgr.es/m/ shortener makes content available via two URLs,
without a canonical URL specified.
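
As a sketch of what specifying a canonical URL could look like: map every
alternate URL for a message to one form and emit a <link rel="canonical"> tag
for the page head. The URL layouts below (/message-id/<msgid>,
/message-id/flat/<msgid>, postgr.es/m/<msgid>) and the choice of the
single-message page as the canonical one are assumptions for illustration, and
both function names are invented.

```python
from html import escape
from urllib.parse import urlsplit

# Assumed canonical form; picking the single-message page is a design choice.
CANONICAL_PREFIX = "https://www.postgresql.org/message-id/"


def canonical_url(url):
    """Map a message URL variant to the assumed canonical URL, else None."""
    parts = urlsplit(url)
    path = parts.path
    if parts.netloc == "postgr.es" and path.startswith("/m/"):
        msgid = path[len("/m/"):]
    elif path.startswith("/message-id/flat/"):
        # Whole-thread view; must be checked before the plain prefix.
        msgid = path[len("/message-id/flat/"):]
    elif path.startswith("/message-id/"):
        msgid = path[len("/message-id/"):]
    else:
        return None
    return CANONICAL_PREFIX + msgid if msgid else None


def canonical_link_tag(url):
    """Return the <link> tag for the page head, or '' if not a message URL."""
    target = canonical_url(url)
    if target is None:
        return ""
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'
```

With something like this, the shortener, the whole-thread view and the
single-message view would all point at the same URL.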

That being said, since we haven’t changed anything, and DuckDuckGo happily
indexes the mailing list posts, this smells a lot more like a policy change than
a technical change, if my experience with Google SEO is anything to go by. The
Webmaster Tools Search Console can quite often give insights as to why a page
is missing; that’s probably a better place to start than second-guessing Google
SEO. AFAICR, using that requires proving that one owns the site/domain, but
doesn’t require adding any Google trackers or similar things.

cheers ./daniel
