As of 2016-02-26, there will be no more posts for this blog. s/blog/pba/

If you own a domain name and have searched for it as a keyword, you may have seen results like the ones in the screenshot on the right.

The last result on this page is totally legit and the first one is okay, but the ones between them are websites I categorize as garbage. This kind of website is a variation of the content farm. Instead of generating content the usual way, it uses domain-related data to fill up the page, so the page looks like something in search bots' eyes. These sites grab all sorts of results via the APIs of other services and give you some whois information.

That's really for noobs who don't know where to look up information about a domain from the original sources, mainly for their own domain. I don't think anyone would generally want to read about other people's domains, at least not on this sort of trash website.

Unfortunately, Googlebot crawls and indexes this kind of website. The time range in the screenshot above was set to the past 24 hours and the search returned 75 results (it has increased to 80 while I am writing). Sadly, I haven't been able to use blocked unwanted sites, which is a feature of Google Search. That page has a JavaScript error and has been broken for days, ever since I first noticed it. I don't know where to report the bug other than the community support forums, and I do not want to use those. Just another typical Google support method nowadays.

An interesting point is that my domain is not even the focus of the matched results. On a page about one domain, these websites put a list of other, basically unrelated domain names, so they can somewhat increase their search engine hit ratio. It's cheating, I would say, and Googlebot isn't smart enough to know that.

Thanks for the notification!

Here is the email I just received:

Dear Webmaster,

Your site, https://<mydomain>/, uses an SSL certificate which is not recognized by web browsers. This will cause many web browsers to block users from accessing your site, or to display a security warning message when your site is accessed.

To correct this problem, please get a new SSL certificate from a Certificate Authority (CA) that is trusted by web browsers.

Thanks,

The Google Web Crawling Team

Where <mydomain> is yjl.im. At first glance, I thought this was a new kind of phishing, but it's real: the message was also in Webmaster Tools.

First of all, I believe I have never written down http://yjl.im/ anywhere, not to mention the HTTPS version. So, I guess Google is kind enough to check that for you. If it weren't for this email, I would never have thought to check it.

And here is a screenshot of the certificate:


The naked domain has URL forwarding to www.yjl.im. I use my registrar's free service, so I have no control over it. If Google App Engine could operate on a naked domain, I wouldn't need that.

I don't understand why their servers listen for HTTPS; that makes no sense. Anyway, I might turn off URL forwarding, or see what my registrar says about it, or just forget the whole thing...

I already noticed nine months ago that Googlebot parses things that look like links. Once again, they still have much work to do on their search algorithm.

Two days ago, I started seeing a strange request on my Google App Engine application, which resulted in a 404:


At first, I had no idea where this came from, but I knew it must have something to do with the application's JavaScript. Later I checked the log and got:


I didn't notice anything until I saw the IP: 66.249.85.129, which is quite familiar. It's from Google. The JavaScript has a block like:


I think that's clear enough. However, I won't obscure the JavaScript in order to get rid of this. Even though I will see 404 reports, at least I can know when Googlebot comes. Hit me! Google!
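
Recognizing an IP by eye works, but it can also be checked. Below is a minimal Python sketch of a reverse-then-forward DNS check: look up the hostname for the IP, see whether it ends with a Google-looking suffix, then resolve that hostname back and confirm it maps to the same IP. The function name `is_google_ip`, the suffix list, and the injectable `reverse`/`forward` parameters are my own assumptions for illustration, not anything from the log above.

```python
import socket

# Assumed suffixes for Google crawler hosts; adjust to whatever Google documents.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_ip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip."""
    try:
        host = reverse(ip)[0]         # PTR lookup: IP -> hostname
    except OSError:                   # no reverse record or lookup failure
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]  # forward lookup: hostname -> list of IPs
    except OSError:
        return False
    # The forward lookup must include the original IP, otherwise the PTR may be spoofed.
    return ip in addresses
```

Calling `is_google_ip("66.249.85.129")` with the defaults does real DNS lookups, so the answer depends on the network at the time. The two lookups are passed in as parameters only so the logic can be tried with fake resolvers.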