Fixing “Your Sitemap appears to be an HTML page” in Google Search Console

In WordPress by Fathi Arfaoui

After creating a sitemap and submitting it to Google Search Console, you may get an error message saying:

Your Sitemap appears to be an HTML page. Please use a supported sitemap format instead.

That message means Google’s crawlers see your sitemap as an ordinary HTML page. Instead of the XML document it expects, Googlebot finds an HTML page, which is not one of the supported sitemap formats.

Now let’s fix the problem and see how it works.

First, test your blog or website sitemap with an online sitemap validator. All you need to do is enter your exact sitemap URL and click the validate button at the bottom of the page. In most cases, the tool will report an invalid XML format. These are the solutions to fix the problem with Google Search Console.
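If you prefer to check the file yourself, here is a minimal Python sketch of the same idea. It assumes you have already downloaded the sitemap body as bytes (for example with `urllib.request.urlopen(url).read()`); the function name `classify_sitemap` is my own and not part of any tool mentioned here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def classify_sitemap(body: bytes) -> str:
    """Return 'sitemap', 'html', or 'invalid' for a downloaded response body."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML; check whether it at least looks like HTML.
        head = body[:500].lower()
        if b"<html" in head or b"<!doctype html" in head:
            return "html"
        return "invalid"
    if root.tag in (f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex"):
        return "sitemap"
    # Well-formed XML, but the root element is something else (XHTML, for example).
    return "html" if root.tag.lower().endswith("html") else "invalid"
```

A result of "html" matches what Google Search Console is complaining about. A result of "invalid" usually means something changed or truncated the XML on its way out of the server.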

Disable caching your XML sitemap

WordPress caching plugins create a cached copy of your XML sitemap. While building that cache, they apply optimizations and sometimes minify the XML. In other words, the optimization can leave a broken copy of the file in the cache.
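One rough way to tell whether a caching plugin has touched your sitemap is to look for comments it may have left in the output. The Python sketch below does that. The marker strings are only my guesses at what such plugins write, not an official list, so treat a match as a hint rather than proof.

```python
import re

# Assumed footprints: illustrative guesses at text that caching plugins
# may write into comments. This is not an official or complete list.
CACHE_MARKERS = ("w3 total cache", "wp-super-cache", "wp rocket", "hyper cache")

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_cache_footprints(body: str) -> list[str]:
    """Return the assumed cache markers found inside comments in body."""
    found = []
    for comment in COMMENT_RE.findall(body):
        text = comment.lower()
        for marker in CACHE_MARKERS:
            if marker in text and marker not in found:
                found.append(marker)
    return found
```

An empty list does not prove the sitemap is uncached, because a plugin can cache a page without leaving any comment in it.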

To solve that problem, exclude the sitemap from the cache, and then clear the WordPress cache completely. If you’re using a popular plugin, you’ll find options such as filters or a list of pages to exclude from caching in its settings. That works for popular plugins like W3 Total Cache, WP Super Cache, WP Rocket, Hyper Cache, and others.

In W3 Total Cache, for example, you can exclude the sitemap from the cache by clicking Page Cache under Performance. Then, scroll until you see the option Never cache the following pages, and add your sitemap to the list. If you’re using the Google XML Sitemaps plugin, add sitemap.xml. If you’re using the Yoast plugin for your XML sitemap, add sitemap_index.xml. Save the changes, then clear the cache.

Never cache pages

Please note, if you’re using sitemaps for videos, images, or authors, add them along with your main sitemap. You can also purge the CDN cache completely if you’re using one.
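To see how an exclusion list can cover the main sitemap and the extra ones at the same time, here is a small Python sketch. It assumes the entries are treated as regular expressions and that sub-sitemaps are named like post-sitemap.xml or video-sitemap2.xml. Both are assumptions on my part, so check your plugin’s documentation before copying the patterns.

```python
import re

# Hypothetical exclusion list, one regular expression per entry.
# The first entry matches sitemap.xml and sitemap_index.xml.
# The second matches sub-sitemaps such as post-sitemap.xml (assumed naming).
NEVER_CACHE = [
    r"sitemap(_index)?\.xml(\.gz)?",
    r"[a-z0-9_\-]+-sitemap([0-9]+)?\.xml(\.gz)?",
]


def is_never_cached(path: str, patterns=NEVER_CACHE) -> bool:
    """Return True if the last segment of the URL path matches an entry."""
    # Drop the query string first, then keep only the file name.
    name = path.split("?", 1)[0].rsplit("/", 1)[-1]
    return any(re.fullmatch(pattern, name) for pattern in patterns)
```

With these two entries, /sitemap.xml, /sitemap_index.xml, and /video-sitemap2.xml are all excluded, while a normal post URL such as /blog/my-post/ is still cached.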

You can also create a user-agent group for Googlebot, but that’s not guaranteed to work. So, I prefer disabling the XML cache to avoid errors and warnings in your Google Search Console account.
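If you want to see whether your server answers differently depending on the user agent, you can request the sitemap twice and compare the Content-Type header. This Python sketch uses only the standard library. The helper names are my own, and sending Googlebot’s user-agent string does not make your request identical to a real Googlebot visit, so read the result as a rough hint.

```python
import urllib.request

# User-agent string published by Google for its classic crawler.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Generic browser-like string, used here only for comparison.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Prepare a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_content_type(url: str, user_agent: str, timeout: float = 10) -> str:
    """Fetch the URL and return its Content-Type header (empty if missing)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.headers.get("Content-Type", "")
```

Call `fetch_content_type` once with `GOOGLEBOT_UA` and once with `BROWSER_UA` on your own sitemap URL. If one answer contains text/html and the other contains xml, the cache is serving different copies to different visitors.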

If you’re using the Hyper Cache WordPress plugin, you’ll find URL reject options.

WP Rocket users can find the option under Advanced Settings in the WP Rocket menu in the WordPress dashboard. Then, they can add their sitemap URL in the Never cache the following pages box.

WP Rocket

Use the default stylesheet option

By default, the Google XML Sitemaps generator plugin uses its own stylesheet file to make the page look good. But for many reasons, users can disable that option by mistake. The plugin then shows a correct XML file, but without any formatting or styling. That’s why, when you open your XML sitemap URL in the browser, you’ll see a page full of lines with a warning at the top like this message:

This XML file does not appear to have any style information associated with it

The XML will look like the following example:

Your Sitemap appears to be an HTML page

Now, to fix the problem, open the WordPress admin area menu and click XML-Sitemap under Settings. Next, scroll down a little to Basic Options and check the option “use default”, as in the following screenshot.

Enable default XML page stylesheet

Finally, click the Update Options button and everything will work again. Clear your browser cache, and do the same for your WordPress cache, and test the XML again.
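When you test the XML again, you can also confirm that the stylesheet is back by looking for the xml-stylesheet instruction near the top of the file. Here is a minimal Python sketch for that check. The function name is my own, and it assumes you already have the sitemap source as text.

```python
import re

# Matches an instruction such as:
# <?xml-stylesheet type="text/xsl" href="https://example.com/sitemap.xsl"?>
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\s+(.*?)\?>", re.DOTALL)
HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""")


def stylesheet_href(xml_text: str):
    """Return the href of the first xml-stylesheet instruction, or None."""
    instruction = STYLESHEET_PI.search(xml_text)
    if instruction is None:
        return None
    href = HREF.search(instruction.group(1))
    return href.group(1) if href else None
```

If the function returns None, the sitemap is being served without a stylesheet, which is when the browser shows the “no style information” message.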

If you’re using another plugin, contact their support and ask for their stylesheet option if needed.

The above solutions will work for the majority of WordPress users. But server settings and plugins vary a lot, so the problem can have other causes. If you found another fix, add it in the comment section below to help others solve their problem. The XML sitemap is one of the best ways to help Google and other search engines crawl your site better. Without a valid sitemap format, Googlebot may not find and crawl all your posts and pages.

Fathi Arfaoui: A Physicist, Blogger, and the founder and owner of Trustiko.com. He shares Business, Blogging, WordPress, and Web Safety tips to build better websites and blogs. Also, he shares online marketing strategies and recommendations.
