date 10.Mar.2023

■ Cheap and cheerful offline XML sitemap generator


I've got a confession to make: my website has been online for 20 years and I never bothered to create a proper sitemap for it. Google and other search engines have their own spiders that automatically dig up new content without an XML sitemap (a table of contents for the website). My "if it ain't broke, don't fix it" strategy came to an abrupt end recently, when I discovered that Google does not index around 30% of my website content. New, well-linked pages I added a month ago are still unknown to Google. I am not sure why this is the case; Bing and other search engines pick up new content quickly and without fuss.

Google says that sitemaps are not essential for small static websites, so why am I blacklisted? Bear in mind that once you create a sitemap, you must keep it up to date, reflecting changes in pages and content. A sitemap is a simple XML text document that lists all the URLs of your website you want indexed, along with their last modification dates plus other optional attributes. The essential information is in the <loc> and <lastmod> tags, for example:

<url>
 <loc>https://www.zabkat.com/deskrule/catalog-offline-search.htm</loc>
 <lastmod>2023-01-02</lastmod>
</url>
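For reference, a complete sitemap wraps those <url> entries in a <urlset> element that declares the sitemaps.org namespace. The following is a minimal Python sketch, not how OFSITEMAP.EXE works internally (`build_sitemap` is a hypothetical helper), that produces such a document with the standard library's ElementTree, which also takes care of escaping characters like & in URLs:

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org 0.9 protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML text for a list of (loc, lastmod) pairs.

    lastmod is expected as an ISO 8601 date string such as "2023-01-02".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text content automatically
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling build_sitemap([("https://www.zabkat.com/deskrule/catalog-offline-search.htm", "2023-01-02")]) gives the entry shown above inside a proper <urlset>. The output is a single line of XML; on Python 3.9+ you can call ET.indent(urlset) before serializing if you prefer it indented.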

There are many ways to create XML sitemaps: you can type one manually (feasible only for really small websites), let your CMS create it for you (e.g. WordPress), or use one of the free (and paid) online generators that crawl your website, discover all the cross-referenced HTML and PHP pages and create the XML sitemap automatically. The zabkat.com website is just above the 500-page limit set by free sitemap generators, and the only free desktop tool I am aware of (simple sitemap creator) is buggy and wouldn't produce a valid sitemap. So I spent an afternoon writing my own tool, which you can download and use for free.

Click to download the offline sitemap generator tool (76 KB)
Unpack the ZIP archive, then read the instructions

There is no installer; just unpack and run OFSITEMAP.EXE. It is portable and writes its settings to an INI file saved next to the program. I call it offline because it works from the inside. If you are a small website owner like myself, you probably have all the HTML files saved locally and use WinSCP or something similar to upload them to the actual web server. So there is no need to spider your own website; the tool just reads the local WWW folder (on your desktop PC) where you keep the HTML/PHP files that make up your website. The tool is very simple to use and looks like this:

[Screenshot: simple sitemap builder tool]

Let's say, for argument's sake, that you keep your website files under a local folder called C:\WWW\XPLORER2, which corresponds to the website https://www.zabkat.com. The offline sitemap generator reads the local folder (and optionally its subfolders), gathers all the HTML files it finds and creates an XML entry for each. For example, the local file C:\WWW\XPLORER2\index.htm is mapped to the URL https://www.zabkat.com/index.htm, and so on.
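The folder-to-URL mapping is simple enough to sketch. The Python below is an illustration under stated assumptions, not the tool's actual code: `collect_pages` and `local_to_url` are hypothetical names, only .htm/.html files are picked up, and <lastmod> is taken from each file's modification time (my assumption about where the date would come from).

```python
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import quote


def local_to_url(local_root, base_url, file_path):
    """Map a local file to its public URL.

    Assumes the local folder tree mirrors the website tree, e.g.
    <local_root>/blog/post.htm -> <base_url>/blog/post.htm
    """
    rel = Path(file_path).relative_to(local_root)
    # as_posix() turns Windows backslashes into URL slashes;
    # quote() percent-encodes characters such as spaces but keeps "/"
    return base_url.rstrip("/") + "/" + quote(rel.as_posix())


def collect_pages(local_root, base_url, include_subfolders=False):
    """Return sorted (url, lastmod) pairs for the HTML files under local_root.

    lastmod comes from the file modification time (an assumption made
    for this sketch), formatted as YYYY-MM-DD in UTC.
    """
    root = Path(local_root)
    # rglob descends into subfolders, glob stays in the top folder only
    candidates = root.rglob("*") if include_subfolders else root.glob("*")
    pages = []
    for path in candidates:
        if path.is_file() and path.suffix.lower() in (".htm", ".html"):
            mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            pages.append((local_to_url(root, base_url, path), mtime.date().isoformat()))
    return sorted(pages)
```

For example, collect_pages(r"C:\WWW\XPLORER2", "https://www.zabkat.com") would return (url, date) pairs with C:\WWW\XPLORER2\index.htm mapped to https://www.zabkat.com/index.htm; passing include_subfolders=True corresponds to ticking the subfolders box.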

The offline sitemap generator dialog box lets you key in this information:

You can include local subfolders if you tick the relevant box, but make sure the folder hierarchy matches your website (which is usually the case), e.g. the local subfolder XPLORER2\blog corresponds to https://www.zabkat.com/blog/.

Your local WWW folders will have clutter that you don't want in your XML sitemap. First, use the File extensions box to specify what kind of web files you are after. If you need more than plain HTM, separate additional extensions with commas, e.g. HTM,PHP. If there are HTM files you don't want in the XML, tick the exclude noindex files box and make sure you add this line in the HTML <HEAD> section of each such file:

<meta name="robots" content="noindex">
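The two filters just described can be sketched in Python as follows. This is a hypothetical illustration (`parse_extensions`, `is_noindex` and `should_include` are my names, not the tool's): it turns input like HTM,PHP into a set of suffixes, and uses the standard html.parser module to look for a robots meta tag whose content includes noindex.

```python
import os
from html.parser import HTMLParser


def parse_extensions(text):
    """Turn user input like "HTM,PHP" into {".htm", ".php"}."""
    return {"." + ext.strip().lstrip(".").lower() for ext in text.split(",") if ext.strip()}


class RobotsMetaFinder(HTMLParser):
    """Sets .noindex if a <meta name="robots"> tag lists the noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us;
        # <meta ... /> also lands here via the default handle_startendtag
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in values.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def is_noindex(html_text):
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.noindex


def should_include(filename, html_text, extensions, exclude_noindex=True):
    """Apply the extension filter, then optionally skip noindex pages."""
    if os.path.splitext(filename)[1].lower() not in extensions:
        return False
    return not (exclude_noindex and is_noindex(html_text))
```

Note that HTM does not match .html files in this sketch, so list both if your site mixes them.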

Then click the create XML button and the tool will create sitemap entries for all matching files. At present it only generates the text, which you copy, save into a file called SITEMAP.XML and upload to your website. You will usually also want a ROBOTS.TXT file that points to the sitemap, like this:

User-agent: *
Allow: /
Sitemap: https://www.yourwebsite.com/sitemap.xml
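If you want to double-check that a ROBOTS.TXT like this is understood the way you intend, Python's standard urllib.robotparser can parse it offline. A small sketch, assuming Python 3.8 or later (needed for site_maps()); `check_robots` is a hypothetical helper and www.yourwebsite.com is the placeholder domain from above:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://www.yourwebsite.com/sitemap.xml
"""


def check_robots(robots_text):
    """Return (sitemap URLs, whether any crawler may fetch the home page)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap: line is present
    sitemaps = parser.site_maps() or []
    return sitemaps, parser.can_fetch("*", "https://www.yourwebsite.com/")
```

With the example file, check_robots(ROBOTS_TXT) reports the single sitemap URL and confirms that crawling is allowed; a stray "Disallow: /" would show up as False.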

That's pretty much all there is to it. Every couple of months, if you have added or modified website pages, run the free tool again to pick up the changes and re-upload the sitemap XML. Hopefully the search engine spiders will then pick up your changes faster and the world can find your content.
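Before re-uploading, it can be handy to see what actually changed between the old and the new sitemap. The post doesn't say the tool does this, so here is a hypothetical Python helper (`diff_sitemaps` is my name for it) that compares two sitemap files by <loc> and <lastmod>, assuming both use the standard sitemaps.org namespace:

```python
import xml.etree.ElementTree as ET

NS = {"s": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap(xml_text):
    """Return {loc: lastmod} from sitemap XML text."""
    root = ET.fromstring(xml_text)
    result = {}
    for url in root.findall("s:url", NS):
        loc = url.findtext("s:loc", default="", namespaces=NS).strip()
        result[loc] = url.findtext("s:lastmod", default="", namespaces=NS).strip()
    return result


def diff_sitemaps(old_xml, new_xml):
    """Return (added, removed, changed) URL lists, each sorted."""
    old, new = read_sitemap(old_xml), read_sitemap(new_xml)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Pages present in both sitemaps whose lastmod date differs
    changed = sorted(loc for loc in set(old) & set(new) if old[loc] != new[loc])
    return added, removed, changed
```

Read the previous SITEMAP.XML and the freshly generated text into strings and pass both in; an empty result means there is nothing new to upload.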

As for my Google indexing woes, I am not too optimistic that an XML sitemap is the answer: Google already knows about the 30% of my website pages that aren't indexed. Under "Why pages aren't indexed" it offers incomprehensible explanations like "Page with redirect" and "Soft 404", none of which make sense, as my website is simple and static and does no redirects. I will keep you posted if it ever improves!


©2002-2023 ZABKAT LTD, all rights reserved | Privacy policy | Sitemap