
This is the implementation guide for the Modulus sitemap.module, versions 1.0 and later. It is assumed that the reader is familiar with HTML and execution of CGI scripts.
sitemap.module is a utility that generates website site maps in XML, suitable for website management and for submission to Google and Yahoo. Both Google and Yahoo allow webmasters to submit site maps; these site maps are then taken into consideration when the Google or Yahoo robot searches your website. Submitting a site map may improve the quality of these searches, which might, in turn, improve your search engine positioning. Or not.
The module comprises a Perl file (sitemap.pl) and an ini file (sitemap.ini). The Perl file should be installed in your CGI-BIN executable path; we suggest installing it
in its own directory, e.g.:
...your site.../cgi-bin/sitemap/sitemap.pl
Make sure the file's permissions allow it to be executed, e.g. 0755 on *NIX.
The Perl file takes a number of parameters to control its execution. These parameters can be supplied either as CGI form parameters, or by establishing them in an ini file.
Supplying the parameters in an ini file also lets you run sitemap.module from the command line, passing the name of the ini file as the command line parameter, for example:
perl sitemap.pl sitemap.ini
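If the map is to be regenerated from another program (a scheduled job, say), the command line above can be wrapped in a few lines. The following is a minimal Python sketch, not part of sitemap.module; the function names build_command and run_sitemap are hypothetical, and it assumes perl is on the PATH and that sitemap.pl is in the current directory.

```python
import subprocess

def build_command(ini_path, script="sitemap.pl", perl="perl"):
    """Return the argument list equivalent to: perl sitemap.pl <ini file>."""
    return [perl, script, ini_path]

def run_sitemap(ini_path, script="sitemap.pl"):
    """Run the script with the given ini file.

    check=True makes subprocess raise CalledProcessError if the
    script exits with a non-zero status.
    """
    return subprocess.run(build_command(ini_path, script), check=True)
```

Calling run_sitemap("sitemap.ini") would then be equivalent to typing the command shown above.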
The parameters are:
sitemap.module_startDirectory: This is the main public directory of your server. On some systems this may be something like
'user/home/yoursite.com/web/public'. Maps will be created starting in this directory and recursively exploring all sub-directories.
sitemap.module_startURL: This is the URL corresponding to your top-level directory, e.g. 'www.yoursite.com'.
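To illustrate how the start directory and start URL relate, here is a minimal Python sketch of a recursive walk that pairs each file with the URL it would be served under. It is not the module's Perl code; the function name map_files_to_urls is hypothetical, and the sketch assumes that a file's path relative to the start directory is also its path relative to the start URL.

```python
import os

def map_files_to_urls(start_directory, start_url):
    """Walk start_directory recursively and pair each file with its URL."""
    base = start_url.rstrip("/")
    pairs = []
    for dirpath, dirnames, filenames in os.walk(start_directory):
        dirnames.sort()  # visit sub-directories in a predictable order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, start_directory)
            # URLs always use forward slashes, whatever the local OS uses.
            url = base + "/" + relative.replace(os.sep, "/")
            pairs.append((full_path, url))
    return pairs
```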
sitemap.module_defaultChangeFrequency: This advises the Google or Yahoo robot of the frequency with which you expect to update your files.
Allowable values are 'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'. This value will be applied to all files; you must then manually update the resulting sitemap
to reflect your actual expected change rate. An alternative is to use the value 'auto'; in this case sitemap.module will examine the date of each file to determine a reasonable
value based on when it was last updated.
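The guide does not state the rule that 'auto' applies. As an illustration only, the Python sketch below shows one way a change frequency could be derived from a file's last-modified time. The thresholds, the fallback to 'never', and the name guess_change_frequency are all assumptions, not the module's actual behaviour.

```python
import time

# Hypothetical thresholds: (maximum age in days, change frequency).
AGE_THRESHOLDS = [
    (1, "daily"),
    (7, "weekly"),
    (31, "monthly"),
    (365, "yearly"),
]

def guess_change_frequency(mtime, now=None):
    """Map a last-modified timestamp (seconds since the epoch) to a label."""
    now = time.time() if now is None else now
    age_days = max(0.0, now - mtime) / 86400
    for max_age, label in AGE_THRESHOLDS:
        if age_days <= max_age:
            return label
    # Assumption: anything untouched for over a year is treated as static.
    return "never"
```

For a real file, the timestamp would come from os.path.getmtime(path).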
sitemap.module_defaultPriority: This is the default relative priority which will be applied to each file; to set different relative priorities you
must manually edit the generated sitemap. Acceptable values are 0.0 to 1.0. A higher priority value (relative to another document) indicates that you give higher relative
priority to this document being indexed. Unless you have specific knowledge of how the Google or Yahoo robot will make use of this relative priority, we suggest leaving all values at
1.0.
sitemap.module_target: Target is either 'xml' or 'txt'. For 'xml', an XML map is generated, whereas for 'txt' a text file is
generated.
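To show what the two targets look like, here is a minimal Python sketch that renders the same entries either way. It assumes the XML follows the public sitemaps.org 0.9 schema (a urlset of url elements with loc, lastmod, changefreq and priority) and that the text form is simply one URL per line; the module's real output may differ in detail. The function name render_sitemap and the entry dictionary layout are hypothetical.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def render_sitemap(entries, target="xml"):
    """Render entries (dicts with loc, lastmod, changefreq, priority)."""
    if target == "txt":
        # Text form: one URL per line, no other metadata.
        return "\n".join(e["loc"] for e in entries) + "\n"
    if target != "xml":
        raise ValueError("target must be 'xml' or 'txt'")
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="%s">' % SITEMAP_NS]
    for e in entries:
        lines.append("  <url>")
        # escape() protects characters such as '&' inside the URL.
        lines.append("    <loc>%s</loc>" % escape(e["loc"]))
        lines.append("    <lastmod>%s</lastmod>" % e["lastmod"])
        lines.append("    <changefreq>%s</changefreq>" % e["changefreq"])
        lines.append("    <priority>%.1f</priority>" % e["priority"])
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```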
sitemap.module_useExclude: Set to '1' or check the checkbox if you wish to exclude some files. When it is not checked, the search is
limited to HTML, SHTML, PHP, JSP and ASP files.
sitemap.module_excludeFiles: If useExclude is set, this is a comma-separated list of filenames or regular expressions for filenames which
are to be excluded from the map. A typical list might be .*?\.dic, .*?\.inc, .*?\.dat, .*?\.css, .*?\.htaccess, .*?\.template, .*?\.lock, .*?\.js,
.*?\.txt, .*?\.jpe?g, .*?\.gif, .*?\.ico, .*?\.png to exclude dictionary, 'inc', data, cascading style sheet, Apache access control, template, lock, Javascript, text,
and image files.
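The interaction of useExclude and excludeFiles can be sketched as follows. This is an illustrative Python version, not the module's Perl code; it assumes each pattern must match the whole filename and that the default filter is a case-insensitive test on the file extension. The names compile_excludes and is_included are hypothetical.

```python
import re

# Extensions searched when useExclude is not set, per the description above.
DEFAULT_EXTENSIONS = (".html", ".shtml", ".php", ".jsp", ".asp")

def compile_excludes(exclude_files):
    """Split a comma-separated pattern list and compile each entry."""
    patterns = [p.strip() for p in exclude_files.split(",") if p.strip()]
    return [re.compile(p) for p in patterns]

def is_included(filename, use_exclude, excludes=()):
    """Decide whether a file belongs in the map."""
    if not use_exclude:
        return filename.lower().endswith(DEFAULT_EXTENSIONS)
    # Assumption: a pattern excludes a file only if it matches the full name.
    return not any(rx.fullmatch(filename) for rx in excludes)
```

With full-name matching, the pattern .*?\.js excludes app.js but not data.json; a substring match would exclude both, so the choice matters when writing patterns.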
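Generate the sitemap either by submitting the parameters to sitemap.pl as a CGI form or by running it from the command line with an ini file.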
For an XML sitemap, you may then wish to set different relative priorities (initially every document has the default priority) and/or adjust the change frequency of certain documents. You can then upload the site map to your website (if you didn't create it there) and, within the Google or Yahoo site management facility, submit it for use by the Google or Yahoo robot.
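Editing priorities by hand becomes tedious for a large map. As one possible shortcut, the Python sketch below rewrites the priority of selected URLs in a generated XML map. It assumes the map uses the sitemaps.org 0.9 namespace; the function name set_priorities and the overrides dictionary are hypothetical and not part of sitemap.module.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def set_priorities(xml_text, overrides):
    """Return the sitemap XML with priorities replaced per overrides.

    overrides maps a URL (the <loc> text) to a priority from 0.0 to 1.0.
    """
    # Keep the default namespace unprefixed when the tree is written out.
    ET.register_namespace("", SITEMAP_NS)
    ns = {"sm": SITEMAP_NS}
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", ns):
        loc = url.findtext("sm:loc", default="", namespaces=ns).strip()
        if loc in overrides:
            priority = url.find("sm:priority", ns)
            if priority is None:
                priority = ET.SubElement(url, "{%s}priority" % SITEMAP_NS)
            priority.text = "%.1f" % overrides[loc]
    return ET.tostring(root, encoding="unicode")
```

For example, set_priorities(xml_text, {"http://www.yoursite.com/about.html": 0.5}) lowers one page and leaves every other entry at its generated value.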
As the Text standard is a simple text document, there is no adjustment of priority or change frequency involved. Upload the generated sitemap to your server's root directory, with the name 'urllist.txt'. At the time of writing, this exact name must be used. You can then use the Google or Yahoo site management facility to recognise the sitemap.
If you experience problems with the implementation, contact Modulus for assistance.