Site mapping functionality

Site mapping functionality, a web service provided through the Axiell sitemapper.ashx alias, is a standard part of the WebAPI from version 3.0.21210.1 onwards.

Introduction

Basically, a sitemap is a collection of XML files containing URLs (deep links) to all or selected pages of your website. By submitting a fixed URL to one, or a limited number of, so-called sitemap index files, you allow web crawlers from internet search engines which support XML sitemap indexing (like Google) to index all your Axiell Internet Server website pages showing detailed presentations of database records. This way, every detailed view of a record can be found and opened through the internet search engine, opening up your collection to the entire internet. Moreover, with the sitemap service, detailed record view pages get a static URL which allows visitors to bookmark or e-mail them, or use them in numerous other ways.

So why don't these web crawlers know how to index your website without the Axiell sitemap service? Web crawlers can only find and index web pages if they know the URL, and they know of a URL if you either submitted it to them or if they find the URL (the hyperlink) on a web page they already know. Since Axiell Internet Server websites retrieve their content from databases, the URLs to these records are put together only when a search is executed by a user. The Internet Server knows how to retrieve content in your particular case, but web crawlers don't, and therefore can't index this content; hence the need for the sitemap protocol to make these URLs available to web crawlers.

Sitemaps supplement the existing crawl-based mechanisms that search engines already use to discover URLs. Note that sitemaps are a URL inclusion protocol and complement robots.txt, which is a URL exclusion protocol. Aside from the fixed URL(s) which you need to submit to the sitemap indexing service of the relevant search engine, Axiell creates all required files and URLs to records dynamically: this ensures that the indexed sitemap is always fully up-to-date.
Even a persistent robots.txt file will be created or adjusted to allow crawling of the sitemaps; this also allows website administrators to easily see which sitemap index URLs need to be submitted to search engines.

Whenever a web crawler decides to index your website, it first calls the URLs you submitted to Axiell's sitemapper.ashx software, which results in relevant sitemap index files containing extensions of every submitted URL to request a sitemap file. The web crawler will then execute the sitemap requests in the index files. These requests cause sitemapper.ashx to search the entire database and dynamically create the sitemap files containing the URLs to the detailed views of all your records. The web crawler then accesses all these URLs and indexes those website pages to make them searchable through the search engine on the web. The whole process should not take more than a few seconds, so the performance of your website will not suffer.

To submit your sitemap index URL(s) to a search engine, you usually have to become a registered webmaster with that search engine: see the websites of the relevant search engines for information about creating a webmaster account and submitting sitemap URLs. A webmaster account with Google, for instance, also offers other advantages, like being able to view crawl statistics (pages successfully indexed, pages blocked by robots.txt, pages that were unreachable, etc.) and the page rank distribution within your website.
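The crawler's side of this handshake can be illustrated with a minimal Python sketch. This is not part of the WebAPI itself: the sitemap index below is an illustrative copy of the kind of response sitemapper.ashx returns, inlined as a string so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# A sitemap index as returned by sitemapper.ashx (illustrative copy,
# not fetched from a live server).
INDEX = """<sitemapindex>
  <sitemap>
    <loc>http://api.axiell.com/internetserver5/SiteMapper/sitemapper.ashx/collect/sitemap/0</loc>
  </sitemap>
  <sitemap>
    <loc>http://api.axiell.com/internetserver5/SiteMapper/sitemapper.ashx/collect/sitemap/1000</loc>
  </sitemap>
</sitemapindex>"""

root = ET.fromstring(INDEX)
# Each <loc> is a sitemap request the crawler executes next.
sitemap_urls = [loc.text.strip() for loc in root.iter("loc")]
```

A real crawler would now fetch each of these URLs in turn, receiving a sitemap with deep links to individual record pages.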

Setting up your sitemap service

The Axiell sitemap service handles the sitemap requests from a web crawler. Assuming you already have an Axiell Internet Server running on wwwopac.ashx, you only have to set up the sitemap service. You can do this as a separate service (with a copy of all WebAPI files) or include it in the wwwopac.ashx setup. Create a mapper.xml file in the same folder that holds the adlibweb.xml file; this mapper.xml file will contain all relevant settings for the sitemap service. An example of a complete settings file can be seen below:

<?xml version="1.0" encoding="utf-8" ?>
<SiteMapParameters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <databaseLocationList>
    <DatabaseLocation>
      <RedirectUrl>https://ais.ourserver.com/details/collect/{0}</RedirectUrl> 
      <Name>collect</Name> 
      <Location>\\ourserver\axiellapi\data\collect>intern</Location> 
      <StyleSheet>mapper.xslt</StyleSheet> 
    </DatabaseLocation>
    <DatabaseLocation>
      <RedirectUrl>https://ais.ourserver.com/details/book/{0}</RedirectUrl> 
      <Name>document</Name> 
      <Location>\\ourserver\axiellapi\data\document>book</Location> 
      <StyleSheet>mapper.xslt</StyleSheet> 
    </DatabaseLocation>
  </databaseLocationList>
  <LogFile>mapper.txt</LogFile> 
</SiteMapParameters>

Explanation of the settings:

  • DatabaseLocation: contains the details of a database which you want to make accessible for sitemap requests. You can have as many DatabaseLocation nodes as needed.
  • RedirectUrl: (or BaseUrl, alternatively) must be the general deep link to a page on your Axiell Internet Server website showing the detailed presentation of a record, but without the record number. The final sitemap will contain URLs composed of this base URL plus a record number. The RedirectUrl for AIS 5 should have the format: http://<your_internet_server_domain>/details/<database_name>/{0}. The database_name is a database name as specified in the adlibweb.xml for the Axiell Internet Server. A priref should not be filled in here: sitemapper.ashx will add prirefs automatically.
  • Name: the identifier for this DatabaseLocation specification. In the URLs which you submit to search engines, you'll indicate the databases available for sitemap indexing using this name.
  • Location: must contain a UNC path to the physical location of the .inf (database structure) file of this database. It will be used to extract the record number range(s) for the database and possibly its datasets.
  • StyleSheet: indicates an XSLT stylesheet with which output for a record request must be generated, each time a web crawler tries to access the data from a record via deep links in the sitemap.
  • LogFile: automatically logs every sitemap request in a text file. You may choose a different name than the default mapper.txt.
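The mapper.xml settings can be read back with any standard XML parser. The following Python sketch (not the service's own code) parses a copy of the sample configuration, inlined as a string for self-containment, and collects the deep-link template per database name:

```python
import xml.etree.ElementTree as ET

# Sample mapper.xml content, mirroring the example configuration above
# (a single DatabaseLocation is shown for brevity).
MAPPER_XML = r"""<SiteMapParameters>
  <databaseLocationList>
    <DatabaseLocation>
      <RedirectUrl>https://ais.ourserver.com/details/collect/{0}</RedirectUrl>
      <Name>collect</Name>
      <Location>\\ourserver\axiellapi\data\collect>intern</Location>
      <StyleSheet>mapper.xslt</StyleSheet>
    </DatabaseLocation>
  </databaseLocationList>
  <LogFile>mapper.txt</LogFile>
</SiteMapParameters>"""

root = ET.fromstring(MAPPER_XML)
# One entry per DatabaseLocation: database name -> deep-link template.
databases = {
    loc.findtext("Name"): loc.findtext("RedirectUrl")
    for loc in root.iter("DatabaseLocation")
}
```

Note that the {0} placeholder stays literal in the configuration; sitemapper.ashx substitutes the record number when generating sitemaps.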

Getting your website indexed

URLs to submit to a search engine

For every DatabaseLocation specified in mapper.xml, you need to submit a sitemap index request URL to the search engine which must index your site. Such a URL is composed of the base URL to the virtual directory which holds your Axiell Internet Server, followed by the path to sitemapper.ashx and the database name as specified in mapper.xml. For example:

http://api.axiell.com/internetserver5/SiteMapper/sitemapper.ashx/collect
Note that the URL must not end with a / character.

How a sitemap request is handled

Whenever the web crawler of a search engine decides to index your site, it will call the submitted URL(s). For each URL, sitemapper.ashx will generate a sitemap index file on-the-fly (which won't be saved) containing sitemap requests for every dataset defined in the database. This is relevant because of the record number range specified for each dataset in the database. The lower record number of every range is returned in the sitemap requests and will serve as the starting point for the indexing procedure. A returned sitemap index may look as follows:

<sitemapindex>
  <sitemap>
    <loc>
      http://api.axiell.com/internetserver5/SiteMapper/sitemapper.ashx/collect/sitemap/0
    </loc>
  </sitemap>
  <sitemap>
    <loc>
      http://api.axiell.com/internetserver5/SiteMapper/sitemapper.ashx/collect/sitemap/1000
    </loc>
  </sitemap>
  <sitemap>
    <loc>
      http://api.axiell.com/internetserver5/SiteMapper/sitemapper.ashx/collect/sitemap/11000
    </loc>
  </sitemap>
</sitemapindex>
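The index generation itself can be sketched offline. In the sketch below, the base URL matches the example above, and the list of starting record numbers is an illustrative assumption, not values read from a real .inf file:

```python
# Sketch of how a sitemap index could be assembled from dataset
# record-number ranges. The base URL and the starting numbers are
# illustrative; in reality sitemapper.ashx derives the lower record
# number of each dataset range from the database's .inf file.
BASE = "http://api.axiell.com/internetserver5/SiteMapper/sitemapper.ashx/collect"

def build_sitemap_index(dataset_starts):
    """Return a sitemap index with one <sitemap> entry per dataset."""
    entries = "\n".join(
        f"  <sitemap>\n    <loc>{BASE}/sitemap/{start}</loc>\n  </sitemap>"
        for start in dataset_starts
    )
    return f"<sitemapindex>\n{entries}\n</sitemapindex>"

index_xml = build_sitemap_index([0, 1000, 11000])
```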

These URLs instruct sitemapper.ashx to generate a sitemap (on-the-fly) starting with the indicated record number (at the end of the URL) in the relevant database. The resulting sitemap will contain deep links to all website pages showing a record from this database in detailed display, allowing the web crawler to find and index those pages. For example:

<urlset>
  <url>
    <loc>
      http://api.axiell.com/InternetServer5/details/collect/1
    </loc>
  </url>
  <url>
    <loc>
      http://api.axiell.com/InternetServer5/details/collect/2
    </loc>
  </url>
  <url>
    <loc>
      http://api.axiell.com/InternetServer5/details/collect/5
    </loc>
  </url>
  <url>
    <loc>
      http://api.axiell.com/InternetServer5/details/collect/6
    </loc>
  </url>
  …
</urlset>

Each deep link will lead to a full web page. The sitemap won’t be stored on your server.
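The deep links in such a urlset follow directly from the RedirectUrl template in mapper.xml: the {0} placeholder is replaced by each record number. A minimal sketch, assuming a hypothetical list of existing prirefs:

```python
# Sketch: compose deep links from a RedirectUrl-style template.
# The template matches the example configuration; the priref list is
# a hypothetical assumption — in reality sitemapper.ashx reads the
# existing record numbers from the database.
TEMPLATE = "http://api.axiell.com/InternetServer5/details/collect/{0}"

def build_urlset(prirefs):
    """Return a sitemap <urlset> with one <url> entry per priref."""
    urls = "\n".join(
        f"  <url>\n    <loc>{TEMPLATE.format(p)}</loc>\n  </url>"
        for p in prirefs
    )
    return f"<urlset>\n{urls}\n</urlset>"

urlset = build_urlset([1, 2, 5, 6])
```

Gaps in the priref sequence (deleted records, as with 3 and 4 in the example above) simply do not appear in the urlset.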