OAI (Open Archives Initiative) support
The OAI protocol, based on HTTP and XML, is a metadata harvesting protocol. It is intended to make the normally invisible contents of internet databases accessible to search engines, so that those contents can be found through a submitted search term.
The OAI functionality implemented through the oai.ashx alias (which is a standard part of the Axiell WebAPI from version 3.0.21154.1) supports OAI protocol 2.0. Through an OAI protocol request to a repository (a server on which e.g. the WebAPI has been configured for the OAI protocol), (meta)data can be extracted from your database, and be made available on the internet where it can be indexed by service providers such as search engines. A full description of the OAI protocol can be found on https://www.openarchives.org/
The difference between data and metadata in an Axiell Collections database is a matter of agreement. Normally, we regard the information in the database as data, but the description of an object or book can just as easily be seen as metadata, because the content of the book itself is not contained in the database. Collections databases store information about other data or objects: call it data or metadata.
The metadata resulting from an OAI query may comply with any of a number of specified standards. The Axiell implementation of OAI is supplied with the Dublin Core metadata standard. Its XML schema is available at http://www.openarchives.org/OAI/2.0/oai_dc.xsd.
The standard consists of 15 elements (comparable to fields in a Collections application) in which the metadata is passed on. Since Collections databases contain many more fields than Dublin Core has elements, a limited amount of information is selected from a retrieved Collections record. But you can still define more than 15 elements. Dublin Core is really a narrow base that can be supported by anyone; this makes it easier to exchange information between differently structured data.
In principle, oai.ashx returns all fields from a Collections record (in AdlibXML format). For OAI, however, oai.ashx uses an XSLT stylesheet (with a Collections-field-to-Dublin-Core-element mapping), set in the web configuration file, to transform that search result to the proper metadata format (also in XML) before sending it to the harvester. This transforms the search result into so-called OAI records (one OAI record per Collections record). Such a record primarily consists of a header and the selected metadata. The header consists of a unique identifier for the retrieved record and a date stamp that indicates when the record was last modified. The metadata is the retrieved data after transformation.
So the output format of an OAI search result is determined by an XSLT stylesheet on the server. Our standard Dublin Core stylesheet for this purpose is oai_dc.xslt. Other output formats can be based on this stylesheet.
The use of OAI must be configured in the adlibweb.xml file that is also used by the WebAPI. By default, there is no OAI configuration in adlibweb.xml. Adjust the example below (of a complete OAI configuration) to your own situation and add it to adlibweb.xml:
<OAIConfiguration>
  <OAI_REPOSITORY_NAME>My Museum</OAI_REPOSITORY_NAME>
  <OAI_ADMIN_EMAIL>oai@mymuseum.uk</OAI_ADMIN_EMAIL>
  <OAI_ADMIN_EMAIL>admin@mymuseum.uk</OAI_ADMIN_EMAIL>
  <OAI-REPOSITORYIDENTIFIER>My Repository</OAI-REPOSITORYIDENTIFIER>
  <DM>dm</DM>
  <OAI_SETS>
    <OAI_SET>
      <SetSpec>collect</SetSpec>
      <Name>Default</Name>
      <Database>collect</Database>
      <SearchStatement>all</SearchStatement>
      <OAI_METADATAPREFIXES>
        <OAI_METADATAPREFIX>
          <Name>oai_dc</Name>
          <StyleSheet>oai_dc.xslt</StyleSheet>
          <!-- <Schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</Schema>
               <MetadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</MetadataNamespace>-->
        </OAI_METADATAPREFIX>
        <OAI_METADATAPREFIX>
          <Name>oai_adlib</Name>
          <!-- use oai_adlib.xslt as stylesheet if you want adlibXML as output -->
          <StyleSheet>oai_adlib.xslt</StyleSheet>
          <!-- <Schema>http://www.adlibsoft.com/adlibXML.xsd</Schema>
               <MetadataNamespace>http://www.adlibsoft.com/adlibXML</MetadataNamespace>-->
        </OAI_METADATAPREFIX>
      </OAI_METADATAPREFIXES>
    </OAI_SET>
  </OAI_SETS>
</OAIConfiguration>
Explanation
- OAI_REPOSITORY_NAME: your repository name (choose a sensible one).
- OAI_ADMIN_EMAIL: administrator e-mail address.
- OAI-REPOSITORYIDENTIFIER: your own identifier for the repository (available from WebAPI version 3.7.1.3084). This will end up in the <Identify> section of the response to an OAI Identify request.
- DM: specify the Collections field tag containing the modification date of the record; usually this is dm. Many databases store the date on which a record was last edited in a date-of-modification field. When you make your database available on the internet by means of the Open Archives Initiative, use the DM setting in the adlibweb.xml file to provide the tag of that field, so that search engines or other clients can retrieve (or index) only the records that have changed after a certain date, for instance since their previous visit to your OAI server. An OAI request with a date selection for a database in which no DM is specified returns all records.
- OAI_SETS: you may specify one or more OAI sets, each in its own OAI_SET, to address different OAI harvest requests. If no set is specified in the harvest request, the first one specified here, in the adlibweb.xml file, will be used.
- OAI_SET: specify the details for a single harvest request, including its allowed metadata prefixes.
- SetSpec: an identifying name without spaces for this OAI set (choose a sensible name), optionally to be used in harvest requests.
- Name: (underneath OAI_SET) a descriptive name for this OAI set, only returned with verb=ListSets.
- Database: the name of a <databaseConfiguration> specified elsewhere in the adlibweb.xml file (so not the name of an .inf file).
- SearchStatement: a WebAPI-style search statement (using the same character escaping) which will be executed when the current set is addressed in the harvest request. See the WebAPI search command topic for a full description, including how to request the records from a saved search (aka pointer file).
- OAI_METADATAPREFIXES: specify the allowed metadata prefixes for this OAI set.
- OAI_METADATAPREFIX: the specification of a single metadata prefix.
- Name: (underneath OAI_METADATAPREFIX) set a name with which to address the current metadata prefix in a harvest call.
- StyleSheet: specify the stylesheet to be used to format the result from the harvest request. Some ready-made stylesheets are present in the WebAPI folder that also contains the adlibweb.xml file.
- Schema: optionally, the URL to the .xsd specifying the intended XML schema of the harvest result, for validation purposes. The Axiell OAI server does nothing with this information, except passing it to the client if requested with the ListMetadataFormats verb.
- MetadataNamespace: optionally, the URI of the metadata namespace for the XML schema. As with the <Schema> element, the Axiell OAI-PMH implementation does nothing with this element, except passing it to the client in the ListMetadataFormats response. To set the namespace of the <metadata> <record> nodes in the returned XML, use the default namespace (xmlns) in the XSLT like so: <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.adlibsoft.com/2002/adlibXML">. Also, when <xsl:copy> is used in the XSLT for copying complete elements, this will add an empty namespace attribute to the returned elements (<priref xmlns="">), which may cause issues with harvesting software.
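Because adlibweb.xml must be well-formed XML, an unclosed OAI_SET or OAI_METADATAPREFIX element is an easy mistake to make when adapting the example configuration above. The following is a minimal sketch, in Python with only the standard library, of a check you could run on the OAI fragment before deploying it. The function name summarize_oai_config and the trimmed sample are illustrative assumptions and are not part of the WebAPI.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the example configuration (illustration only).
SAMPLE_CONFIG = """
<OAIConfiguration>
  <OAI_REPOSITORY_NAME>My Museum</OAI_REPOSITORY_NAME>
  <DM>dm</DM>
  <OAI_SETS>
    <OAI_SET>
      <SetSpec>collect</SetSpec>
      <Name>Default</Name>
      <Database>collect</Database>
      <SearchStatement>all</SearchStatement>
      <OAI_METADATAPREFIXES>
        <OAI_METADATAPREFIX>
          <Name>oai_dc</Name>
          <StyleSheet>oai_dc.xslt</StyleSheet>
        </OAI_METADATAPREFIX>
        <OAI_METADATAPREFIX>
          <Name>oai_adlib</Name>
          <StyleSheet>oai_adlib.xslt</StyleSheet>
        </OAI_METADATAPREFIX>
      </OAI_METADATAPREFIXES>
    </OAI_SET>
  </OAI_SETS>
</OAIConfiguration>
"""


def summarize_oai_config(xml_text):
    """Map each SetSpec to its metadata prefix names.

    Hypothetical helper: raises xml.etree.ElementTree.ParseError when the
    XML is not well-formed (for example, when a closing tag is missing).
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for oai_set in root.iter("OAI_SET"):
        set_spec = oai_set.findtext("SetSpec", default="").strip()
        prefixes = [
            prefix.findtext("Name", default="").strip()
            for prefix in oai_set.iter("OAI_METADATAPREFIX")
        ]
        summary[set_spec] = prefixes
    return summary


print(summarize_oai_config(SAMPLE_CONFIG))
```

The check only confirms well-formedness and lists what it finds; it does not verify that the named stylesheets or databases actually exist on the server.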
The resulting XML from an OAI search query normally doesn’t report on records missing from the result set if they were deleted in some way earlier. The current version of the WebAPI does allow this type of reporting though, so that it is clear to the harvester why those records are missing. A requirement is that journalling has been enabled for the relevant database (a setting in the .inf): this setting makes sure that record changes and deletions are registered in the database.
If journalling is enabled, then from that point on deleted records will be registered in the database, but they won’t be reported in an OAI result set automatically. To enable the reporting of deleted records, add a <reportDeletedRecords>true</reportDeletedRecords> node to either the globalConfiguration section or each of the OAI_SET sections in adlibweb.xml. The per-set setting overrides the global setting when the relevant set is addressed, so you can set it to false globally whilst setting it to true for a specific OAI_SET. Use the global setting to set it for all OAI_SET configurations implicitly. (Note that an Identify request only looks at the global setting. The Identify result contains a <deletedRecord>No</deletedRecord> node if journalling is not switched on or <reportDeletedRecords>false</reportDeletedRecords> is set in adlibweb.xml, or a <deletedRecord>Persistent</deletedRecord> node if journalling is switched on and <reportDeletedRecords>true</reportDeletedRecords> is set in adlibweb.xml.)
If no <reportDeletedRecords> node is present in the configuration, the behaviour defaults to “false”, so deleted records won’t be reported.
If journalling has been switched on and <reportDeletedRecords>true</reportDeletedRecords> has been set, GetRecord and ListRecords calls will include each deleted record in the result set as follows (where collect:12 will be substituted by the relevant database name and record priref, and the date by the actual date of deletion):
<record>
  <header status="deleted">
    <identifier>collect:12</identifier>
    <datestamp>2021-06-10T07:56:04Z</datestamp>
  </header>
</record>
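A harvesting client typically needs to tell such a deleted-record stub apart from a normal record. As a rough sketch (Python 3.8 or later, standard library only), the header could be read as follows. The function name read_header is a hypothetical helper. The `{*}` wildcard is used because in a complete response these elements sit in the OAI-PMH namespace, whereas the fragment shown here has none.

```python
import xml.etree.ElementTree as ET

# The deleted-record fragment from this topic, used as sample input.
RECORD_XML = """
<record>
  <header status="deleted">
    <identifier>collect:12</identifier>
    <datestamp>2021-06-10T07:56:04Z</datestamp>
  </header>
</record>
"""


def read_header(record_xml):
    """Extract identifier parts, datestamp and deleted status from one record."""
    header = ET.fromstring(record_xml).find("{*}header")
    identifier = header.findtext("{*}identifier")
    # The identifier is built as <database name>:<priref>.
    database, _, priref = identifier.partition(":")
    return {
        "database": database,
        "priref": priref,
        "datestamp": header.findtext("{*}datestamp"),
        "deleted": header.get("status") == "deleted",
    }


print(read_header(RECORD_XML))
```

A client could use the deleted flag to remove the record from its own index instead of trying to read a metadata section that is not there.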
This functionality was added in WebAPI version 3.0.21154.1.
When the keyword verb is entered in the function call of oai.ashx, the server will process the search query as an OAI request.
In principle, the search result of an OAI request is returned as XML, so the client has to convert it to an HTML page first in order to show the result as a web page. Nevertheless, browsers can also present XML as code, which is sufficient for testing purposes. The syntax of a search query, and the ways you can enter one, are the same as for a normal WebAPI search query, except that a number of special variables required by the OAI protocol must be entered after the question mark in the URL. To make an Identify request, for instance, use the following syntax:
…oai.ashx?verb=Identify
The Identify verb returns information about the repository that has been made available through the OAI protocol, such as the repository name (e.g. the name of your collection), the base URL to address OAI search queries to, the protocol version that is supported, the e-mail address of the repository system administrator, and any additional information.
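To illustrate how a client might read such a response, here is a small Python sketch using only the standard library. The sample response is hand-written from the element names defined by OAI-PMH 2.0 and the example values used elsewhere in this topic. It is not captured server output, and a real oai.ashx response will contain more elements. The function name parse_identify is likewise an assumption.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written illustration of an Identify response (not real server output).
SAMPLE_IDENTIFY = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2022-03-24T07:09:18Z</responseDate>
  <request verb="Identify">http://localhost:12345/oai.ashx</request>
  <Identify>
    <repositoryName>My Museum</repositoryName>
    <baseURL>http://localhost:12345/oai.ashx</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>oai@mymuseum.uk</adminEmail>
    <adminEmail>admin@mymuseum.uk</adminEmail>
    <deletedRecord>Persistent</deletedRecord>
  </Identify>
</OAI-PMH>
"""


def parse_identify(xml_text):
    """Collect the main repository properties from an Identify response."""
    identify = ET.fromstring(xml_text).find("oai:Identify", OAI_NS)
    deleted = identify.findtext("oai:deletedRecord", default="", namespaces=OAI_NS)
    return {
        "repositoryName": identify.findtext("oai:repositoryName", namespaces=OAI_NS),
        "baseURL": identify.findtext("oai:baseURL", namespaces=OAI_NS),
        "protocolVersion": identify.findtext("oai:protocolVersion", namespaces=OAI_NS),
        "adminEmail": [e.text for e in identify.findall("oai:adminEmail", OAI_NS)],
        # Compared case-insensitively, since servers differ in capitalisation.
        "reportsDeletedRecords": deleted.lower() == "persistent",
    }


print(parse_identify(SAMPLE_IDENTIFY))
```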
A standard OAI protocol request has at least one name/value pair to specify the request by the client. The above Identify request is an example of this, but instead of Identify, each of the standard OAI protocol requests can be used. The number and nature of extra name/value pairs depend on the arguments for the specific protocol request, e.g.:
…oai.ashx?verb=GetRecord&identifier=3&metadataPrefix=oai_dc
for a GetRecord call of the record with local identifier (Collections record number/priref) 3.
The metadataPrefix oai_dc specifies the use of the Dublin Core <OAI_METADATAPREFIX> output format, as specified in the relevant OAI_SET.
Likewise, when you submit a ListIdentifiers or ListRecords request (see further in this topic), the metadataPrefix parameter specifies (indirectly) which stylesheet should be used.
Some ready-made stylesheets are present in your main WebAPI folder, including oai_dc.xslt for the XML transformation to Dublin Core. On the basis of this stylesheet you could make your own stylesheet with another name, if you desire. Save it in the same folder.
A client can submit various requests to a repository. Every request is the value of a verb, namely verb=<request>, and follows directly after the question mark in a search query.
The most important requests are listed below. Detailed information about their syntax and the responses they trigger can be found in the OAI-PMH specification at https://www.openarchives.org/. (Requests are case-sensitive.)
- GetRecord: This verb is used to extract an individual metadata record from a repository. Mandatory arguments specify the identifier of the requested record and the output format of the metadata (e.g. oai_dc).
- Identify: Identify gathers information about the repository. No arguments.
- ListIdentifiers: This verb is used to gather the local identifiers (prirefs) of all records that are available in the repository. Optional arguments enable selective gathering, for instance based on the date they were last modified.
- ListRecords: This verb is used to gather all* the records available in the repository. Optional arguments enable selective gathering, for instance based on the date they were last modified.
  * At most 501 records are retrieved per request. Subsequent partial lists of the same search result may be retrieved via so-called resumption tokens (see further down).
- ListSets: lists the setSpec and setName of the OAI set(s) specified in the OAI configuration of the repository. No arguments.
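The request URLs for these verbs can be assembled programmatically. The sketch below, in Python with only the standard library, shows one way to do so. The base URL and the helper name build_request are assumptions for illustration. The set and from arguments are the standard OAI-PMH arguments for selective harvesting, and from only has an effect on the server when a DM field has been configured.

```python
from urllib.parse import urlencode

# Assumed location of the OAI endpoint; replace with your own server.
BASE_URL = "http://localhost:12345/oai.ashx"


def build_request(verb, **arguments):
    """Return the URL for an OAI request with the given verb and arguments.

    'from' is a reserved word in Python, so pass it as from_; a trailing
    underscore is stripped from every argument name.
    """
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            params[name.rstrip("_")] = value
    return BASE_URL + "?" + urlencode(params)


# A single record in Dublin Core:
print(build_request("GetRecord", identifier="3", metadataPrefix="oai_dc"))
# Only records from set 'collect' modified since 1 January 2022:
print(build_request("ListRecords", metadataPrefix="oai_dc",
                    set="collect", from_="2022-01-01"))
```

Using urlencode rather than string concatenation ensures that argument values containing reserved characters are escaped correctly.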
When the harvest search result is a very long list, only a limited portion of it will be returned by default (501 records), accompanied by a resumptionToken (at the bottom of the XML result). Such a resumption token is needed to retrieve the next part of the list, as an argument in a subsequent OAI request.
Note that if you use a resumptionToken in a request, you must not provide a metadataPrefix, nor any other argument: the last used arguments will automatically be applied again.
Suppose you’ve entered a first ListRecords request, which resulted in a list with for instance 10 records and a resumption token “97EODx6A3phYxIY”:
../oai.ashx?verb=ListRecords&metadataPrefix=oai_dc
The next 10 records of the search result are now retrieved with:
../oai.ashx?verb=ListRecords&resumptionToken=97EODx6A3phYxIY
In the new list you’ll find another resumptionToken if more records are available. A token expires after 24 hours.
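The paging logic described here can be sketched as a simple loop. In the Python sketch below, fetch_page is a hypothetical callable that performs one request and returns the records of that page plus the resumption token (or None when the list is complete). A stand-in with canned pages is used so the example runs without a server, and the record identifiers in it are made up.

```python
def harvest(fetch_page, first_arguments):
    """Yield every record, following resumption tokens until none is returned.

    fetch_page (hypothetical) takes a dict of request arguments and returns
    a tuple (records, resumption_token_or_None).
    """
    arguments = dict(first_arguments)
    while True:
        records, token = fetch_page(arguments)
        yield from records
        if not token:
            break
        # A follow-up request carries only the verb and the token:
        # no metadataPrefix or other arguments.
        arguments = {"verb": first_arguments["verb"], "resumptionToken": token}


# Canned pages keyed by resumption token, standing in for real HTTP calls.
FAKE_PAGES = {
    None: (["collect:1", "collect:2"], "97EODx6A3phYxIY"),
    "97EODx6A3phYxIY": (["collect:3"], None),
}


def fake_fetch(arguments):
    return FAKE_PAGES[arguments.get("resumptionToken")]


print(list(harvest(fake_fetch, {"verb": "ListRecords", "metadataPrefix": "oai_dc"})))
```

Since tokens expire, a real client would also need to handle the case where a follow-up request is rejected and the harvest has to be restarted.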
Use an online OAI validator like OAI-PMH validator to test and validate your oai.ashx server.
Sometimes you'd like to exclude specific records from an OAI harvest result, based on a value in some field in those records. You could have a custom checkbox to indicate this, for example. To this end, WebAPI 3.1.1.1286 introduced a new OAI configuration element: <HiddenRecordsFilter>. Provide an advanced search query in such a node underneath the desired <OAI_SET> configuration. For example: <HiddenRecordsFilter>exclude_from_web = 'x'</HiddenRecordsFilter>. This assumes you have a checkbox field named exclude_from_web, but you can use any field and any value. When the checkbox is marked, the field contains the value 'x', so this condition excludes that record from an OAI harvest. The harvest result XML will still list the record identifier and the date stamp, but the header node will get the status="deleted" attribute (even though the record hasn't really been deleted) to effectively hide the record from the result. A partial XML result in which record 2 is excluded this way could look as follows:
<?xml version="1.0" encoding="utf-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2022-03-24T07:09:18Z</responseDate>
  <request verb="ListRecords" set="collect" metadataPrefix="adlib">http://localhost:12345/oai.ashx</request>
  <ListRecords>
    <record>
      <header status="deleted">
        <identifier>collecthighlight:2</identifier>
        <datestamp>2022-03-24T07:09:18Z</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>collecthighlight:3</identifier>
        <datestamp>2022-02-08T14:44:38Z</datestamp>
        <setSpec>collect</setSpec>
      </header>
      <metadata>
        <record priref="3" created="2020-07-06T09:20:57Z" modification="2022-02-08T14:44:38Z" selected="false" deleted="false" xmlns="http://www.openarchives.org/OAI/2.0/">
        ...
With the oai.ashx server you open up a database for OAI searches from the internet. However, it may be that not all records in that database should be publicly accessible. Luckily it’s possible to exclude specific records from OAI search results.
We typically use a separate Active Directory account named cmuiisuser to represent anonymous internet users. This account name must also be registered as a user in the Axiell Collections application, which must use the record authorisation mechanism to exclude certain users. The user name must then be entered in each record that is to be excluded from OAI search results, prior to any harvesting.
See the Use the authorisation functionality paragraph in the User authentication and access rights topic in the Axiell Designer Help for an explanation about setting up this type of access restriction.
From WebAPI version 3.0.21252.1, oai.ashx produces JSON (implicitly of the more compact type) if the setting <output>json</output> has been configured in the <globalConfiguration> section of the adlibweb.xml configuration file. If you do not configure this option, the OAI result will be grouped XML. The output=xml and output=json query arguments (as supported for wwwopac.ashx) cannot be used for oai.ashx. The XML type setting for wwwopac.ashx does not affect this OAI functionality and, vice versa, the JSON option for OAI does not affect the JSON format or XML type settings for wwwopac.ashx. However, if no <jsonFormat> has been specified for wwwopac.ashx, the <output>json</output> setting causes both wwwopac.ashx and oai.ashx (the latter from version 3.0.21273.1) to output JSON of the jsonv1 type, unless (for wwwopac.ashx) the URL has an output=xml argument; a jsonFormat=standard argument won't work in that case.
For more information about the referenced wwwopac.ashx settings, see the JSON output topic.
At first, OAI jsonv1 type output was missing data about deleted records and resumption tokens, but this was fixed in 3.0.21277.1. For enumerative fields a difference remains between the pre-3.0.21277 version of jsonv1 and the new version of jsonv1: spans are missing for this field type, element “lang” is now “@lang” and element “text” is now “#text”: this will not be fixed.