Exploring the Sample Thesaurus

The sample XML thesaurus that you defined as the seed for the ontolection-francis collection uses a generic XML format that associates alternate terms and phrases with a primary term or phrase. The element names used in this file identify the type of relationship that exists between those terms.

The following code listing shows the francis-thesaurus.xml file that you specified as that seed in the previous section:

<?xml version="1.0" encoding="utf-8" ?>
<thesaurus name="automotive" language="english" domain="example">
  <word name="car">
    <synonym>auto</synonym>
    <synonym>truck</synonym>
    <synonym>van</synonym>
  </word>
  <word name="racing">
    <related>races</related>
    <narrower>speedway</narrower>
  </word>
  <word name="horse">
    <related>racing</related>
    <related>track</related>
  </word>
  <word name="detective">
    <synonym>investigator</synonym>
    <related>Tommy Flat</related>
    <related>Ty Roberts</related>
  </word>
</thesaurus>

The file consists of a <thesaurus> root element that contains one or more <word> elements. The name attribute of each <word> element identifies the primary term with which other terms are associated. Each associated term or phrase is the content of a child element of that <word> element, and the name of the child element identifies the relationship between the term or phrase that it contains and the name attribute of the parent <word> element.
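
The structure described above can be illustrated with a short Python sketch. It is not part of the product. It assumes only the standard library's xml.etree.ElementTree module and an abbreviated copy of the sample file, and the names SAMPLE and read_relations are hypothetical. Each child element's tag is read as the relation, and its text as the associated term or phrase.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample thesaurus (XML declaration omitted for brevity).
SAMPLE = """<thesaurus name="automotive" language="english" domain="example">
  <word name="racing">
    <related>races</related>
    <narrower>speedway</narrower>
  </word>
  <word name="detective">
    <synonym>investigator</synonym>
    <related>Tommy Flat</related>
    <related>Ty Roberts</related>
  </word>
</thesaurus>"""


def read_relations(xml_text):
    """Return {primary term: [(relation, associated term), ...]}."""
    root = ET.fromstring(xml_text)
    relations = {}
    for word in root.findall("word"):
        # The name attribute is the primary term. Each child element's tag
        # names the relation, and its text is the associated term or phrase.
        relations[word.get("name")] = [(child.tag, child.text) for child in word]
    return relations


relations = read_relations(SAMPLE)
for term, pairs in relations.items():
    for relation, associated in pairs:
        print(f"{term} --{relation}--> {associated}")
```

Keeping the relation name next to each associated term mirrors the file format, in which the element name, not its position, carries the meaning.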

When using this XML thesaurus format, the types of elements that can appear within a <word> element include <synonym>, <related>, and <narrower>, as well as language-specific elements such as <french>, <german>, and <spanish>.

Multiple elements of different types can appear within a single <word> element, as in the following example:

<word name="car">
  <french>voiture</french>
  <german>Auto</german>
  <narrower>convertible</narrower>
  <narrower>dragster</narrower>
  <narrower>station wagon</narrower>
  <narrower>SUV</narrower>
  <related>mileage</related>
  <related>speed</related>
  <related>traffic</related>
  <spanish>carro</spanish>
  <spanish>coche</spanish>
  <spanish>automóvil</spanish>
  <synonym>auto</synonym>
  <synonym>automobile</synonym>
  <synonym>motorcar</synonym>
</word>
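
As a rough illustration of how the mixed element types in such a <word> element can be separated by relation, the following Python sketch groups the child elements by element name. It uses only the standard library's xml.etree.ElementTree module and a shortened copy of the example. The names WORD and group_by_relation are hypothetical, and the path expression uses ElementTree's limited XPath subset, which is not necessarily the syntax that the product expects.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the <word name="car"> example.
WORD = """<word name="car">
  <french>voiture</french>
  <narrower>convertible</narrower>
  <narrower>SUV</narrower>
  <related>mileage</related>
  <spanish>coche</spanish>
  <spanish>automóvil</spanish>
  <synonym>auto</synonym>
  <synonym>automobile</synonym>
</word>"""


def group_by_relation(word_xml):
    """Return {element name: [term, ...]} for one <word> element."""
    word = ET.fromstring(word_xml)
    grouped = defaultdict(list)
    for child in word:
        # Children with the same element name share one relation type.
        grouped[child.tag].append(child.text)
    return dict(grouped)


grouped = group_by_relation(WORD)
print(grouped["synonym"])

# The same selection for a single relation, written as a path expression.
narrower = [e.text for e in ET.fromstring(WORD).findall("./narrower")]
print(narrower)
```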

To finish configuring the ontolection-francis collection, you must configure some of these relations, as explained in the next section. To proceed to the next section, see Configuring XPaths for Semantic Relations.