Sitemap Fields

These fields control how documents appear in the XML sitemap generated by Makaira.

Target location: This content should be added to Application Functionalities > Sitemap in the documentation.

Sitemap URL

Access your sitemap at:

https://{customer}.makaira.io/{lang}/sitemap.xml?instance={instance}

Optional parameters:

  • ignoreAlternateLinks=true - Exclude alternate language links
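As a rough illustration, the sitemap URL can be assembled from its parts. The helper name `build_sitemap_url` and the sample values (`acme`, `en`, `live`) are hypothetical; only the URL pattern and the `ignoreAlternateLinks` parameter come from the text above.

```python
from urllib.parse import urlencode


def build_sitemap_url(customer: str, lang: str, instance: str,
                      ignore_alternate_links: bool = False) -> str:
    """Assemble a sitemap URL following the documented pattern."""
    params = {"instance": instance}
    if ignore_alternate_links:
        # Optional parameter: leave out alternate language links
        params["ignoreAlternateLinks"] = "true"
    return f"https://{customer}.makaira.io/{lang}/sitemap.xml?{urlencode(params)}"


# Hypothetical customer, language, and instance values
print(build_sitemap_url("acme", "en", "live", ignore_alternate_links=True))
# prints: https://acme.makaira.io/en/sitemap.xml?instance=live&ignoreAlternateLinks=true
```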

Fields Used in Sitemap Output

| Field | Type | Sitemap Element | Description |
| --- | --- | --- | --- |
| url | string | <loc> | Document URL |
| canonical_url | string | <loc> | Canonical URL (prioritized if set) |
| selfLinks | object | <xhtml:link> | Alternative language URLs |
| picture_url_main | string | <image:loc> | Main product image |
| timestamp | date | <lastmod> | Last modification date |

Example Import

{
  "id": "product-123",
  "type": "product",
  "url": "/products/blue-t-shirt",
  "canonical_url": "/products/blue-t-shirt",
  "picture_url_main": "https://cdn.example.com/images/blue-t-shirt.jpg",
  "timestamp": "2024-01-15 10:30:00",
  "selfLinks": {
    "de": "/de/products/blaues-t-shirt",
    "en": "/en/products/blue-t-shirt",
    "fr": "/fr/products/t-shirt-bleu"
  }
}

Generated Sitemap Entry

<url>
  <loc>https://example.com/products/blue-t-shirt</loc>
  <lastmod>2024-01-15</lastmod>
  <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/products/blaues-t-shirt"/>
  <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/products/blue-t-shirt"/>
  <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/products/t-shirt-bleu"/>
  <image:image>
    <image:loc>https://cdn.example.com/images/blue-t-shirt.jpg</image:loc>
  </image:image>
</url>
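The mapping from import fields to sitemap elements can be sketched in a few lines of Python. This is only an illustration of the field table, not Makaira's actual generator: the function name `render_entry`, the `BASE_URL` constant, and the assumption that relative paths are resolved against the shop domain and that timestamps use the `YYYY-MM-DD HH:MM:SS` form are all inferred from the example above.

```python
from datetime import datetime
from urllib.parse import urljoin
from xml.sax.saxutils import escape, quoteattr

BASE_URL = "https://example.com"  # assumption: the shop's public domain


def render_entry(doc: dict, base_url: str = BASE_URL) -> str:
    """Render one <url> entry from an import document (illustrative only)."""
    # canonical_url is prioritized over url when it is set
    loc = urljoin(base_url, doc.get("canonical_url") or doc["url"])
    lines = ["<url>", f"  <loc>{escape(loc)}</loc>"]
    if doc.get("timestamp"):
        # assumption: timestamp format matches the example import
        day = datetime.strptime(doc["timestamp"], "%Y-%m-%d %H:%M:%S").date()
        lines.append(f"  <lastmod>{day.isoformat()}</lastmod>")
    for lang, path in doc.get("selfLinks", {}).items():
        # quoteattr escapes the value and wraps it in quotes
        href = quoteattr(urljoin(base_url, path))
        lines.append(f'  <xhtml:link rel="alternate" hreflang="{lang}" href={href}/>')
    if doc.get("picture_url_main"):
        lines += [
            "  <image:image>",
            f"    <image:loc>{escape(doc['picture_url_main'])}</image:loc>",
            "  </image:image>",
        ]
    lines.append("</url>")
    return "\n".join(lines)
```

Calling `render_entry` with the example import document should yield an entry shaped like the one shown above.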

Fields That Control Sitemap Inclusion

Documents must meet all these criteria to appear in the sitemap:

| Field | Condition | Effect |
| --- | --- | --- |
| url | Must exist | Documents without URLs are excluded |
| active | Must be true | Inactive documents are excluded |
| parent | Must be empty or not exist | Variants are excluded (only parent products) |
| metadata.robotIndex | Must NOT be "noindex" | Allows hiding from sitemap |
| pageContent.metadata.robotIndex | Must NOT be "noindex" | For landing pages |
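The criteria above combine with a logical AND. A minimal sketch of that check follows; the function name `is_sitemap_eligible` is hypothetical, and treating a missing `active` field as "not active" is an assumption rather than documented behavior.

```python
def is_sitemap_eligible(doc: dict) -> bool:
    """Return True only if the document meets every inclusion criterion."""
    if not doc.get("url"):
        return False  # documents without a URL are excluded
    if doc.get("active") is not True:
        return False  # assumption: a missing "active" counts as inactive
    if doc.get("parent"):
        return False  # a non-empty parent marks a variant
    # Collect both robotIndex locations (products and landing pages)
    robot_values = (
        (doc.get("metadata") or {}).get("robotIndex"),
        ((doc.get("pageContent") or {}).get("metadata") or {}).get("robotIndex"),
    )
    return "noindex" not in robot_values


print(is_sitemap_eligible({"url": "/products/blue-t-shirt", "active": True}))
# prints: True
```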

Excluding Documents from Sitemap

To exclude a document from the sitemap while keeping it active:

{
  "id": "product-123",
  "active": true,
  "metadata": {
    "robotIndex": "noindex"
  }
}

Or for landing pages:

{
  "id": "page-123",
  "type": "page",
  "pageContent": {
    "metadata": {
      "robotIndex": "noindex"
    }
  }
}

Document Types Excluded from Sitemap

The following datatype values are automatically excluded:

| Datatype | Description |
| --- | --- |
| makaira-product | Variants (only parent products appear) |
| link | Searchable links |
| searchredirect | Search redirects |
| menu | Menu documents |
| menu_entry | Menu entries |
| Landing page snippets | Pages with type: "snippet" |

URL Handling

Canonical URL Priority

If both url and canonical_url are set, canonical_url takes priority:

{
  "url": "/products/blue-t-shirt-summer-sale",
  "canonical_url": "/products/blue-t-shirt"
}

Sitemap will use: /products/blue-t-shirt

URL Deduplication

The sitemap automatically deduplicates URLs (first occurrence wins). If multiple documents have the same URL, only the first is included.
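"First occurrence wins" can be pictured as an order-preserving filter. The sketch below is illustrative only; the function name `dedupe_first_wins` is hypothetical, and it operates on a plain list of URLs rather than on Makaira documents.

```python
def dedupe_first_wins(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each URL and preserve the input order."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


print(dedupe_first_wins(["/a", "/b", "/a", "/c", "/b"]))
# prints: ['/a', '/b', '/c']
```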

Special Characters

URLs are automatically XML-encoded:

  • & → &amp;
  • < → &lt;
  • > → &gt;
  • " → &quot;
  • ' → &apos;
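The same five replacements can be reproduced with Python's standard library, which is handy for checking what an encoded URL should look like. This is a sketch for comparison, not Makaira's implementation; the function name `xml_encode` is hypothetical.

```python
from xml.sax.saxutils import escape


def xml_encode(url: str) -> str:
    """Escape the five XML special characters listed above."""
    # escape() handles &, < and > by default; quotes are passed in explicitly
    return escape(url, {'"': "&quot;", "'": "&apos;"})


print(xml_encode('/s?a=1&b="x"'))
# prints: /s?a=1&amp;b=&quot;x&quot;
```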

Best Practices

  1. Always provide url - Required for sitemap inclusion.

  2. Use canonical_url for duplicates - When a product is reachable under multiple URLs, set canonical_url to the preferred one.

  3. Keep timestamp updated - Helps search engines know when to recrawl.

  4. Use selfLinks for multi-language - Enables proper hreflang tags.

  5. Set robotIndex: noindex for hidden pages - Exclude pages from sitemap without deactivating them.