The Sitemaps protocol lets you inform search engines about the URLs on a website that are available for crawling. A sitemap is an XML file that lists the URLs of a site. To retrieve the sitemap, you need your customer identifier, the language, and the instance. Then call https://{customer}.makaira.io/{lang}/sitemap.xml?instance={instance}
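
As a minimal sketch, the URL pattern above could be filled in like this. The function name `sitemap_url` and the example values `acme`, `de`, and `live` are hypothetical placeholders, not real identifiers.

```python
from urllib.parse import quote, urlencode


def sitemap_url(customer: str, lang: str, instance: str) -> str:
    """Fill in the documented pattern
    https://{customer}.makaira.io/{lang}/sitemap.xml?instance={instance}
    """
    # urlencode escapes the instance name for safe use as a query parameter.
    query = urlencode({"instance": instance})
    # quote escapes the language segment for safe use in the path.
    return f"https://{customer}.makaira.io/{quote(lang, safe='')}/sitemap.xml?{query}"


# Hypothetical example values, for illustration only.
print(sitemap_url("acme", "de", "live"))
# prints: https://acme.makaira.io/de/sitemap.xml?instance=live
```

The resulting URL can then be fetched with any HTTP client or submitted to a search engine.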

Considered documents

A document is listed in the sitemap only if it meets all of the following rules:

  • the document must have a URL
  • the document must be active
  • the document type must not be one of
    • makaira-product (variants)
    • link (searchable links)
    • searchredirect (search redirect)
    • menu
    • menu_entry (menu)
  • metadata:{robotIndex: noindex} must not be set (setting it hides a document from the sitemap)
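
The rules above can be sketched as a single predicate. This only illustrates the listed conditions and is not Makaira's actual implementation. The field names `url`, `active`, and `type` are assumptions about the document shape; only the type names and `metadata.robotIndex` come from the list above.

```python
# Document types that never appear in the sitemap (from the list above).
EXCLUDED_TYPES = {"makaira-product", "link", "searchredirect", "menu", "menu_entry"}


def is_listed(doc: dict) -> bool:
    """Return True if a document passes every rule for sitemap inclusion.

    The keys "url", "active" and "type" are assumed names for illustration.
    """
    if not doc.get("url"):                # must have a URL
        return False
    if not doc.get("active"):             # must be active
        return False
    if doc.get("type") in EXCLUDED_TYPES:  # type must not be excluded
        return False
    metadata = doc.get("metadata") or {}
    # robotIndex set to "noindex" hides the document from the sitemap.
    return metadata.get("robotIndex") != "noindex"
```

For example, an active category document with a URL passes, while the same document with `metadata: {robotIndex: "noindex"}` does not.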

The XML data contains

  • URL: taken from canonical_url if set, otherwise from URL
  • alternative language links (href + hreflang): taken from the selfLinks attribute
  • images: taken from picture_url_main
  • last modified: taken from timestamp

Example output:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
    http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
    http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd">    
    <url>
        <loc>https://www.makaira.io/de/kunden</loc>
        <xhtml:link rel="alternate" hreflang="de" href="https://www.makaira.io/de/kunden" />
        <xhtml:link rel="alternate" hreflang="en" href="https://www.makaira.io/en/customer" />
        <image:image>
            <image:loc>https://www.makaira.io/picture/kunden.jpg</image:loc>
        </image:image>
        <lastmod>2022-04-27</lastmod>
    </url>
    <url>
        ....
    </url>
</urlset>
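
To check what a generated sitemap contains, the structure shown above can be read with Python's standard library. This is a sketch from the consumer side; the function name `parse_sitemap` and the shortened `SAMPLE` document are illustrative assumptions, while the namespaces are those declared in the example.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes matching the declarations in the sitemap example.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

# Shortened sample with a single <url> entry, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>https://www.makaira.io/de/kunden</loc>
        <xhtml:link rel="alternate" hreflang="de" href="https://www.makaira.io/de/kunden" />
        <xhtml:link rel="alternate" hreflang="en" href="https://www.makaira.io/en/customer" />
        <image:image>
            <image:loc>https://www.makaira.io/picture/kunden.jpg</image:loc>
        </image:image>
        <lastmod>2022-04-27</lastmod>
    </url>
</urlset>"""


def parse_sitemap(xml_text: str) -> list[dict]:
    """Extract loc, alternate language links, images and lastmod per <url>."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", default=None, namespaces=NS),
            # Map hreflang to href for each alternate language link.
            "alternates": {
                link.get("hreflang"): link.get("href")
                for link in url.findall("xhtml:link", NS)
            },
            "images": [img.text for img in url.findall("image:image/image:loc", NS)],
            "lastmod": url.findtext("sm:lastmod", default=None, namespaces=NS),
        })
    return entries


for entry in parse_sitemap(SAMPLE):
    print(entry["loc"], entry["lastmod"], sorted(entry["alternates"]))
```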

Note: Duplicate URLs are ignored. Each URL is output only once in the sitemap; deduplication is automatic, and the first occurrence wins.
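
A first-occurrence-wins deduplication could look like the sketch below. It assumes URLs are compared as exact strings and that each entry carries its URL under a `loc` key; both are assumptions, since the comparison rule is not specified here.

```python
def dedupe_first_wins(entries: list[dict]) -> list[dict]:
    """Keep only the first entry for each URL, preserving input order.

    Assumes exact string comparison on the hypothetical "loc" key.
    """
    seen = set()
    result = []
    for entry in entries:
        loc = entry["loc"]
        if loc in seen:
            continue  # a later duplicate is ignored
        seen.add(loc)
        result.append(entry)
    return result
```

With two entries sharing the same `loc`, only the one that comes first in the input is kept.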