Adjusting SEO Using robots.txt

Operators of the imc Learning Suite LMS usually want precise control over which content search engines can find and which they cannot. Customers often run several environments of their LMS in parallel, such as production systems, test systems, and development environments. Without clear rules, search engines can inadvertently index content that was never intended to be publicly visible. Preventing this is exactly what the robots.txt file is for. It is a simple but effective tool in technical SEO.

What is robots.txt?

The robots.txt file is a small text file located in the root directory of a website, for example:

https://imc-learning.com/robots.txt

It provides search engine bots such as Googlebot or Bingbot with information about:

  • which areas may be searched

  • which areas should be excluded

  • which content should not appear in search results

The file is primarily intended for search engine crawlers and is used to control the so-called “crawling” process.

Why is robots.txt important?

The imc Learning Suite features many different content areas:

  • Public course pages

  • Login sections

  • User profiles

  • Exams and certificates

  • Trial courses

  • Development environments

  • Temporary training platforms

Not all of this content is suitable for search engines.

Without proper SEO management, problems such as the following can occur:

  • internal testing systems can be found by search engines

  • duplicate content arises

  • sensitive data becomes visible

  • incorrect environments are indexed

  • crawl budget is wasted

The robots.txt file helps prevent exactly that.

What Does a Standard robots.txt File Look Like?

The standard version of the robots.txt file looks like this:

User-agent: *
Disallow: /
Sitemap: https://imc-learning.com/sitemap.xml

By default, the file is configured so that no environment of the imc Learning Suite can be found by search engines.
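
The effect of this default can be checked locally. The following is a minimal sketch using the robots.txt parser from the Python standard library; the helper name may_crawl and the example path /some/course/page are assumptions made for illustration and are not part of the imc Learning Suite.

```python
from urllib.robotparser import RobotFileParser

# The default file quoted above, as a string.
DEFAULT_ROBOTS_TXT = """\
User-agent: *
Disallow: /
Sitemap: https://imc-learning.com/sitemap.xml
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical paths, used only for illustration.
for path in ("/", "/some/course/page"):
    url = "https://imc-learning.com" + path
    print(path, may_crawl(DEFAULT_ROBOTS_TXT, "Googlebot", url))
```

With the default file, both checks report that crawling is not permitted, because "Disallow: /" matches every path and "User-agent: *" applies to every bot that has no rule group of its own.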

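| Directive  | Meaning                                                                      |
|------------|------------------------------------------------------------------------------|
| User-agent | Target crawler: defines the bot (e.g., *, Googlebot, Bingbot, GPTBot, etc.)  |
| Disallow   | Paths that may not be crawled                                                |
| Allow      | Paths that may be crawled, or exceptions within restricted paths             |
| Sitemap    | Links to the sitemap.                                                        |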

A sitemap is a structured file (usually in XML format) that lists all the important pages on a website and includes additional information, such as the last update date. It helps search engines like Google find and understand content efficiently.

Whether a sitemap entry exists depends on whether the customer maintains a sitemap.
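
Whether a given robots.txt file declares a sitemap can be read out programmatically. This is a small sketch, assuming Python 3.9 or newer; the function name sitemap_urls and the two sample files are illustrative only.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in robots_txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap line.
    return parser.site_maps() or []


# Sample file for a customer who maintains a sitemap.
with_sitemap = """\
User-agent: *
Allow: /
Sitemap: https://imc-learning.com/sitemap.xml
"""

# Sample file for a customer who does not maintain a sitemap.
without_sitemap = """\
User-agent: *
Allow: /
"""

print(sitemap_urls(with_sitemap))
print(sitemap_urls(without_sitemap))
```

The first call lists the sitemap URL from the file, the second returns an empty list.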

Typical Environments for the imc Learning Suite

Production Environment (should generally be found)

The live platform is normally publicly visible and may be indexed.

Example:

User-agent: *
Allow: /
Sitemap: https://imc-learning.com/sitemap.xml

Typical indexable content:

  • Course overviews

  • Landing pages

  • Training programs

Non-production Environments (should generally not be found)

In addition to the production environment, many customers run non-production environments, such as test systems and development environments.

Generally, customers do not want these systems to be found by Google or other search engines.

Example:

User-agent: *
Disallow: /
Sitemap: https://imc-learning.com/sitemap.xml

This signals to search engines that they should not crawl anything in that environment.

The robots.txt file alone is not a security mechanism: compliant crawlers follow it voluntarily, and the file itself is publicly readable.

Sensitive test environments should be additionally protected by:

  • Password protection

  • VPN

  • IP restrictions

  • "noindex" meta tags

Common SEO Use Cases in the imc Learning Suite

Use Case 1: Exclude Login Areas

Login pages provide no SEO value and do not need to be indexed. The following robots.txt configuration excludes them from crawling:

User-agent: *
Disallow: /ilp/pages/login.jsf
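
To confirm that this rule affects only the login page, it can be evaluated with the Python standard library. This is a sketch; the second path, /ilp/pages/catalog.jsf, is a hypothetical example and not a documented imc Learning Suite URL.

```python
from urllib.robotparser import RobotFileParser

# The rule from Use Case 1.
LOGIN_RULES = """\
User-agent: *
Disallow: /ilp/pages/login.jsf
"""

parser = RobotFileParser()
parser.parse(LOGIN_RULES.splitlines())

BASE = "https://imc-learning.com"

# The login page is matched by the Disallow rule.
login_allowed = parser.can_fetch("*", BASE + "/ilp/pages/login.jsf")

# Hypothetical page path that no rule matches, so it stays crawlable.
other_allowed = parser.can_fetch("*", BASE + "/ilp/pages/catalog.jsf")

print(login_allowed, other_allowed)
```

The login page is reported as not crawlable, while the other page remains crawlable.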

Use Case 2: Making Public Courses Visible

Public course pages may be indexed.

The public catalogs are accessible via the sitemap.

Use Case 3: Keep Crawlers Out of Backend Functionality

The backend functions (ils) of the imc Learning Suite should not be crawled by search engines. The following robots.txt configuration excludes them:

User-agent: *
Disallow: /ils
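
Disallow rules match by path prefix, so it is worth checking which paths this rule actually covers. The sketch below uses the Python standard library parser; every path in the loop is hypothetical and serves only to illustrate prefix matching.

```python
from urllib.robotparser import RobotFileParser

# The rule from Use Case 3.
BACKEND_RULES = """\
User-agent: *
Disallow: /ils
"""

parser = RobotFileParser()
parser.parse(BACKEND_RULES.splitlines())

BASE = "https://imc-learning.com"


def blocked(path: str) -> bool:
    """Return True if the rule above stops a crawler from fetching path."""
    return not parser.can_fetch("*", BASE + path)


# All paths below are hypothetical and only illustrate prefix matching.
for path in ("/ils", "/ils/admin", "/ilsomething", "/ilp/pages/catalog.jsf"):
    print(path, "blocked" if blocked(path) else "allowed")
```

Every path that begins with the characters /ils is blocked, including one like /ilsomething that merely shares the prefix. Paths under /ilp are not affected.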

What You Need to Do as a Customer

To change the robots.txt file, please create a DESK ticket at the Scheer IMC Service Desk. It is the customer’s responsibility to adapt the robots.txt file to the relevant system environment and its requirements. Once you have done so, please provide the modified file(s) to Scheer IMC via the ticket. The Scheer IMC Hosting Team will then handle the deployment.

Recommendation: Check robots.txt regularly

A check is especially important:

  • before go-live

  • after relaunches

  • when changing domains

  • after LMS updates
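
Such a check can be partly automated. The following is a minimal sketch, not an official Scheer IMC tool: the function names are assumptions, and the check only tests whether the site root is crawlable, so it does not replace a manual review of individual rules.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def root_is_crawlable(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt lets user_agent crawl the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "/")


def fetch_robots_txt(base_url: str, timeout: float = 10.0) -> str:
    """Download robots.txt from the root of base_url."""
    with urlopen(base_url.rstrip("/") + "/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_environment(base_url: str, should_be_crawlable: bool) -> bool:
    """Return True if the live robots.txt matches the expected policy."""
    return root_is_crawlable(fetch_robots_txt(base_url)) == should_be_crawlable


# Offline demonstration with the two example policies from this article.
PRODUCTION = "User-agent: *\nAllow: /\n"
NON_PRODUCTION = "User-agent: *\nDisallow: /\n"
print(root_is_crawlable(PRODUCTION), root_is_crawlable(NON_PRODUCTION))
```

For a live check, check_environment("https://imc-learning.com", should_be_crawlable=True) would download the file and compare it with the expectation for a production system. For a test or development system, pass that system's own address and should_be_crawlable=False.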