SOLR Index REST API

Introduction

This article provides a basic description of how a REST API works and its mechanisms, followed by an overview of the implementations available in the LMS for querying report data and the necessary help and authentication interfaces, including examples. Report data can be queried via API for all on-screen reports, but not for standard reports or reports created using Report Designer.

The following chapters provide a basic introduction to the architecture of REST APIs. If you are already familiar with these topics, we recommend that you skip directly to the section REST API of the LMS to start with the information that is relevant to you.

About REST API

Representational State Transfer (REST) is an architectural style for designing networked applications, and a REST API is an interface that follows this style. It is based on established World Wide Web standards and uses the HTTP protocol for communication. The core idea behind REST is interaction with resources, whereby each resource can be identified by a unique URI (Uniform Resource Identifier). Communication between client and server is stateless, which means that each request contains all the necessary information and the server does not need to store any client context between requests. JSON (JavaScript Object Notation) is predominantly used as the data format for exchange, as it is lightweight and easy to read for both humans and machines. The consistent use of standard HTTP methods such as GET, POST, PUT and DELETE makes REST APIs intuitive, flexible and scalable, which is why they are the preferred choice for modern web services.

GET

The GET method is one of the most basic and frequently used HTTP request methods in a REST API. Its primary purpose is to retrieve data from a specified resource. By definition, a GET request is safe and idempotent: it must not change the state of the resource on the server, and multiple identical requests have the same effect as a single request. Parameters for filtering, sorting or paginating the retrieved data are usually appended to the URI as query parameters. If the request is successful, the server typically responds with the HTTP status code 200 OK and delivers the requested data in the body of the response, usually in JSON format.

Authorization Header

The Authorization Header is a standardised HTTP header used to authenticate a client to a server. It enables the client to prove its identity or authorisation in order to access protected resources. The header carries the login information (credentials) in an extensible format: the name of the authentication scheme (e.g. Basic, Bearer) followed by the actual credentials. The correct implementation of the Authorization Header is crucial for the security of an API, as it ensures that only authorised parties can access sensitive data and functionality.

Basic Authentication

Basic Authentication is a simple authentication scheme defined in the HTTP standard. In this method, the client sends the username and password as a single Base64-encoded string in the Authorization Header. The format is Authorization: Basic <base64-credentials>, where <base64-credentials> is the Base64 encoding of username:password. Although this scheme is widely used and easy to implement, it does not offer a high level of security on its own, as Base64 is an encoding and not encryption and can be decoded by anyone. For this reason, Basic Authentication should only be used in conjunction with an encrypted connection such as HTTPS/TLS to prevent third parties from intercepting login information.
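
The construction of the header value described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name basic_auth_header and the sample credentials are assumptions chosen for the example and are not part of the LMS.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic Authentication."""
    # Join user name and password with a colon, then Base64-encode the bytes.
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Sample credentials for illustration only.
header = basic_auth_header("Aladdin", "open sesame")
print(header)

# Base64 is reversible by anyone, which is why HTTPS/TLS is required.
encoded_part = header["Authorization"].split(" ", 1)[1]
print(base64.b64decode(encoded_part).decode("utf-8"))
```

The second print statement shows that the original credentials can be recovered from the header without any key, which is the reason the scheme depends on an encrypted connection.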

Bearer Authentication

Bearer Authentication (also known as Token Authentication) is a modern and secure authentication scheme that is often used in conjunction with OAuth2 and OIDC. Instead of sending a username and password with every request, the client authenticates itself once with an authorisation server and receives an access token (Bearer Token) in return. This token is then sent with every request to the protected API in the Authorization Header with the Bearer scheme. The name ‘Bearer’ implies that anyone in possession of the token will be granted access. It is therefore essential to handle these tokens securely and only transfer them via encrypted connections.

Bearer Token

A Bearer Token is a cryptographic string that serves as proof of authorisation in Bearer Authentication. This token is issued by an authorisation server after a client has successfully authenticated itself. It represents the authorisation to access certain resources on behalf of a user. Bearer Tokens are usually short-lived in order to minimise the security risk in the event of theft. They are often JSON Web Tokens (JWT), which, in addition to access authorisation, may also contain information (claims) about the user and the validity period. The client must store the token for the duration of its validity and send it in the Authorization Header with every request to a protected endpoint.
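
If a token is a JWT, its claims can be inspected by decoding the middle segment of the token. The following Python sketch illustrates this under stated assumptions: the demo token is built locally and is unsigned, the helper names are hypothetical, and whether the tokens issued by the LMS are JWTs is not confirmed here. The sketch only reads the claims and does not verify the signature, so it must not be used to decide whether a token is trustworthy.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    # JWT segments use URL-safe Base64 without trailing "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWTs strip before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def jwt_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Demo token built locally for illustration (unsigned, not a real LMS token).
demo_token = ".".join([
    b64url_encode(json.dumps({"alg": "none"}).encode("utf-8")),
    b64url_encode(json.dumps({"sub": "jane", "exp": 1700000000}).encode("utf-8")),
    "",
])

print(jwt_claims(demo_token))
```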

Path Parameter

Path Parameters are variable components of the URL path and are used to uniquely identify a specific resource within a collection. They are embedded directly into the URI structure of the endpoint, typically marked by curly brackets, such as /users/{userId}. To access a specific resource, the client replaces this placeholder with an actual value, for example /users/123. Path Parameters are usually mandatory, as they are an essential part of determining the requested resource. They are ideal for accessing a single, unique data element.
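
Replacing a placeholder such as {userId} with an actual value can be sketched as follows. The host api.example.com and the function name user_url are assumptions for illustration. The value is percent-encoded so that characters such as a slash or a space cannot change the structure of the path.

```python
from urllib.parse import quote


def user_url(base_url: str, user_id: str) -> str:
    """Fill the {userId} path parameter of /users/{userId}."""
    # safe="" also escapes "/", so the value always stays a single path segment.
    return f"{base_url}/users/{quote(user_id, safe='')}"


print(user_url("https://api.example.com", "123"))
print(user_url("https://api.example.com", "a/b c"))
```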

Query Parameter

Query Parameters are key-value pairs that are appended to the end of a URL to modify the amount of data returned by an endpoint. They are separated from the path by a question mark (?) and separated from each other by an ampersand (&), e.g. /articles?status=published&sort=desc. Unlike Path Parameters, which identify a resource, Query Parameters are typically used to filter, sort or paginate a collection of resources. They are usually optional and offer the client a flexible way to tailor the server's response to its specific requirements without having to define a new endpoint.
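
As a brief illustration, the example URL above can be assembled and taken apart again with the Python standard library. The host api.example.com is an assumption for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Key-value pairs to append after the question mark.
params = {"status": "published", "sort": "desc"}

# urlencode joins the pairs with "&" and percent-encodes special characters.
url = "https://api.example.com/articles?" + urlencode(params)
print(url)

# The server side (or a test) can split the query string back into pairs.
parsed = parse_qs(urlsplit(url).query)
print(parsed)
```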

POST

The POST method is a central HTTP request method that is mainly used to send data to a server in order to create a new resource. Unlike the GET method, which passes parameters in the URL, a POST request transports the data to be transmitted (the so-called Payload) in the body of the request. An essential attribute of POST is that the method is neither safe nor idempotent. This means that it changes the state on the server, and executing the same POST request multiple times usually results in the creation of multiple new resources. If creation is successful, the server typically responds with HTTP status code 201 Created. In addition, the response often contains a Location Header that specifies the URI of the newly created resource, and it may return a representation of the new object in the body.

url-form-encoded

application/x-www-form-urlencoded is a media type (MIME type) used to encode data sent from a client to a server, typically in the body of a POST request. This format mimics the way an HTML form transmits its data. The data is structured as a string of key-value pairs, with the key and value of each pair separated by an equals sign (=) and the individual pairs connected by an ampersand (&). Special characters within keys and values are percent-encoded to ensure correct transmission and interpretation. To use this data type, the client must set the Content-Type header of the HTTP request to application/x-www-form-urlencoded. Although JSON (application/json) is often preferred for transferring complex data in modern REST APIs, url-form-encoded remains a common standard, especially for simple form submissions and in OAuth2 flows, such as when requesting tokens.
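
The encoding rules can be observed directly with the Python standard library. In this sketch the field names and values are made up for illustration; note how the space, the ampersand and the equals sign inside the values are escaped so that they cannot be confused with the separators.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical form fields; the values contain characters that need escaping.
form = {"name": "Jane Doe", "comment": "a&b=c"}

# urlencode produces the application/x-www-form-urlencoded representation.
body = urlencode(form)
print(body)

# Decoding restores the original key-value pairs.
decoded = parse_qsl(body)
print(decoded)
```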

REST API of the LMS

The following sections describe the endpoints for querying report data. They also explain the procedure for correct authentication, filtering, sorting, pagination and language selection. The placeholder <baseUrl> is often used in the explanations. Replace this placeholder with the URL of the LMS up to and including the top-level domain (for example .com or .de).

A Postman collection with practical sample queries is available for download for all interfaces mentioned in the following sections:
SOLR API Examples.postman_collection.json

Authentication

One way to authenticate at such an interface is Basic Authentication. With this method, the User Name and Password must be provided with every request. The User Name and Password must be separated by a colon, Base64-encoded and written in a header as follows:

Authorization	Basic <base64code>

Since Basic Authentication always requires the transmission of User Names and Passwords, authentication using Bearer Tokens is recommended for a secure workflow. To obtain such a token, a POST request must first be sent to the IDM (Identity Management) of the LMS. The interface for this is:

POST	`<baseUrl>/idm/oauth/token`

The Payload must be of the media type application/x-www-form-urlencoded and contain the following parameters:

grant_type	password
client_id	IDM
client_secret	<Enter ClientSecret from application.properties here>
username	<Enter the username of the person in the LMS who is authorised to access the report here>
password	<Enter the password of the person in the LMS who is authorised to access the report here>

Accordingly, the following Header is also required:

Content-Type	application/x-www-form-urlencoded
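
The token request described above could be assembled as in the following Python sketch. It uses only the standard library and builds the request object without sending it. The function name build_token_request, the host lms.example.com and the sample credentials are assumptions for illustration; the endpoint path and the parameter names are taken from the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, client_secret: str,
                        username: str, password: str) -> Request:
    """Prepare the POST request to the IDM token endpoint (not sent here)."""
    form = {
        "grant_type": "password",
        "client_id": "IDM",
        "client_secret": client_secret,
        "username": username,
        "password": password,
    }
    return Request(
        url=f"{base_url}/idm/oauth/token",
        # The body is the form-encoded parameter list as bytes.
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values; replace them with the values of your own installation.
request = build_token_request(
    "https://lms.example.com", "my-client-secret", "report.user", "secret"
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

To actually send the request, the prepared object can be passed to urllib.request.urlopen, preferably inside a with statement so that the connection is closed again.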

If the request is successful, a response with status 201 Created is returned. It contains the Bearer Token (access_token), which can be used to authenticate against the other interfaces. The expiry value (expires_in) specifies for how many seconds the token is valid, counted from the time of the request.

{
  "refresh_token": "eyJra…",
  "access_token": "eyJra…",
  "token_type": "Bearer",
  "expires_in": 86400
}

To successfully authenticate at one of the interfaces mentioned above, it is necessary to add the following header:

Authorization	Bearer <access_token>
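
Turning the token response into this header, and working out when the token expires, might look as follows in Python. The function name bearer_header and the sample response body are assumptions for illustration; the field names access_token and expires_in are those of the response shown above.

```python
import json
import time
from typing import Optional


def bearer_header(token_response: str, requested_at: Optional[float] = None):
    """Return the Authorization header and the expiry time (epoch seconds)."""
    payload = json.loads(token_response)
    if requested_at is None:
        requested_at = time.time()
    # expires_in counts seconds from the time of the token request.
    expires_at = requested_at + payload["expires_in"]
    headers = {"Authorization": "Bearer " + payload["access_token"]}
    return headers, expires_at


# Sample response body with a placeholder instead of a real token.
sample_response = json.dumps(
    {"access_token": "<access_token>", "token_type": "Bearer", "expires_in": 86400}
)
headers, expires_at = bearer_header(sample_response, requested_at=0.0)
print(headers)
print(expires_at)
```

A client can compare expires_at with the current time before each call and request a new token shortly before the old one runs out.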

SOLR Index REST API

A SOLR index is not a traditional database, but rather a read-only data store optimized for search. Data is added through a process called indexing and is then available for lightning-fast queries, especially complex full-text searches. Instead of storing which words appear in a document (document → words), it stores in which documents each individual word appears (word → list of documents). This structure is known as an inverted index.
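
The word → list of documents idea can be illustrated with a toy example in Python. This is a deliberately simplified sketch and not how SOLR is implemented: it splits text on whitespace and lowercases it, whereas a real index applies configurable analysis rules. The document IDs and titles are made up.

```python
from collections import defaultdict


def build_inverted_index(documents: dict) -> dict:
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


# Hypothetical documents for illustration.
documents = {
    "course-1": "Introduction to Python",
    "course-2": "Advanced Python Testing",
}
index = build_inverted_index(documents)

# A search for one word is a single dictionary lookup.
print(sorted(index["python"]))   # ['course-1', 'course-2']
print(sorted(index["testing"]))  # ['course-2']
```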

Core components:

Documents: The index consists of a collection of documents. Each document is a single unit of information to be searched (e.g., a course, a catalog, a user profile).
Fields: Each document consists of fields that contain the actual data. Fields are like columns in a database table (e.g., title, description). Each field has a defined data type (text, number, date, etc.).
Schema: The structure of the documents and fields (field names, data types, analysis rules) is defined by a schema.

The indexing process is asynchronous. It is crucial to understand that changes to a SOLR index (adding, updating, or deleting documents) are not immediately visible in the search results.

SOLR Index in the LMS

In the LMS, settings for the available SOLR indexes can be configured in the navigation item of the same name. A corresponding index can be activated and deactivated, and for some indexes, the number of past months of data to be used for indexing can also be set.

SOLR Index Interface

The LMS provides the following endpoint for querying indexed data from SOLR:

POST

<baseUrl>/solrsrv/<index>/query

Documents of a specific index

As with the Report API, authentication is performed via an authorization header using either basic or bearer authentication.

Parametrisation of SOLR Data

For complex SOLR queries with numerous parameters, using a POST request is a cleaner alternative to GET. While GET can quickly make the URL long and difficult to read, a POST request allows the query parameters to be moved into the request body. To ensure that SOLR interprets these parameters correctly, it is crucial to set the Content-Type header to application/x-www-form-urlencoded.
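
A request of this kind could be prepared as in the following Python sketch, which builds the request object without sending it. The function name build_solr_request, the host lms.example.com, the index name courses, the field name title and the token value are assumptions for illustration; the endpoint path and the Content-Type header follow the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_solr_request(base_url: str, index: str, params: list,
                       access_token: str) -> Request:
    """Prepare a POST query against <baseUrl>/solrsrv/<index>/query."""
    return Request(
        url=f"{base_url}/solrsrv/{index}/query",
        # The SOLR parameters travel form-encoded in the request body.
        data=urlencode(params).encode("utf-8"),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# Hypothetical index and field names.
solr_params = [("q", "title:python"), ("wt", "json"), ("indent", "true")]
solr_request = build_solr_request(
    "https://lms.example.com", "courses", solr_params, "<access_token>"
)
print(solr_request.get_method(), solr_request.full_url)
print(solr_request.data.decode("utf-8"))
```

Passing the parameters as a list of pairs rather than a dictionary keeps their order and allows the same key to appear more than once.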

Response Type

The response type can be set using the wt parameter. If no response type is specified via the parameter, the response is always in JSON format with indentation for better readability. SOLR supports a variety of response types:

wt=xml	Returns a response in XML format
wt=json	Returns a response in JSON format
wt=csv	Returns the results as CSV (comma-separated values)
wt=python	Returns a response that can be evaluated as a Python data structure
wt=ruby	Returns a response that can be evaluated as a Ruby data structure
wt=php	Returns a serialised PHP response

To make a response more readable, the output in JSON or XML format can be formatted with indentations using the following parameter:

indent=true	Formats the JSON or XML output with indentation to make it easier for humans to read

Filter

q-Parameter

The q parameter (for ‘query’) forms the main search. It finds relevant documents and ranks them according to relevance (score). The most basic search involves entering one or more words. SOLR then searches the configured default search field (often a combined field consisting of title, description, etc.) for documents containing these words. Only one q parameter can be included in a query:

`q=<fieldname>:<value>`	Filters by documents that contain the value <value> in the field <fieldname>
`q="<value1> <value2>"`	Filters for documents containing the phrase ‘<value1> <value2>’
`q=<value1> AND <value2>`	Filters for documents that contain both <value1> and <value2>
`q=<value1> OR <value2>`	Filters by documents that contain either <value1> or <value2>
`q=<value1> NOT <value2>`	Filters for documents that contain <value1> and do not contain <value2>
`q=(<value1> OR <value2>) AND <value3>`	Grouping filter conditions using brackets
`q=te?t`	Single-character wildcard: ? stands for exactly one character, so this matches text and test, for example
`q=<value1>*`	Multi-character wildcard: matches <value1> itself and every word beginning with <value1>
`q=*:*`	All documents, equivalent to an unfiltered SELECT * in SQL
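
Because only one q parameter is allowed, several conditions have to be combined into a single expression. The following Python sketch shows one way of composing such an expression and encoding it for the request body. The helper names term, phrase, all_of and any_of are hypothetical and not part of SOLR or the LMS, and the sketch does not escape special characters inside the values.

```python
from urllib.parse import urlencode


def term(field: str, value: str) -> str:
    # <fieldname>:<value>
    return f"{field}:{value}"


def phrase(*words: str) -> str:
    # A phrase is enclosed in straight double quotes.
    return '"' + " ".join(words) + '"'


def all_of(*clauses: str) -> str:
    # Every clause must match; brackets keep the grouping explicit.
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    # At least one clause must match.
    return "(" + " OR ".join(clauses) + ")"


q = all_of(any_of("python", "java"), "beginner")
body = urlencode({"q": q})
print(q)
print(body)
```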

fq-Parameter

The fq parameter (filter query) can be used to efficiently limit search results. The main advantage lies in performance, as the results of filter queries are cached. Although q and fq use identical syntax, they serve fundamentally different purposes. The fq parameter limits the documents that are eligible for the main search without influencing their relevance score. Unlike the q parameter, a query can contain multiple fq parameters. Support for data types other than text provides additional filtering options:

`fq=<fieldname>:[<value> TO *]`	Filters by documents that contain the value <value> or higher in the field <fieldname>
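
Sending several fq parameters in one request means repeating the key in the form-encoded body. A minimal Python sketch, assuming hypothetical field names duration and language:

```python
from urllib.parse import parse_qs, urlencode

# A list of pairs (rather than a dict) allows the key "fq" to repeat.
filter_params = [
    ("q", "python"),
    ("fq", "duration:[10 TO *]"),  # range filter: 10 or higher
    ("fq", "language:en"),
]

filter_body = urlencode(filter_params)
print(filter_body)

# Decoding shows one q value and two independent fq values.
decoded_filters = parse_qs(filter_body)
print(decoded_filters)
```

Each fq value is applied as a separate restriction, so only documents that satisfy all of them remain eligible for the main search.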