Quick Analysis Guide
This articles addresses on-premise environments with performance issues and the guide provides detailed steps for analysis.
In case of a system freeze (hang-up) or an overall bad performance, please follow the steps described below. In most cases, following the simple analysis steps is enough for identifying the cause of the problem. These steps have to be performed before the system is restarted.
Finally, please raise a ticket in our Service Desk and attach the log files and results of the analysis steps to it.
Important Parameter Checks
Java Parameters
The Java max heap parameter needs to be set to a sufficiently high value (>= -Xmx2700m). If too low, you will see extensive Full GC activity in the GC log which should always be configured (e.g. for Oracle JDK: -verbose:gc -XX:+PrintGCDetails -Xloggc:<path to log directory>/gc.log).
For the maximum heap size, we recommend at least 2700m. Depending on your activity and number of concurrent users, it might be necessary to set a significant higher value (for further information see System Requirements).
Tomcat Parameters
The maxThreads parameter in server.xml for the connector needs to be set to at least the number of expected concurrent ILS users. By default there is no such parameter setting and the value is 200. To increase it to e.g. 1200 add maxThreads=”1200” in the Connector tag.
As a rule of thumb the maxActive parameter in the database JDBC configuration should be set to half of maxThreads. Having maxActive too small can easily lead to a system freeze. On Oracle the processes and on PostgreSQL the MAX_CONNECTIONS parameter should be set a little higher than the sum of maxActive of all nodes to have some spare connections for direct DBA access.
Running ILS with MS SQL Server
Ad-hoc Analysis
Please execute the following steps before the ILS application server is restarted.
Step 1: Querying the database
Please execute the select statement on the SQL Server master database and save the output. In case the instance is hosting also databases from other applications, then you might activate the commented database filter “and db_name(s.database_id) = ” to limit the output on the ILS database.
select db_name(s.database_id) as dbname, c.session_id, qt.text,
case when rq.total_elapsed_time is null then 0 else rq.total_elapsed_time end as total_elapsed_time,
rq.cpu_time, rq.status, rq.blocking_session_id, rq.wait_type, rq.wait_time,
rq.wait_resource, rq.open_transaction_count, rq.transaction_isolation_level
from sys.dm_exec_connections c CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) as qt
left outer join sys.dm_exec_requests rq on c.session_id = rq.session_id,
sys.dm_exec_sessions s
where c.session_id = s.session_id and s.database_id > 4 --and db_name(s.database_id) = <ils_db_name>
order by s.database_id, total_elapsed_time desc;
Please make sure that the output of column “text” is not cut off after 256 characters. One way to assure this, is to execute the statement in SQL Server Management Studio and to save the output from the Results-tab with copy&paste. Please also execute the SQL statements of section 2.2. However, these statements could also be executed after the application server is restarted.
Step 2: Checking the CPU Utilisation
Take a look at the CPU utilisation on both the ILS application and database server (e.g. using taskmgr on Windows). Make a note of it.
Step 3: Saving the Log Files
Save the log files clix.log and gc.log as well as the log file of the web server (e.g. access.log with Apache) and all log files of the servlet engine.
When using TOMEE as servlet engine look for the string “Increase max Threads” within the TOMEE log file stdout log. A message like “All threads (200) are currently busy, waiting. Increase maxThreads (200)” means that the system freeze has been caused by a too low value of the maxThreads TOMEE parameter. Then increase the value of this parameter.
Step 4: Taking a Java Thread Dump
In case of a system freeze with unclear cause try to take a Java Thread Dump (Java Stack Trace).
jstack <pid> > threaddump.txt
where <pid> needs to be substituted by the process ID of the java process. In case Tomcat is running as a Windows service you may experience an error message like “not enough storage is available to process this command”. In this case jstack needs to be execute with help of PSTools command psexec as shown here:
C:\PSTool> psexec –s c:/Programs/…/bin/jstack <pid> > threaddump.txt
On Linux / Unix you can create a thread dump also with kill -3 on the process ID of the Java process and the thread dump will be written to one of the application server log files.
Post-Analysis
In case of bad performance the following analysis step can also be performed after the ILS application server was restarted, as long as the SQL Server was not yet restarted.
Querying the Database
Please execute the following SQL statements and save the output:
select name, snapshot_isolation_state, is_read_committed_snapshot_on from sys.databases;
select * from sys.dm_os_sys_info;
SELECT TOP 100 total_elapsed_time / 1000 as total_elapsed, total_worker_time / 1000 as total_cpu,
total_elapsed_time / qs.execution_count / 1000 as averaged_elapsed, total_physical_reads as readIOs, qs.execution_count as exec_count,
SUBSTRING (qt.text,qs.statement_start_offset/2+1, (CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2 ELSE qs.statement_end_offset END - qs.statement_start_offset)/2 + 1) as query,
DB_NAME(qt.dbid) as dbname
FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
ORDER BY total_elapsed_time DESC;
Please make sure that the output of the columns is not cut off after 256 characters. One way to assure this is to execute the statements in SQL Server Management Studio one by one and to save the output from the Results-tab with copy&paste.
Running ILS with Oracle Database
Ad-hoc Analysis
The following actions must be performed before the ILS application server is restarted.
Step 1: Querying the Database
Execute the following SQL statement under ORACLE user “system” and save the output.
set linesize 10000;
set pagesize 1000;
select s.server, substr(s.username,1,20) as username, s.sid, (select l1.sid from v$lock l1, v$lock l2 where l2.sid = s.sid and l1.block = 1 and l2.request > 0 and l1.id1 = l2.id1 and l1.id2 = l2.id2) as bsid, s.status, 0 as aflag, to_char(substr(sql_fulltext,1,4000)) as sql, to_char(substr(sql_fulltext,4001,4000)) as sql_part2, to_char(substr(sql_fulltext,8001,4000)) as sql_part3
from v$session s, v$sqlarea t, all_users u
where s.sql_hash_value = t.hash_value and s.sql_address = t.address and u.oracle_maintained = 'N' and u.user_id = s.user#
union
select s.server, substr(s.username,1,20) as username, s.sid, null as bsid, s.status, 1 as aflag, to_char(substr(sql_fulltext,1,4000)) as sql, to_char(substr(sql_fulltext,4001,4000)) as sql_part2, to_char(substr(sql_fulltext,8001,4000)) as sql_part3
from v$session s, v$sqlarea t, all_users u
where s.sql_hash_value = 0 and s.prev_hash_value = t.hash_value and s.prev_sql_addr = t.address and u.oracle_maintained = 'N' and u.user_id = s.user#
union
select 'summary', 'zsort' as username, (select current_utilization from v$resource_limit where resource_name = 'sessions'), (select max_utilization from v$resource_limit where resource_name = 'sessions'), (select limit_value from v$resource_limit where resource_name = 'sessions'), (select count(*) from v$session) as sid, '' as sql_part1, '' as sql_part2, '' as sql_part3
from dual
order by username, aflag;
Step 2: Checking the CPU Utilisation
Take a look at the CPU utilization on both the ILS application and database server (e.g. using taskmgr on Windows). Make a note of it.
Step 3: Saving the Log Files
Save the log files clix.log and gc.log as well as the log file of the web server (e.g. access.log with Apache) and all log files of the sevlet engine.
When using TomEE as servlet engine look for the string “Increase maxThreads” within the TomEE log file stdout log. A message like “All threads (200) are currently busy, waiting. Increase maxThreads (200)” means that the system freeze has been caused by a too low value of the maxThreads TomEE parameter. Then, increase the value of this parameter.
Step 4: Taking a Java Thread Dump
In case of a system freeze with unclear cause try to take a Java Thread Dump (Java Stack Trace).
jstack <pid> > threaddump.txt
where <pid> needs to be substituted by the process ID of the java process. In case Tomcat is running as a Windows service you may experience an error message like “not enough storage is available to process this command”. In this case jstack needs to be execute with help of PSTools command psexec as shown here:
C:\PSTool> psexec –s c:/Programs/…/bin/jstack <pid> > threaddump.txt
On Linux / Unix you can create a thread dump also with kill -3 on the process ID of the Java process and the thread dump will be written to one of the application server log files.
Post-Analysis
In case of bad performance the following analysis steps can be performed at any time.
Step 1: Querying the Database
Execute the following SQL statement under ORACLE user “system” and save the output.
set linesize 1000;
set pagesize 1000;
select * from
(select parsing_schema_name, round(elapsed_time/1000) as elaptotal, round(elapsed_time/executions/1000) as elapexec,
round(cpu_time/1000) as cputotal, round(cpu_time/executions/1000) as cpuexec, round(user_io_wait_time/1000) as iowaittotal,
buffer_gets, disk_reads, executions, last_active_time,
to_char(substr(sql_fulltext,1,4000)), to_char(substr(sql_fulltext,4001,4000)), to_char(substr(sql_fulltext,8001,4000))
from v$sql
where parsing_schema_name not in ('SYS','SYSTEM','SYSMAN','MDSYS','CTXSYS','EXFSYS','DBSNMP','DEC_REP','XDB') and executions > 0
order by elaptotal desc)
where rownum <= 100;
Step 2: Elementary Oracle Parameters and Statistics
Execute the following SQL statement under ORACLE user “system” and save the output.
set pagesize 20;
select substr(name,1,30) as name, substr(value,1,30) as value from v$parameter
where name in ('processes', 'shared_pool_size', 'memory_target', 'sga_target', 'pga_aggregate_target', 'optimizer_features_enable')
union
select 'db_cache_size_MB' as name, cast(current_size as varchar(30)) from v$buffer_pool
union
select substr(name,1,30) as name, cast(value as varchar(30)) from v$sysstat
where name in ('db block gets', 'consistent gets', 'physical reads')
union
select substr(name,1,30), cast(value as varchar(30)) from v$sysstat
where name like 'sort%';
Step 3: Oracle SQL-Optimizer Statistics
In case the Oracle SQL Optimizer statistics are not regularly updated, enter the following command in the Oracle SQL utility (Replace ‘ILSDB’ by the Oracle user of the ILS database):
exec dbms_stats.gather_schema_stats('ILSDB',null,false,'FOR ALL COLUMNS SIZE 1', null,'DEFAULT',true);
Running ILS with PostgreSQL Database
Ad-hoc Analysis
The following actions must be performed before the ILS application server is restarted.
Step 1: Querying the Database
Execute the following SQL statement on the ILS database and save the output.
In case of a PostgreSQL version smaller than 13 remove column leader_pid from the select list as it does not yet exist.
select datname, pid, leader_pid, state, xact_start as ta_start, query_start as q_start, wait_event_type, wait_event,
case when wait_event_type = 'Lock' then (select string_agg(cast(pl2.pid as varchar(9)),',') from pg_locks pl, pg_locks pl2 where pa.pid = pl.pid and pl.locktype = pl2.locktype and pl.relation is not distinct from pl2.relation and pl.page is not distinct from pl2.page and pl.tuple is not distinct from pl2.tuple and pl.virtualxid is not distinct from pl2.virtualxid and pl.transactionid is not distinct from pl2.transactionid and pl.classid is not distinct from pl2.classid and pl.objid is not distinct from pl2.objid and pl.objsubid is not distinct from pl2.objsubid
and pl.pid != pl2.pid and not pl.granted and pl2.granted) else null end as blocking_pid,
query
from pg_stat_activity pa
where datname = (select current_database())
order by state, query_start;
Step 2: Checking the CPU Utilisation
Take a look at the CPU utilization on both the ILS application and database server (e.g. using taskmgr on Windows). Make a note of it.
Step 3: Saving the Log Files
Save the log files clix.log and gc.log as well as the log file of the web server (e.g. access.log with Apache) and all log files of the sevlet engine.
When using TomEE as servlet engine look for the string “Increase maxThreads” within the TomEE log file stdout log. A message like “All threads (200) are currently busy, waiting. Increase maxThreads (200)” means that the system freeze has been caused by a too low value of the maxThreads TomEE parameter. Then, increase the value of this parameter.
Step 4: Taking a Java Thread Dump
In case of a system freeze with unclear cause try to take a Java Thread Dump (Java Stack Trace)
jstack <pid> > threaddump.txt
where <pid> needs to be substituted by the process ID of the java process. In case TomEE is running as a Windows service you may experience an error message like “not enough storage is available to process this command”. In this case jstack needs to be execute with help of PSTools command psexec as shown here:
C:\PSTool> psexec –s c:/Programs/…/bin/jstack <pid> > threaddump.txt
On Linux / Unix you can create a thread dump with kill -3 on the process ID of the Java process and the thread dump will be written to one of the application server log files.
Post-Analysis
In case of bad performance the following analysis steps can be performed at any time.
Step 1: Querying the Database
Execute the following SQL statement on the ILS datbase and save the output.
In case of a PostgreSQL version smaller than 13 you need to change column names total_exec_time and mean_exec_time to total_time and mean_time.
select query, round(total_exec_time::numeric, 2) AS total_time, calls, round(mean_exec_time::numeric, 2) AS mean,
round((100 * total_exec_time / sum(total_exec_time::numeric) OVER ())::numeric, 2) AS percentage_total_time
from pg_stat_statements
order by total_time desc
limit 100;
In order to be able to do this the following PostgreSQL configuration parameter must have been set
shared_preload_libraries=‘pg_stat_statements’
and on the database the following statement was executed after creating it (see document ILS_Installation_Win_TomEE_IIS_EN.pdf section 4.1.3.1 and 4.1.3.2):
create extension pg_stat_statements;
Step 2: Elementary PostgreSQL Parameters
Execute the following SQL statement under ORACLE user “system” and save the output.
select substring(name,1,30) as name, substring(setting,1,30) as value
from pg_settings
where name in ('max_connections', 'shared_buffers', 'effective_cache_size','work_mem');