Monitoring IBM i instances

Supported versions

Supported IBM i versions: 7.2, 7.3, and 7.4.

Note: Currently, we only support remote monitoring of IBM i instances.

Configuration

To start monitoring IBM i instances, configure the following fields in the agent configuration file, <agent_install_dir>/etc/instana/configuration.yaml:

com.instana.plugin.ibmiseries:
  enabled: true
  remote: # multiple configurations supported
    - host: 'remote.host-1.com'
      user: 'username'
      password: 'password'
      availabilityZone: 'IBM i Remote Monitoring'
      poll_rate: 15 # seconds
    - host: 'remote.host-2.com'
      user: 'username'
      password: 'password'
      availabilityZone: 'IBM i Remote Monitoring'
      poll_rate: 15 # seconds

Each configured remote IBM i instance is then shown as a separate box in the specified availabilityZone.
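
When many partitions share the same settings, the remote list can be generated instead of written by hand. The following is a minimal sketch, not part of the agent: render_remote_entries is a hypothetical helper that reproduces the layout of the sample configuration using only the fields shown there. It assumes that values contain no quote characters and does no escaping.

```python
def render_remote_entries(hosts, user, password,
                          zone="IBM i Remote Monitoring", poll_rate=15):
    """Return YAML text for the com.instana.plugin.ibmiseries section.

    Hypothetical helper: it only mirrors the sample layout and does not
    escape quote characters inside values.
    """
    lines = [
        "com.instana.plugin.ibmiseries:",
        "  enabled: true",
        "  remote:",
    ]
    for host in hosts:
        lines += [
            f"    - host: '{host}'",
            f"      user: '{user}'",
            f"      password: '{password}'",
            f"      availabilityZone: '{zone}'",
            f"      poll_rate: {poll_rate} # seconds",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Host names are the placeholders from the sample configuration.
    print(render_remote_entries(["remote.host-1.com", "remote.host-2.com"],
                                "username", "password"))
```

The generated text can be pasted into configuration.yaml; review it before use, because the helper does not validate host names or credentials.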

Note: Currently, the user specified in the user configuration parameter should have QSECOFR authority.

Metrics collection

Configuration data

  • Host name
  • OS Version
  • Total CPU
  • Total Memory
  • Configured CPU
  • Configured Memory
  • Partition ID
  • Number of partitions
  • Restricted state

Performance metrics

System Metrics

Metric | Description | Granularity
CPU Rate | The average CPU rate, expressed as a percentage, where 100% indicates that the processor is running at its nominal frequency. A value above or below 100% indicates how much the processor has been sped up (turbo) or slowed down (throttled) relative to the nominal frequency for the processor model. For instance, a value of 120% indicates that the processor is running 20% faster than its nominal speed. | 15 seconds
Average CPU Utilization | The average CPU utilization for all the active processors. | 15 seconds
Min CPU Utilization | The CPU utilization of the processor that reported the minimum amount of CPU utilization. | 15 seconds
Max CPU Utilization | The CPU utilization of the processor that reported the maximum amount of CPU utilization. | 15 seconds
Active Jobs | The number of jobs active in the system (jobs that have been started, but have not yet ended), including both user and system jobs. | 15 seconds
Interactive Jobs | The percentage of interactive performance assigned to this logical partition. This value is a percentage of the total interactive performance available to the entire physical system. | 15 seconds
Total Jobs | The total number of user and system jobs that are currently in the system. The total includes all jobs on job queues waiting to be processed, all jobs currently active (being processed), and all jobs that have completed running but still have output on output queues to be produced. | 15 seconds
Max Jobs | The maximum number of jobs that are allowed on the system. When the number of jobs reaches this maximum, you can no longer submit or start more jobs on the system. The total includes all jobs on job queues waiting to be processed, all jobs currently active (being processed), and all jobs that have completed running but still have output on output queues to be produced. | 15 seconds
Used Auxiliary Storage Pool | The percentage of the system storage pool (ASP number 1) currently in use. | 15 seconds
Capacity of Auxiliary Storage Pool | The storage capacity of the system auxiliary storage pool (ASP number 1), in millions of bytes. This value represents the amount of space available for storage of both permanent and temporary objects. | 15 seconds
Current Temporary Storage | The current amount of storage, in millions of bytes, in use for temporary objects. | 15 seconds
Maximum Temporary Storage Used | The largest amount of storage, in millions of bytes, used for temporary objects at any one time since the last IPL. | 15 seconds
Active Threads | The number of initial and secondary threads in the system (threads that have been started, but have not yet ended), including both user and system threads. | 15 seconds
Total Spool Space | The total spool space consumed by the output queue, in bytes. | 15 seconds
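
As a worked illustration of how some of these values can be read together, the sketch below converts a few of them into derived figures. The function names are illustrative only and are not part of Instana or IBM i; the arithmetic follows the units stated in the descriptions (percentages, and millions of bytes for ASP capacity).

```python
def cpu_speed_delta(cpu_rate_percent):
    """Percentage points above (+) or below (-) the nominal frequency."""
    return cpu_rate_percent - 100.0


def asp_used_bytes(capacity_millions_of_bytes, used_percent):
    """Approximate bytes in use in the system ASP.

    Capacity is reported in millions of bytes (10**6) and usage as a
    percentage, so the result is capacity * 10**6 * used / 100.
    """
    return capacity_millions_of_bytes * 1_000_000 * used_percent / 100.0


def job_headroom(total_jobs, max_jobs):
    """Jobs that can still enter the system before Max Jobs is reached."""
    return max(max_jobs - total_jobs, 0)
```

For example, a CPU Rate of 120 gives a delta of 20 percentage points above nominal, and a negative delta indicates throttling.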

Active Memory Pool Metrics

Metric | Description | Granularity
Storage Used | The amount of main storage, in megabytes, in the pool. | 15 seconds
Storage Reserved | The amount of storage, in megabytes, in the pool reserved for system use (for example, for save/restore operations). | 15 seconds
Storage Defined | The size of the pool, in megabytes, as defined in the shared pool, subsystem description, or system value QMCHPOOL. Contains the null value for a pool without a defined size. | 15 seconds
Active Threads | The number of threads currently using the pool. | 15 seconds
Ineligible Threads | The number of ineligible threads in the pool. | 15 seconds
Max Threads | The maximum number of threads that can be active in the pool at any one time. | 15 seconds

Output Queue Metrics

Metric | Description | Granularity
Queue Name | The name of the output queue. | 15 seconds
Library Name | The name of the library that contains the output queue. | 15 seconds
Status | The status of the output queue. | 15 seconds
Files in Queue | The total number of spooled files currently on this output queue. | 15 seconds
Writer Job Name | The qualified job name of the writer job. If more than one writer is started, this is the name of the first writer. Contains the null value if a writer job is not started for this queue. | 15 seconds
Writer Job Status | The status of the writer job. If more than one writer is started, this is the status of the first writer. | 15 seconds

Top Spool Space Consumption

The top 20 users by spool space consumption.

Metric | Description | Granularity
User | The name of the user profile that produced the spooled files. | 15 seconds
Spool Space | The size of the user's spooled files, in bytes. | 15 seconds
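
To make the "top 20" selection concrete, the sketch below ranks users by spool space and keeps the largest consumers. It is an illustration only and does not describe how the sensor computes the list; top_spool_users is a hypothetical name, and the input is assumed to be a mapping of user profile name to bytes.

```python
import heapq


def top_spool_users(usage_by_user, limit=20):
    """Return (user, bytes) pairs for the largest spool consumers, largest first."""
    return heapq.nlargest(limit, usage_by_user.items(), key=lambda item: item[1])
```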

Top Active Jobs

The top 20 active jobs currently running in the system.

Metric | Description | Granularity
Job Name | The qualified job name. | 15 seconds
User Name | The user profile under which the initial thread is running at this time. For jobs that swap user profiles, this user profile name and the user profile that initiated the job can be different. | 15 seconds
Elapsed CPU Percentage | The percentage of processing unit time attributed to this job during the measurement time interval. | 15 seconds
Temporary Storage | The amount of temporary storage in use by the job, in kilobytes. | 15 seconds
Job Status | The status of the initial thread of the job. | 15 seconds
Job Type | The type of active job. | 15 seconds
Thread Count | The number of active threads in the job. | 15 seconds

Auxiliary Storage Pools

Information about auxiliary storage pools (ASPs).

Metric | Description | Granularity
ASP Number | A unique identifier for an ASP. Possible values are 1 through 255. | 15 seconds
Device Description Name | The name of the device description that brought the independent ASP (IASP) to the varied-on (active) state. | 15 seconds
ASP Type | The use that is assigned to the ASP. | 15 seconds
ASP State | The device configuration status of an ASP. | 15 seconds
Number Of Disk Units | The total number of disk units in the ASP. If mirroring is active for disk units within the ASP, the mirrored pair of units is counted as one. | 15 seconds
Total Capacity | The total number of used and unused megabytes in the ASP. A special value of -2 is returned if the size of this field is exceeded. | 15 seconds
Total Capacity Utilization | The percentage of the total capacity in the ASP that is in use. | 15 seconds
Protected Capacity | The total number of used and unused megabytes in the ASP that are protected by mirroring or device parity. A special value of -2 is returned if the value was too big to return. Contains the null value if the capacity cannot be determined. | 15 seconds
Protected Capacity Utilization | The percentage of the protected capacity in the ASP that is in use. | 15 seconds
Unprotected Capacity | The total number of used and unused megabytes in the ASP that are not protected by mirroring or device parity. A special value of -2 is returned if the value was too big to return. Contains the null value if the capacity cannot be determined. | 15 seconds
Unprotected Capacity Utilization | The percentage of the unprotected capacity in the ASP that is in use. | 15 seconds
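
The capacity values need some care when consumed programmatically, because -2 and the null value are markers rather than sizes. The sketch below shows one way to handle them; the function names are hypothetical, and None is assumed to stand for the null value.

```python
from typing import Optional

OVERFLOW = -2  # special value: the capacity was too large to return


def describe_capacity(megabytes: Optional[int]) -> str:
    """Render a capacity value, treating the special values as labels."""
    if megabytes is None:
        return "unknown"
    if megabytes == OVERFLOW:
        return "too large to report"
    return f"{megabytes} MB"


def used_megabytes(capacity_mb: Optional[int],
                   utilization_percent: float) -> Optional[float]:
    """Used space derived from capacity and utilization, or None if unusable."""
    if capacity_mb is None or capacity_mb == OVERFLOW:
        return None
    return capacity_mb * utilization_percent / 100.0
```

Checking for the markers first avoids computing a negative "used" figure from a capacity of -2.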

Active Subsystems

Information about active subsystems.

Metric | Description | Granularity
Name | The name of the subsystem about which information is being returned. | 15 seconds
Library Name | The name of the library in which the subsystem description resides. | 15 seconds
Active Jobs | The number of jobs currently active in the subsystem. This number includes held jobs but excludes jobs that are disconnected or suspended because of a transfer secondary job or a transfer group job. If STATUS is INACTIVE, returns 0. | 15 seconds
Max Active Jobs | The maximum number of jobs that can run or use resources in the subsystem at one time. Contains the null value if the subsystem description specifies *NOMAX, indicating that there is no maximum. | 15 seconds
Description | The text description of the subsystem description. | 15 seconds

Job Queue

Information about job queues.

Metric | Description | Granularity
Job Queue Name | The name of the job queue. | 15 seconds
Job Queue Library | The name of the library that contains the job queue. | 15 seconds
Subsystem Name | The name of the subsystem that can receive jobs from this job queue. Contains the null value if this job queue is not associated with an active subsystem. | 15 seconds
Subsystem Library Name | The library in which the subsystem description resides. Contains the null value if this job queue is not associated with an active subsystem. | 15 seconds
Number Of Jobs | The number of jobs in the queue. | 15 seconds
Active Jobs | The current number of jobs that are active that came through this job queue entry. Contains the null value if this job queue is not associated with an active subsystem. | 15 seconds
Maximum Active Jobs | The maximum number of jobs that can be active at the same time through this job queue entry. A value of -1 indicates *NOMAX, meaning that no maximum number of jobs is defined. Contains the null value if this job queue is not associated with an active subsystem. | 15 seconds
Job Queue Status | The status of the job queue. HELD: the queue is held. RELEASED: the queue is released. | 15 seconds
Text Description | Text that describes the job queue. Contains the null value if there is no text description for the job queue. | 15 seconds
Held Jobs | The current number of jobs that are in *HELD status. This is the sum of the 10 HELDJOBSPRIORITY_n columns. | 15 seconds
Released Jobs | The current number of jobs that are in *RELEASED status. This is the sum of the 10 RELEASEDJOBSPRIORITY_n columns. | 15 seconds
Scheduled Jobs | The current number of jobs that are in *SCHEDULED status. This is the sum of the 10 SCHEDULEDJOBSPRIORITY_n columns. | 15 seconds
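
Two conventions in this table are easy to misread: Maximum Active Jobs uses -1 for *NOMAX and the null value for a queue without an active subsystem, and the held, released, and scheduled counts are each a sum over ten per-priority columns. The sketch below illustrates both; the function names are hypothetical, and None is assumed to stand for the null value.

```python
def format_max_active(value):
    """Render Maximum Active Jobs, decoding the -1 and null conventions."""
    if value is None:
        return "n/a (no active subsystem)"
    if value == -1:
        return "*NOMAX"
    return str(value)


def sum_priorities(counts):
    """Sum the ten per-priority counts (for example, HELDJOBSPRIORITY_n)."""
    if len(counts) != 10:
        raise ValueError("expected exactly 10 per-priority counts")
    return sum(counts)
```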