Overview of SVC V4.3.1 Performance Statistics

Technote (FAQ)


Question

This document provides an overview of the XML performance statistics available with SVC V4.3.1.

Support for the legacy per-cluster (v_stats and m_stats) performance statistics will be removed in releases after V4.3.x. These legacy statistics files have been superseded by the per-node Nv_stats, Nm_stats and Nn_stats files that are described in this document.

Answer

SVC collects three types of statistics: mdisk statistics, vdisk statistics and node statistics. The statistics are collected on a per-node basis. This means that the statistics for a vdisk reflect its usage via that particular node.

Statistics collection is started by running the "svctask startstats" command. Consult the "Command Line Interface User's Guide" for details of controlling statistics collection and manipulating statistics files.

The collection of performance statistics can be stopped using the svctask stopstats command.

Each node maintains a number of counters. The counters can represent numbers between 0 and (2**64)-1. The counters are reset to zero when a node is booted or reset. Each of these counters is sampled at the end of each sample period.

Note that the sampled value is the absolute value of the counter, NOT the increase in the value of the counter during the sample period. An application which processes the performance statistics must therefore compare two samples from two separate files to calculate differences. The application does not need to retrieve every sample, but must retrieve samples frequently enough to detect a wrap in the value of a counter.

For example, take the per-vdisk total read IO value (ro="xxx") for vdisk 50 from the Nv_stats### file produced by the statistics collection event at 1pm, and subtract the same value for vdisk 50 from the Nv_stats### file produced by the collection event at 12pm. The difference gives the total number of read IO operations to vdisk 50 within this 1 hour period.
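The delta calculation above can be sketched in Python. The wrap handling assumes a single wrap at 2**64, per the counter width described earlier; all sample values are hypothetical.

```python
# Sketch: compute the increase between two sampled 64-bit counter values,
# allowing for a single wrap at 2**64 (assumed wrap behaviour).
WRAP = 2 ** 64

def counter_delta(earlier: int, later: int) -> int:
    """Return the increase between two absolute counter samples."""
    if later >= earlier:
        return later - earlier
    # The counter wrapped between the two samples.
    return later + WRAP - earlier

# Example: total read IOs to a vdisk between the 12pm and 1pm samples.
ro_12pm = 1_500_000   # hypothetical ro value at 12pm
ro_1pm = 1_750_000    # hypothetical ro value at 1pm
print(counter_delta(ro_12pm, ro_1pm))  # 250000
```

Note that a node reset also zeroes the counters, which this simple delta cannot distinguish from a wrap; a real collector would also track node reset events.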

The information in the Nn_stats####, Nv_stats#### and Nm_stats#### files is output in XML (eXtensible Markup Language) format.

Example XML output

Below are examples of the files generated by SVC when performance statistics are being collected. The values shown in each field of the three statistics files are examples only.

MDisk Statistics

Nm_stats_106081_081117_002244


VDisk Statistics

Nv_stats_106081_081117_002244

Node Statistics

Nn_stats_106081_081117_002244
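The file names above appear to follow the pattern N&lt;type&gt;_stats_&lt;panel_id&gt;_&lt;YYMMDD&gt;_&lt;HHMMSS&gt;. Assuming that layout (inferred from the examples, not from formal documentation), a small Python sketch can split a name into its parts:

```python
# Sketch: parse a statistics file name into its parts, assuming the
# layout N<type>_stats_<panel_id>_<YYMMDD>_<HHMMSS>.
import re
from datetime import datetime

def parse_stats_filename(name: str):
    m = re.fullmatch(r"N([mvn])_stats_([^_]+)_(\d{6})_(\d{6})", name)
    if m is None:
        raise ValueError(f"unrecognised stats file name: {name}")
    kind, panel, date, time = m.groups()
    stamp = datetime.strptime(date + time, "%y%m%d%H%M%S")
    return {"m": "mdisk", "v": "vdisk", "n": "node"}[kind], panel, stamp

print(parse_stats_filename("Nm_stats_106081_081117_002244"))
```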

Per-node Mdisk Statistics collected

The abbreviation in brackets is the XML tag within which the statistic is reported.

Per Mdisk identification

  • Mdisk ID (idx)
  • Mdisk name (id)

Per MDisk Read/write counters
  • Per MDisk Read operations (ro)
  • Per MDisk Write operations (wo)
  • Per MDisk Read blocks (512 bytes) (rb)
  • Per MDisk Write blocks (512 bytes) (wb)

Note: The per node performance statistics do include retries which occur as a result of errors occurring on the SVC to backend controller path. Thus, if a read or write needs to be retried several times, each retry is accounted for.

Per MDisk Cumulative response times

  • Per MDisk Cumulative Read external response time in milliseconds. (re)

This is the cumulative response time for disk reads sent to the managed disk. For each SCSI read command sent to the managed disk, a timer is started when the command is issued across the fibre channel; when the command completes, the elapsed time since it was started is added to the cumulative counter for this direction.
  • Per MDisk Cumulative write external response time (we)

As above but for write commands.
  • Per MDisk Cumulative Read queued response time (rq)

This is the cumulative response time measured above the queue of commands waiting to be sent to an MDisk because its queue is already full.
  • Per MDisk Cumulative write queued response time (wq)

As above but for write commands.
  • Per MDisk cumulative read external response time in microseconds (ure)

These microsecond statistics are calculated in the same way as the millisecond statistics, but are reported in microseconds.
  • Per MDisk cumulative write external response time in microseconds (uwe)
  • Per MDisk cumulative read queued response time in microseconds (urq)
  • Per MDisk cumulative write queued response time in microseconds (uwq)
  • Per MDisk peak read external response time in microseconds (pre)

This is the peak read response time measured on the fabric during the interval since the last collection. This is not cumulative, it is a single specific value that shows the maximum read response time during the last interval. It requires no manipulation.
  • Per MDisk peak write external response time in microseconds (pwe)

As above for writes.
  • Per MDisk peak read queued response time in microseconds (pro)

This is the peak read response time measured above the queue of commands waiting to be sent to an MDisk because its queue is already full. It includes the time spent in the queue and the time spent being executed by the storage controller. This is not cumulative; it is a single specific value that shows the maximum read response time during the last interval of collection. It requires no manipulation.
  • Per MDisk peak write queued response time in microseconds (pwo)

As above for writes.

Note: The per node performance statistics do include retries which occur as a result of errors occurring on the SVC to backend controller path. Thus, if a read or write needs to be retried several times, each retry accumulates response time.
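Since the response-time counters are cumulative, an average per-operation time is obtained by dividing the delta of the cumulative counter by the delta of the corresponding operation counter between two samples. A minimal Python sketch, using hypothetical deltas:

```python
# Sketch: derive average external and queued read response times for an
# MDisk from deltas between two consecutive samples (hypothetical values).
def avg_ms(cum_delta: float, ops_delta: int) -> float:
    """Average per-operation time; 0 if no operations completed."""
    return cum_delta / ops_delta if ops_delta else 0.0

ro_delta = 2000    # read operations (ro) between samples
re_delta = 9000    # cumulative external read response time delta, ms (re)
rq_delta = 12000   # cumulative queued read response time delta, ms (rq)

print(avg_ms(re_delta, ro_delta))   # 4.5 ms average on the fabric
print(avg_ms(rq_delta, ro_delta))   # 6.0 ms average including queue time
```

The same division applies to the write counters (we/wq against wo) and to the microsecond variants (ure/uwe/urq/uwq).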

Per-node VDisk Statistics collected

The abbreviation in brackets is the XML tag within which the statistic is reported.

Note that the file on each node in the cluster will contain statistics only for those vdisks which are associated with the IO group of which the node is a member:

  • Vdisk id (idx)
  • Vdisk name (id)

Per VDisk Read/write counters.
  • Per VDisk Read operations (ro)
  • Per VDisk Write operations (wo)
  • Per VDisk Read blocks (512 bytes) (rb)
  • Per VDisk Write blocks (512 bytes) (wb)

Per VDisk Cumulative response times (latency)
  • Per VDisk Cumulative Read response time in milliseconds. (rl)

This is the cumulative response time for disk reads sent to the virtual disk. For each SCSI read command received by a node for the vdisk, a timer is started when the command is received across the fibre channel; when the command completes, the elapsed time since it was started is added to the cumulative counter for this direction.
  • Per VDisk Cumulative write response time in milliseconds (wl)

As above but for write commands. Note that for vdisks which are being used as Metro Mirror or Global Mirror secondaries, this statistic is valid and measures the response time into the cache, but it does not include the time Global Mirror spends waiting for writes “upon which this write depends” to arrive at this (or other) nodes so that consistency can be maintained.
  • Per Vdisk Worst Read response time in microseconds since last statistics collection (rlw)

This is the worst response time for disk reads sent to the virtual disk within the last statistics collection sample period. This value is reset to zero after each statistics collection sample.
  • Per Vdisk Worst Write response time in microseconds since last statistics collection (wlw)

This is the worst response time for disk writes sent to the virtual disk within the last statistics collection sample period. This value is reset to zero after each statistics collection sample.
  • Per Vdisk Cumulative transfer response time in microseconds (xl)

This is the cumulative transfer latency, coalesced for both reads and writes and excluding status-only transfers. For a read, it is the time taken for the host to respond to a transfer-ready notification from the node; for a write, it is the time taken for the host to send the write data after the node has sent transfer ready to indicate that it is ready to receive the data. This is provided as an aid to debugging a slow host or a poorly performing fabric, since it can show which vdisks in particular are experiencing fabric- or host-based latency. It can be subtracted from the vdisk response time average to give a rough idea of the average SVC-to-disk portion of the latency.

Note: The rl/wl accumulator statistics are reported in milliseconds; the rlw/wlw reset-to-zero statistics are reported in microseconds.
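As a sketch of the subtraction suggested for the xl statistic, the following uses hypothetical counter deltas and converts xl from microseconds to milliseconds before comparing it with rl/wl:

```python
# Sketch: estimate the host/fabric portion of average vdisk latency using
# the cumulative transfer latency (xl). Hypothetical deltas throughout;
# note rl/wl are in milliseconds while xl is in microseconds.
ro_delta = 1000              # read ops between samples (ro)
wo_delta = 1000              # write ops between samples (wo)
rl_delta = 3000.0            # cumulative read latency delta, ms (rl)
wl_delta = 5000.0            # cumulative write latency delta, ms (wl)
xl_delta = 800_000.0         # cumulative transfer latency delta, us (xl)

total_ops = ro_delta + wo_delta
avg_total_ms = (rl_delta + wl_delta) / total_ops        # 4.0 ms
avg_transfer_ms = (xl_delta / 1000.0) / total_ops       # 0.4 ms host/fabric

# Rough SVC-to-disk portion of the average latency, per the text above.
print(avg_total_ms - avg_transfer_ms)  # 3.6
```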

Per Vdisk Global Mirror Statistics

For versions of SVC which do not support the Global Mirror feature (before 4.1.1), the statistics below will report values of zero, regardless of whether the vdisk is part of a Metro Mirror relationship or not.

For versions which do support Global Mirror (4.1.1 and onwards), the statistics will report values if the vdisk is part of a Metro or Global Mirror relationship.

Name in ()’s is the XML tag name.

  • Per Vdisk Cumulative total number of overlapping writes (gwo)

This is the total number of write IO requests received by Global Mirror on the primary that have overlapped. An overlapping write is a write where the LBA range of the request collides with another outstanding write IO to the same LBA range and the write is still outstanding to the secondary site.

  • Per Vdisk Cumulative total number of fixed or unfixed overlapping writes. (gwot)

When all nodes in all clusters are running at least 4.3.1, this records the total number of write IO requests received by Global Mirror on the primary that have overlapped. An overlapping write is a write where the LBA range of the request collides with another outstanding write IO to the same LBA range and the write is still outstanding to the secondary site. When any nodes in either cluster are running less than 4.3.1, this value does not increment.
  • Per Vdisk Per Relationship total Secondary writes that have been issued (gws)

This is the total number of write IO requests that have been issued to the secondary site.
  • Per Vdisk Per Relationship Cumulative Secondary write latency in milliseconds (gwl)

This statistic accumulates the cumulative secondary write latency for this vdisk. It is based on the time between the primary IO completing and the secondary IO completing back, having been committed into the fast write cache at the secondary site. It is possible for the secondary IO to complete back prior to the primary IO completing; in this case, zero is added to the cumulative latency. This cumulative latency is calculated in the same way as the other cumulative latencies.
If this vdisk is being used as a Global Mirror secondary, this statistic accumulates the time spent waiting for writes “upon which this write depends” to complete. It does not include the cache latency - this is captured by the "wl" statistic.

Using the gws and gwl statistics allows a customer to calculate the RPO (Recovery Point Objective) time for the vdisk prior to a failure.
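A rough sketch of that calculation in Python, with hypothetical counter deltas; the average secondary write latency indicates how far, on average, the secondary lags the primary:

```python
# Sketch: approximate the average secondary write latency for a Global
# Mirror vdisk from the gws and gwl counter deltas, as a rough indicator
# of data exposure (RPO). All values are hypothetical.
gws_delta = 5000      # secondary writes issued between samples (gws)
gwl_delta = 40000.0   # cumulative secondary write latency delta, ms (gwl)

avg_secondary_ms = gwl_delta / gws_delta if gws_delta else 0.0
print(avg_secondary_ms)  # 8.0 ms behind the primary on average
```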

Per-node Vdisk Cache Statistics

The abbreviation in brackets is the XML tag within which the statistic is reported.

All values reported are unsigned 64-bit values.

The cache statistics have the following policies:

  • A read/write is counted as a cache miss unless it obtained a hit on all the sectors of the entire read/write data for that particular track. An I/O request spanning multiple tracks can obtain separate cache hits and/or cache misses.
  • A write received from a component above, where the downstream write is performed by the partner node in the IO group will only increment statistics on the node that received the write IO request, not on the node that performed the downstream write.
  • A destage operation is only counted as a destage for flushing and destaging operations. e.g. a write-through or flush-through write does not count as a destage even if the track is still dirty.

The following statistics are reported:

1. Track reads (ctr)

The number of track reads received from components above. e.g. a single read spanning 2 tracks will be counted as two.

2. Track read sector count. (for 1) (ctrs)

The total number of sectors read for reads received from components above. (for 1)

3. Track writes (ctw)

The number of track writes received from components above. e.g. a single write spanning 2 tracks will be counted as two.

4. Track write sector count. (for 3) (ctws)

The total number of sectors written for writes received from components above. (for 3)

5. Tracks prestaged. (ctp)

The number of track stages initiated by the cache, that is prestage reads. e.g. a two track prestage will be counted as two.

6. Prestage sector count. (for 5) (ctps)

The total number of staged sectors initiated by the cache. (total prestage read sector count for 5)

7. Track read cache hits. (ctrh)

The number of track reads received from components above that have been treated by cache as a total hit on prestage or non-prestage data. e.g. a single read spanning 2 tracks where only 1 of the tracks obtained a total cache hit, will be counted as one.

A total cache hit is where the entire track read transferred data out of the cache without doing any downstream read as part of that IO request.

8. Track read cache hits sector count.(for 7)(ctrhs)

The total number of sectors read for reads received from components above that have obtained total cache hits on prestage or non-prestage data. (for 7)

9. Track read cache hits on any prestaged data. (ctrhp)

The number of track reads received from components above that have been treated as cache hits on any prestaged data. e.g. a single read spanning 2 tracks where only 1 of the tracks obtained a total cache hit on prestaged data, will be counted as one. A cache hit that obtains a partial hit on prestage and non-prestage data will still contribute to this value.

10. Read cache hits on prestaged data sector count. (for 9)(ctrhps)

The total number of sectors read for reads received from components above that have obtained cache hits on any prestaged data

11. Track read cache misses ( = (1)-(7) ) (ctrm)

The number of track reads received from components above that have cache misses. e.g. a single two track read where both tracks were treated as a cache miss will count as two.

A partial cache hit counts as a cache miss; the SVC cache does not have the concept of a partial cache hit.

12. Track read cache misses sector count ( = (2)-(8) ) (for 11) (ctrms)

The total number of sectors read for reads received from components above that have cache misses.

13. Track destages (ctd)

The number of cache-initiated track writes submitted to components below as a result of a vdisk cache flush or destage operation, counted on a track basis.

14. Track destage sector count. (for 13) (ctds)

The total number of sectors written for cache initiated track writes.

15. Track writes in flush through mode. (ctwft)

The number of track writes received from components above and processed in flush through write mode. e.g. a single two track write will count as two.

16. Track writes in flush through mode sector count. (for 15) (ctwfts)

The total number of sectors written for writes received from components above and processed in flush through write mode. (for 15)

17. Track writes in write through mode. (ctwwt)

The number of track writes received from components above and processed in write through write mode. e.g. a single two track write will count as two.

18. Track writes in write through mode sector count. (for 17) (ctwwts)

The total number of sectors written for writes received from components above and processed in write through write mode.

19. Track writes in fast write mode (ctwfw)

The number of track writes received from components above and processed in fast write mode. e.g. a single two track write will count as two.

20. Track writes in fast write mode sector count. (for 19) (ctwfws)

The total number of sectors written for writes received from components above and processed in fast write mode.

21. Track writes in fast write mode that were written in write through due to the lack of memory. (ctwfwsh)

22. Track writes in fast write mode that were written in write through due to the lack of memory, sector count. (for 21) (ctwfwshs)

23. Track write misses on dirty data (ctwm)

When in fastwrite mode, the number of track writes received from components above where some of the sectors in the write data resulted in new dirty data being generated in the cache.
A partial write cache hit counts as a write cache miss. Low resource writes (see 21/22) will not contribute to this counter.

24. Track write misses on dirty data, sector count. (for 23) (ctwms).

When in fast write mode, the total number of sectors received from components where some of the sectors in the write data resulted in new dirty data being generated in the cache.

25. Track write hits on dirty data. (ctwh)

When in fast write mode, the number of track writes received from components above where every sector in the track obtained a write hit on already dirty data in the cache.

For a write to count as a total cache hit, the entire track write data must already be marked in the write cache as dirty.

26. Track write hits on dirty data, sector count. (for 25) (ctwhs)

When in fast write mode, the total number of sectors received from components above where every sector in the track obtained a write hit on already dirty data in the cache.
Low resource writes (see 21/22) will not contribute to this counter.

27. Quantity of write cache data in sectors (cm)

The number of sectors of modified/dirty data held in the cache.

28. Quantity of cache data in sectors (cv)

The number of sectors of read and write cache data held in the cache.
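As an example of combining the cache counters, the read hit ratio follows from ctr and ctrh, recalling that a partial hit counts as a miss and that ctrm = ctr - ctrh. Hypothetical deltas:

```python
# Sketch: compute the vdisk cache read hit ratio from the track counters.
# A partial hit counts as a miss, so ctrh/ctr is the total-hit ratio.
ctr_delta = 10000    # track reads between samples (ctr)
ctrh_delta = 8500    # track read total cache hits (ctrh)
ctrm_delta = ctr_delta - ctrh_delta  # track read misses (ctrm = ctr - ctrh)

hit_ratio = ctrh_delta / ctr_delta if ctr_delta else 0.0
print(f"{hit_ratio:.1%}")   # 85.0%
print(ctrm_delta)           # 1500
```

The write counters can be combined similarly, e.g. ctwfw against ctw to see what fraction of writes were handled in fast write mode.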

Per-node Node Statistics collected

CPU Usage Counter

This statistic reports a pseudo-CPU utilisation. If SVC reported the CPU utilisation as seen by the operating system, it would always show 100%, because SVC polls the fibre channel when there is nothing to do. What this statistic reports instead is the amount of time the processor has spent polling waiting for work versus actually doing work.

In the file the statistic which is reported is:

The number of busy milliseconds since the node was reset (cpu_busy)

Thus this figure accumulates from zero. For a totally idle node the number would stay at zero (in practice this never happens); for a totally busy node the number will accumulate at 1000 per second.
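Converting cpu_busy into a utilisation percentage therefore means dividing the delta in busy milliseconds by the sample interval in milliseconds. A sketch with hypothetical values:

```python
# Sketch: convert cpu_busy deltas into a pseudo-CPU utilisation
# percentage. A 60-second interval is assumed for illustration.
cpu_busy_t1 = 1_200_000      # busy ms at first sample (cpu_busy)
cpu_busy_t2 = 1_230_000      # busy ms at second sample
interval_s = 60              # seconds between samples

busy_ms = cpu_busy_t2 - cpu_busy_t1
utilisation = busy_ms / (interval_s * 1000.0)
print(f"{utilisation:.0%}")  # 50%
```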

Per Port counters and other node statistics

The abbreviation in brackets is the XML tag within which the statistic is reported.

Note that there will be as many entries for the port statistics in the file as there are ports on the SVC node. Thus on a model 4F2, 8F2 or 8F4 node there will always be four sets of counters, one for each of the four ports. Since the counters are zeroed when the node is reset or rebooted, it is unlikely that client code needs to take counter wrap into account, but client code will need to allow for counters being reset by a node reset.

All values reported are unsigned 64-bit values.

For each of the four ports in each SVC node, SVC will report the following statistics:

  • port id (from customer view) (id)
  • World wide port name (wwpn)
  • Bytes transmitted to hosts (initiators) (hbt)
  • Bytes transmitted to controllers (targets) (cbt)
  • Bytes transmitted to other SVC nodes in the same cluster (lnbt)
  • Bytes transmitted to other SVC nodes in other clusters (rmbt)
  • Bytes received from hosts (initiators) (hbr)
  • Bytes received from controllers (targets) (cbr)
  • Bytes received from other SVC nodes in the same cluster (lnbr)
  • Bytes received from other SVC nodes in other clusters (rmbr)
  • Commands initiated to hosts (initiators) (always zero but provided for completeness) (het)
  • Commands initiated to controllers (targets) (cet)
  • Commands initiated to other SVC nodes in the same cluster (lnet)
  • Commands initiated to other SVC nodes in other clusters (rmet)
  • Commands received from hosts (initiators) (her)
  • Commands received from controllers (targets) (probably always zero but provided for completeness) (cer)
  • Commands received from other SVC nodes in the same cluster (lner)
  • Commands received from other SVC nodes in other clusters (rmer)

To be consistent with the customer view of the ports, the ports are indexed from “1”, where “1” is the left-most port as viewed from the rear of the cluster.
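A sketch of deriving per-port throughput from the byte counters, summing the four transmit and four receive tags; all deltas and the interval are hypothetical:

```python
# Sketch: total transmit/receive throughput for one port from the byte
# counter deltas over a sample interval. Hypothetical values throughout.
interval_s = 300  # seconds between samples

tx = {"hbt": 3_000_000_000, "cbt": 2_000_000_000,   # to hosts, controllers
      "lnbt": 1_000_000_000, "rmbt": 0}             # to local, remote nodes
rx = {"hbr": 1_500_000_000, "cbr": 4_000_000_000,   # from hosts, controllers
      "lnbr": 500_000_000, "rmbr": 0}               # from local, remote nodes

tx_mb_s = sum(tx.values()) / interval_s / 1_000_000
rx_mb_s = sum(rx.values()) / interval_s / 1_000_000
print(tx_mb_s, rx_mb_s)  # 20.0 20.0
```

The breakdown by traffic type (host, controller, local-node, remote-node) also makes it possible to separate host I/O load from inter-node and replication traffic on the same port.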

For each other SVC node visible on the fabric

  • node name (id)
  • cluster name (cluster)
  • node unique ID (node_id)
  • cluster unique ID (cluster_id)
  • number of messages/bulk data received (ro)
  • number of messages/bulk data sent (wo)
  • number of bytes received (rb)
  • number of bytes sent (wb)
  • accumulated receive latency including inbound queue time (rq)

This is the latency from the time a command arrives at the node communication layer to the time cache gives completion for it.
  • accumulated receive latency excluding inbound queue time (re)

This is the latency experienced by node communication layer from the time an IO is queued to cache until the time cache gives completion for it.
  • accumulated send latency including outbound queue time (wq)

This is the time from when the node communication layer receives a message, including the queuing time waiting for resources, the time to send the message to the remote node and the time taken for the remote node to respond to say that the message has arrived.
  • accumulated send latency excluding outbound queue time (we)

This is the time from when the node communication layer issues a message out onto the fibre channel until the node communication layer receives notification that the message has arrived. It is as near as possible to the round trip “wire” delay between the two nodes.
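Average per-message latencies follow by dividing the accumulated latency counters by the message counts; the difference between the including-queue and excluding-queue figures is the time spent queued. Hypothetical deltas (units as reported in the file):

```python
# Sketch: average node-to-node receive latencies from the inter-node
# counters, and the portion spent queued. Hypothetical deltas.
ro_delta = 40000      # messages/bulk data received (ro)
rq_delta = 120_000.0  # receive latency incl. inbound queue time delta (rq)
re_delta = 80_000.0   # receive latency excl. inbound queue time delta (re)

avg_incl_queue = rq_delta / ro_delta    # 3.0 per message
avg_excl_queue = re_delta / ro_delta    # 2.0 per message
print(avg_incl_queue - avg_excl_queue)  # 1.0 spent queued per message
```

The send-side counters (wq/we against wo) can be treated the same way.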

Cluster Debug Statistic Collection

The following stats are provided primarily for debug of suspected fabric issues.

  • Link failure count since last node reset (cumulative) (lf)
  • Loss-of-synchronization count since last node reset (cumulative) (lsy)
  • Loss-of-signal count since last node reset (cumulative) (lsi)
  • Primitive Sequence Protocol Error count since last node reset (cumulative) (pspe)
  • Invalid transmission word count since last node reset (cumulative) (itw)
  • Invalid CRC count since last node reset (cumulative) (icrc)
  • Zero buffer-to-buffer credit timer. Number of microseconds for which the port has been unable to send frames due to lack of buffer credit since the last node reset (10 microsecond granularity) (cumulative) (bbcz)
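Since bbcz is reported in microseconds, its delta can be expressed as the fraction of the sample interval for which the port was starved of buffer credit, which is often the most useful form when chasing fabric congestion. A sketch with hypothetical values:

```python
# Sketch: express the zero buffer-credit timer delta (bbcz, microseconds)
# as a percentage of the sample interval. Hypothetical values.
bbcz_delta = 1_500_000   # microseconds of zero-credit time between samples
interval_s = 300         # seconds between samples

starved = bbcz_delta / 1e6 / interval_s
print(f"{starved:.2%}")  # 0.50%
```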

Document information


More support for:

SAN Volume Controller
V4.3.x

Version:

4.3.1

Operating system(s):

Platform Independent

Reference #:

S1003432

Modified date:

2010-09-17
