dfsstat is a DFS utility program that reports on various counters
maintained by the DFS kernel extensions. It used to be unsupported
and was available only via download from a special "unsupported
tools" page, but we're making it part of the official DFS product on
some platforms. Specifically, it is included in DFS 3.1 PTF 3 for
AIX 4.3, and will be available for all other platforms in DFS 3.1
PTF 4.
You can run dfsstat on client or server systems. It is most commonly
run without any arguments, as we do in the example below. You can
type dfsstat -help to see a list of options.
dfsstat was originally intended as a tool for service personnel
(support and development), so the meaning of some of the counters
won't be apparent unless you have some knowledge of DFS internals.
Usually, you will run dfsstat only as directed by your Support
representative as a means of gathering data about a particular
problem. We are making it part of the official DFS product to speed
up diagnosis in cases where we want customers to run dfsstat; it's
obviously easier to run if the binary is already on your system and
doesn't have to be downloaded from the web.
Sample output is as follows (this from a DFS server machine):
# /opt/dcelocal/bin/dfsstat
KERNEL RPC
----------
ccalls scalls txpkts rxpkts retrans rxdups oo_pkts
6385 12646 35635 36974 0 0 0
rxfacks txfacks
4811 9794
DFS CLIENT
----------
vn_lkups vn_rdir vn_gattr vn_sattr vn_read vn_write vn_map
124968 5474 9480 1035 5423 6945 20
rd_faults wr_faults pr_faults cachehits inflight rd_waits
2710 2035 0 2710 0 0
lookups fstatus fdata readdir gettokens
1599 16 0 223 2
sstatus sdata reltokens revokes
1041 1138 611 78
DFS SERVER
----------
lookup lkuproot fstatus sstatus fdata sdata
3092 24% 3 0% 20 0% 2447 19% 21 0% 2098 16%
readdir mkdir rmdir create rmfile rename
446 3% 163 1% 156 1% 930 7% 1616 12% 600 4%
link symlink fetchacl storeacl gettoken reletoken
400 3% 400 3% 0 0% 0 0% 63 0% 72 0%
sctx gettime setparam bulkkalive bulkfetchVV bulkfstatus
23 0% 17 0% 0 0% 0 0% 0 0% 0 0%
totalcalls: 12567
-------- FXD TokenProcs counter --------
Total number of threads : n_threads = 2
Number of idle threads : n_idle = 2
Number of calls queued : n_queued = 0
Max. number of calls that could be queued = 400
-------- FXD MainProcs counter ---------
Total number of threads : n_threads = 8
Number of idle threads : n_idle = 8
Number of calls queued : n_queued = 0
Max. number of calls that could be queued = 400
#
(On a DFS client machine, the "DFS SERVER" stats will be reported as all zeroes.)
Meanings of some of the fields are as follows. Note that we can't promise to fully document all of the output, since much of it depends on knowledge of DFS internals, and Support can't provide on-demand analyses of output from customer machines; but for whatever it's worth, here is what the fields mean:
KERNEL RPC:
-------------
ccalls
The number of RPC client calls made.
scalls
The number of calls received by the RPC runtime. Note that
even a client-only DFS system can have scalls, since it
exports a token revocation service to which DFS file servers
issue token revokes.
txpkts
The number of RPC packets transmitted including retransmits.
rxpkts
The number of RPC packets received including rxdups and
oo_pkts.
retrans
The number of packets retransmitted. This should
be compared against txpkts. Ratios less than 1 percent
are excellent. Ratios above 5 percent should be investigated,
and ratios above 10 percent are undesirable. Look for
causes of network packet loss or server overload.
rxdups
The number of duplicate packets received, out of the total
packets received. Duplicate packets are an indication that
the original packet, or its acknowledgment, was lost in the
network. Investigate possible causes of network loss if the
ratio of rxdups to rxpkts is above 10 percent.
oo_pkts
The number of out of order packets received. This
can be a sign of network loss or heavy network traffic.
UDP does not guarantee in order delivery of packets.
Ratios of oo_pkts to rxpkts above a few percent should
probably be investigated.
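These rule-of-thumb ratio checks can be sketched in code. This is only an illustration: the function name and the exact cutoffs are assumptions drawn from the guidance above, not part of dfsstat itself.

```python
def rpc_health(txpkts, rxpkts, retrans, rxdups, oo_pkts):
    """Classify KERNEL RPC counters using the rules of thumb above."""
    report = {}
    if txpkts:
        r = 100.0 * retrans / txpkts  # retransmit ratio
        report["retrans%"] = (r,
            "excellent" if r < 1.0
            else "undesirable" if r > 10.0
            else "investigate" if r > 5.0
            else "ok")
    if rxpkts:
        d = 100.0 * rxdups / rxpkts   # duplicate-packet ratio
        report["rxdups%"] = (d, "investigate network loss" if d > 10.0 else "ok")
        o = 100.0 * oo_pkts / rxpkts  # out-of-order ratio ("a few percent")
        report["oo_pkts%"] = (o, "investigate" if o > 3.0 else "ok")
    return report

# With the sample counters shown earlier (retrans, rxdups, oo_pkts
# all zero), every ratio comes back 0.0 with a clean classification.
print(rpc_health(35635, 36974, 0, 0, 0))
```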
rxfacks
The number of fragment acknowledgments received. This is
associated with DFS data reads and writes, which use the
RPC pipe mechanism to stream data between the server and
client.
txfacks
The number of fragment acknowledgments transmitted. This is
associated with DFS data reads and writes, which use the
RPC pipe mechanism to stream data between the server and
client.
DFS CLIENT:
-----------
vn_lkups
The number of lookup vnode operations performed in DFS.
Typically this relates to system calls which take a file
pathname as input. Examples are open() and stat().
vn_rdir
The number of readdir vnode operations performed in DFS.
vn_gattr
The number of getattr vnode operations performed in DFS.
Many system calls which work with files will get file
attributes.
vn_sattr
The number of setattr vnode operations performed in DFS.
System calls like chmod() and utimes() will use this vnode
operation.
vn_read
The number of read vnode operations performed in DFS.
This relates to the read system call. Note that mapped
file I/O will not use the read system call. AIX uses
mapped files for binaries, and the AIX C compiler and
linker also make use of mapped files.
vn_write
The number of write vnode operations performed in DFS.
This relates to the write system call. Note that
mapped file stores will not go through the write vnode operation.
vn_map
The number of map vnode operations. This is related to
shmat() and mmap() system calls.
rd_faults
The number of read page faults issued by the Virtual
Memory Manager (VMM) to DFS. Note that DFS is integrated
with the AIX VMM. DFS creates memory segments for files
and then performs data I/O on the segment. This allows
DFS to support mapped files, and adds a layer of
fast "memory" caching above the DFS client cache.
When DFS data is not in VM memory, then the VMM must
fault to a DFS page fault handler which either gets the
data from the DFS client cache, or retrieves it from a
DFS server. One interesting statistic to examine is
the ratio of rd_faults to vn_reads which for some
environments can give an indication of how a data working
set fits into system memory (RAM). If the ratio of
rd_faults to vn_reads is high, it may indicate that
adding system RAM could improve performance. The
usage of mapped files must also be taken into account.
wr_faults
The number of write faults issued by the VMM. This is
the VMM calling DFS to give it pages with dirty data so
DFS can store it in the DFS cache and possibly to the DFS
server as well. wr_faults are driven by vn_writes, mapped
file stores, and VMM page replacement. If the number of
wr_faults is significantly greater than the number of
vn_writes, this may indicate a high amount of page replacement
activity due to thrashing. Increasing system memory
may result in better performance.
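The rd_faults and wr_faults heuristics above can be sketched as follows. This is only an illustrative sketch: the function and the exact thresholds are assumptions, and as noted above, mapped-file usage must also be taken into account before drawing conclusions.

```python
def fault_ratios(rd_faults, vn_read, wr_faults, vn_write):
    """Rough working-set hints from the DFS CLIENT counters.

    A high rd_faults/vn_read or wr_faults/vn_write ratio *may*
    indicate that adding system RAM would help; treat these only
    as hints, not diagnoses.
    """
    hints = []
    if vn_read and rd_faults / vn_read > 1.0:    # threshold is an assumption
        hints.append("read working set may not fit in RAM")
    if vn_write and wr_faults / vn_write > 1.0:  # threshold is an assumption
        hints.append("possible page-replacement thrashing on writes")
    return hints

# Sample values from the output above: 2710 rd_faults vs 5423 vn_read,
# 2035 wr_faults vs 6945 vn_write -> no hints raised.
print(fault_ratios(2710, 5423, 2035, 6945))
```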
pr_faults
The number of protection faults issued by the VMM.
cachehits
The number of rd_faults that were serviced by data from
the DFS client cache.
inflight
The number of rd_faults where the requested data has already been
requested and is currently arriving from a DFS server. Inflight
data can result from sequential page faults on the same DFS
"chunk" which is usually several pages in size, or read ahead
which can be triggered by the VMM or the DFS client based
on file access patterns.
rd_waits
The number of wait loops a rd_fault takes before the requested
inflight data has arrived. As data streams in from a DFS
server, the page fault path will be notified periodically
to see if enough data has arrived to satisfy the fault.
lookups
The number of file lookup RPC calls made to a DFS file
server. This should be compared to the vn_lkups stat
to get an idea of how many lookups are resolved in the
DFS client's name lookup cache. In some environments
increasing the name lookup cache with the dfsd -namecachesize
option can reduce lookup RPCs. The -stat option should
be increased equally with the -namecachesize option.
For example: dfsd -namecachesize 2000 -stat 2000.
On AIX the default values are based on the amount of
system memory with typical values being around 400 for
a 32 MB system. For single user workstations the
defaults are usually sufficient for a modest name cache
hit ratio. Multiuser systems may benefit from increased
name caches and status caches.
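The comparison of lookups against vn_lkups described above amounts to a name-cache hit ratio, which can be computed like this (the function is a hypothetical helper, not part of dfsstat):

```python
def name_cache_hit_ratio(vn_lkups, lookups):
    """Estimate the DFS client name-cache hit ratio in percent.

    vn_lkups counts lookup vnode operations; lookups counts the
    subset that went to a file server as RPCs, so the difference
    was resolved from the client's name lookup cache.
    """
    if vn_lkups == 0:
        return None
    return 100.0 * (vn_lkups - lookups) / vn_lkups

# With the sample counters above (124968 vn_lkups, 1599 lookups),
# the hit ratio works out to roughly 98.7 percent.
print("hit ratio: %.1f%%" % name_cache_hit_ratio(124968, 1599))
```

A low ratio on a multiuser system would be one signal that increasing -namecachesize (and -stat, equally) may be worthwhile.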
fstatus
The number of fetch status RPC calls made to a DFS file
server. This call is used to get file attributes. Most
DFS RPC calls return file attributes with the result. fetchstatus
RPC calls may be made when permissions need to be
calculated for a new user accessing a file whose name may already
be cached. Increasing the status cache may reduce fetchstatus
RPCs.
fdata
The number of fetch data RPC calls made to a DFS file server.
fetch data RPC calls are required when requested data is not
in the DFS client cache. When data is not in the VMM or the
DFS client cache, then a fetch data RPC must be made to
retrieve the data from the DFS file server.
readdir
The number of readdir RPC calls made to a DFS file server.
Compare this against vn_rdir.
gettokens
The number of gettoken RPC calls made to a DFS file server.
Tokens are the internal mechanism DFS uses to maintain
cache coherency between clients and servers. Most DFS RPC
calls return token rights. gettoken RPC calls are usually
required when there have been data collisions or directory
content changes which required revocation of tokens from
a client or when a client needs to renew a token that is
about to expire.
sstatus
The number of storestatus RPC calls made to a DFS file server.
storestatus RPCs are used to store file attributes to the
DFS file server.
sdata
The number of storedata RPC calls made to a DFS file server.
storedata RPCs are used to store file data to the DFS
file server.
reltokens
The number of internal DFS client released tokens. This
stat should be ignored.
revokes
The number of token revocation requests that the DFS client
has received from servers.
DFS SERVER:
-----------
lookup
The number of lookup RPC calls received by a DFS server.
lkuproot
The number of lookup root RPC calls received by a DFS server.
This RPC is made by DFS clients when they first cross a
DFS mount point and then periodically thereafter.
fstatus
The number of fetch status RPC calls received by a DFS server.
sstatus
The number of store status RPC calls received by a DFS server.
fdata
The number of fetch data RPC calls received by a DFS server.
sdata
The number of store data RPC calls received by a DFS server.
readdir
The number of read directory RPC calls received by a DFS server.
mkdir
The number of directory create RPC calls received by a DFS server.
rmdir
The number of directory remove RPC calls received by a DFS server.
create
The number of file create RPC calls received by a DFS server.
rmfile
The number of file remove RPC calls received by a DFS server.
rename
The number of rename RPC calls received by a DFS server.
link
The number of hard link RPC calls received by a DFS server.
symlink
The number of symbolic link RPC calls received by a DFS server.
fetchacl
The number of fetch ACL RPC calls received by a DFS server.
storeacl
The number of store ACL RPC calls received by a DFS server.
gettoken
The number of get token RPC calls received by a DFS server.
reletoken
The number of release token RPC calls received by a DFS server.
sctx
The number of set context RPC calls received by a DFS server.
DFS clients make set context RPC calls to set up a
"DFS connection" to a file server. A connection represents
a DCE principal at a client. Connections may periodically
be renewed or re-activated when they become stale.
gettime
The number of get time RPC calls received by a DFS server.
DFS clients use get time calls as "keep alives" during
idle periods to keep cache coherency state active
at DFS file servers. Normal RPC calls also act as keep alives.
Idle client systems typically send a keep alive about every
90 seconds when there are "active" tokens at the client.
setparam
The number of setparameter RPC calls received by a DFS server.
bulkkalive
The number of bulk keep alive calls received by a DFS server.
Replication servers make this RPC to DFS file servers which
hold replicas.
bulkfetchVV
The number of bulk fetch version calls received by a DFS server.
Replication servers make this RPC to DFS file servers which
hold replicas.
totalcalls
The total number of RPC calls received by a DFS server.
The last two sections, which report statistics about FXD procs counters,
can be used to help detect situations
where a DFS server is having load problems. To understand them, you have to
know a little bit about the DCE RPC facility. DCE RPC servers have pools of
pre-created threads, dedicated to handling various types of incoming RPCs.
The DFS file server has two such thread pools, the so-called "tokenprocs"
pool and the "mainprocs" pool. The sizes of these pools can be controlled by
arguments to the fxd
command that starts the DFS server. Defaults
are 2 tokenprocs and 8 mainprocs, as above; they can be increased to a maximum
of 10 tokenprocs and 24 mainprocs. Each thread pool also has a queue of size
400 to handle overflow.
The n_threads, n_idle, and n_queued counters, as you can see above,
combine to give you an idea of how busy this server is and how close
it is to its maximum capacity. If you ever reach the server's full
capacity (i.e., if n_idle is zero and n_queued is equal to the max.
number of calls that could be queued), then subsequent incoming RPC
requests will be ignored until space frees up in the server's queue.
To clients, this will look like the server is down, so this can cause
apparent DFS outages from clients. Some customers run dfsstat on
their servers every few minutes via a background script as a means of
monitoring server load. If you detect an overloaded server, the first
thing you can do is to increase the fxd mainprocs and tokenprocs
parameters, by modifying the fxd line in
/opt/dcelocal/etc/cfgarg.dat as follows, and then rebooting:
fxd: -mainprocs 24 -tokenprocs 10 -admingroup subsys/dce/dfs-admin
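The periodic monitoring some customers do can be sketched like this. It is only a hypothetical sketch: the parsing assumes the FXD counter output format shown above, and the function and thresholds are illustrative, not a supported tool.

```python
import re

def check_fxd_pool(counter_text, max_queue=400):
    """Parse one FXD pool counter block (in the format dfsstat prints
    above) and flag a pool at full capacity: no idle threads and a
    full call queue, the state in which new RPCs get ignored."""
    n_idle = int(re.search(r"n_idle = (\d+)", counter_text).group(1))
    n_queued = int(re.search(r"n_queued = (\d+)", counter_text).group(1))
    saturated = (n_idle == 0 and n_queued >= max_queue)
    return {"n_idle": n_idle, "n_queued": n_queued, "saturated": saturated}

sample = """-------- FXD MainProcs counter ---------
Total number of threads : n_threads = 8
Number of idle threads : n_idle = 8
Number of calls queued : n_queued = 0
Max. number of calls that could be queued = 400"""

# The sample server above is idle, so this pool is not saturated.
print(check_fxd_pool(sample))
```

A cron job could run dfsstat, feed the TokenProcs and MainProcs sections through a check like this, and alert when either pool reports saturated.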
If that doesn't work, then your server is just plain maxed out, and you'll
need to consider moving busy filesets to other DFS servers. This of course
assumes that the load is "legitimate", and not the result of some rogue
client(s) needlessly hammering the DFS server. If you see unexpectedly high
load on a DFS server, then you could consider using tracing tools (network
packet tracing or DFS tracing, or both) to see which clients are accessing
the server most frequently; then you could use DFS tracing on those clients
to see what they're doing. You could also look at the read/write
counts as shown in "fts lsft" output, to determine which filesets are
getting the most activity, then try to track down the users of those
filesets. These are the general procedures for investigating loaded
DFS servers.