Troubleshooting full filesystems
This technote addresses some reasons a filesystem can become full, and presents ways to find out what may be filling it up in order to free the space.
Please read the CAUTION at the bottom of this technote before removing any files.
Reasons a filesystem can be full
* Used cp or tar on sparse files
If a sparse file, such as a database table, is copied into the filesystem using 'cp' or 'tar', those utilities will not preserve its sparseness. They write the unallocated regions of the file out as blocks of zeroes, so the copy can occupy much more disk space than the original.
Use the 'fileplace' command on the source file to check if it is sparse:
# fileplace sparse
File: sparse Size: 51226 bytes Vol: /dev/hd1
Blk Size: 4096 Frag Size: 4096 Nfrags: 1
00006806 1 frags 4096 Bytes, 100.0%
unallocated 12 frags 49152 Bytes 0.0%
The "unallocated" fragments are ranges within the file that have no disk blocks allocated to them; they read back as zeroes but consume no space. When 'cp' or 'tar' copies this file, the copy will no longer be sparse. Other utilities such as 'pax' and 'restore' will preserve sparseness in this type of file.
For more information see Technote T1000145 - About Sparse Files
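Where fileplace is not at hand, a rough check is to compare a file's size in bytes with the space actually allocated to it. The following is a minimal sketch, not part of the technote: the function name size_vs_allocated is hypothetical, and it assumes a POSIX shell with wc, du and awk. A file whose byte size is far larger than its allocated KB multiplied by 1024 is probably sparse.

```shell
# Print "<size in bytes> <allocated KB>" for the file given as $1.
# wc -c reads the whole file, so this can be slow on very large files.
size_vs_allocated() {
    file=$1
    bytes=$(wc -c < "$file" | tr -d ' ')            # logical size
    alloc_kb=$(du -k "$file" | awk '{ print $1 }')  # blocks actually allocated
    echo "$bytes $alloc_kb"
}
```

For example, `size_vs_allocated /mtpt/table.dat` (a hypothetical path) prints the two numbers side by side for comparison.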
* Large log files or data files created by an application
Use the "-size" option of the find command to look for large files. If you specify "+number", find reports all files larger than that number. The "-size" argument is in 512-byte blocks:
# find /mtpt -xdev -size +2048 -ls
The above example command will:
- find all files in the filesystem mounted at /mtpt
- not descend into other filesystems that may be mounted lower in the tree (-xdev)
- report on files with size greater than 2048 512-byte blocks (-size +2048)
2048*512 = 1048576 bytes, or 1 MB
- list the files found using output similar to 'ls -l' (-ls)
If this threshold lists too many files, raise it in larger increments:
2048 = 1 MB
20480 = 10 MB
204800 = 100 MB
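The block arithmetic above can be wrapped in a small helper. This is a minimal sketch, not part of the technote: the function names mb_to_blocks and find_larger_than_mb are hypothetical, and it assumes a POSIX shell and a find that measures -size in 512-byte blocks, as described above.

```shell
# Convert megabytes to the 512-byte blocks that find -size expects.
# 1 MB = 1048576 bytes = 2048 blocks of 512 bytes.
mb_to_blocks() {
    echo $(( $1 * 2048 ))
}

# List regular files under directory $1 that are larger than $2 megabytes,
# without crossing into other mounted filesystems (-xdev).
find_larger_than_mb() {
    find "$1" -xdev -type f -size +"$(mb_to_blocks "$2")" -print
}
```

For example, `find_larger_than_mb /mtpt 10` lists files larger than 10 MB in the filesystem mounted at /mtpt.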
* A deleted file is still in use by a process
If a file is held open by a process, library or kernel module when it is deleted, only its name is removed from the directory. The inode still exists, and the filesystem blocks remain allocated to it. The data blocks are freed only when the last open reference to the file is closed.
The space will still be counted as "used" in the output from df, but the file will not be visible via 'ls' or 'du'. This is also discussed in Technote T1000401 - Why Numbers from "du -s" and "df" Disagree
These deleted but open files can be shown by using the "-d" option to the fuser command, as in this example:
# fuser -dV /
inode=16 size= 126361 fd=3 7012596
The last number is the PID of the process that has the file open.
Use ps with that PID to identify the process (for example 'ps -fp <PID>'). The output has this form:
UID PID PPID C STIME TTY TIME CMD
root 6422704 1 0 11:16:09 - 0:00 auditbin
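The behavior described in this section, where the name disappears but the data stays allocated while the file is open, can be reproduced safely with a temporary file. This is an illustrative sketch only: the function name demo_deleted_open is hypothetical, and it assumes a POSIX shell with the mktemp command available.

```shell
# Show that a deleted file stays readable while a descriptor is open.
demo_deleted_open() {
    tmp=$(mktemp) || return 1
    echo "still here" > "$tmp"
    exec 3< "$tmp"          # hold the file open on descriptor 3
    rm -f "$tmp"            # removes the directory entry only
    if [ -e "$tmp" ]; then  # the name should be gone now
        exec 3<&-
        return 1
    fi
    cat <&3                 # the data is still reachable through fd 3
    exec 3<&-               # closing the last reference frees the blocks
}
```

Calling `demo_deleted_open` prints the file's contents even though the name has already been removed.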
Techniques to Find Files
* Finding files by date
If the filesystem has recently filled up, use the -newer flag to find recently modified files. First create a reference file for -newer to compare against, with its modification time set to the point of interest, using the following touch command:
$ touch <mmddhhmm> <filename>
From left to right, the fields of mmddhhmm are:
- mm is the month
- dd is the day
- hh is the hour (24-hour format)
- mm is the minute
Then execute the following command to find files modified more recently than the date on the file you created:
# find /mtpt -xdev -newer <touched_file> -ls
Another useful flag for the find command, -mtime 0, locates files that have been modified in the last 24 hours:
# find /<filesystem_name> -xdev -mtime 0 -ls
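The two steps above (create a reference file, then search with -newer) can be combined in one helper. This is a sketch under stated assumptions, not the technote's own procedure: the function name files_newer_than is hypothetical, it uses the POSIX `touch -t` form with a [[CC]YY]MMDDhhmm timestamp instead of the positional form shown above, and it assumes mktemp is available.

```shell
# List regular files under directory $1 modified after timestamp $2.
# $2 uses the touch -t format, e.g. 202401311530 for 31 Jan 2024 15:30.
files_newer_than() {
    dir=$1
    stamp=$2
    ref=$(mktemp) || return 1       # temporary reference file
    touch -t "$stamp" "$ref"        # set its modification time
    find "$dir" -xdev -type f -newer "$ref" -print
    rm -f "$ref"                    # clean up the reference file
}
```

For example, `files_newer_than /mtpt 202401311530` lists files in /mtpt modified after that time.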
* Use du to add up the file and directory sizes in the filesystem:
# du -xk
This gives total sizes in KB for directories, including the files they contain. The -x flag stays within the same filesystem.
If the numbers in KB are too large to read comfortably, "m" and "g" can be used instead to report megabytes and gigabytes:
# du -xm     Shows output in MB
# du -xg     Shows output in GB
Piping du into the sort command identifies the biggest directories, which appear at the end of the sorted output:
# du -xk | sort -n
See Technote T1000401 - Why Numbers from "du -s" and "df" Disagree for further discussion on this issue.
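When the sorted du output is long, it helps to show only the largest entries. The following is a minimal sketch, assuming a POSIX shell with du, sort and head; the function name biggest_dirs is hypothetical.

```shell
# Print the $2 largest directories (default 10) under $1 (default .),
# largest first, with sizes in KB, staying within one filesystem.
biggest_dirs() {
    dir=${1:-.}
    count=${2:-10}
    du -xk "$dir" | sort -rn | head -n "$count"
}
```

For example, `biggest_dirs /mtpt 20` shows the twenty largest directories in /mtpt. The first line is normally the total for /mtpt itself.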
* Application or command core dumps
Core dumps can be very large. Use the find command to search for them:
# find /mtpt -xdev -name 'core*' -ls
* Filesystem mounted on non-empty directory
Sometimes an application or subsystem will be started before the filesystem required to hold its output is mounted. For example, if auditing is configured to write trail files to the /audit filesystem, but /audit is not mounted when auditing is started, it will write them to the /audit directory in the root filesystem. Mounting the /audit filesystem later hides those trail files, which are still being written to and may eventually fill up the root filesystem.
The best solution is to unmount filesystems by hand and check the contents of their mount directories. If this cannot be done in multiuser mode because applications are running, boot into single-user mode, where only /, /usr, /var and /tmp are mounted. Technote T1011796 - Booting AIX in Single-User Mode can be used as a guide.
Specific AIX filesystems
If root (/) is full
* Check for filesystems mounted over directories containing data
Files or data may have been copied into a directory before the filesystem intended for that mount point was mounted; when the filesystem is mounted later, it hides those files. In this case du run on the underlying filesystem will show a very low number, because it cannot see the hidden files, but df will report the space that is really in use. For example:
We create a filesystem for oracle data:
# crfs -v jfs2 -g oravg -m /oracle -a size=200M
but forget to mount it before copying data into it:
# cp /lots_of_data /oracle
Then mount it afterwards:
# mount /oracle
To fix this, you must unmount /oracle and remove the files in the /oracle directory. This can affect any filesystem mounted over another, but most often affects the root filesystem.
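The du versus df discrepancy described above can be measured directly. This is a rough sketch, not part of the technote: the function name used_gap_kb is hypothetical, and it assumes df accepts -kP (POSIX output format, with the used KB in the third column) and du accepts -xsk. Running du over a large filesystem can take some time.

```shell
# Print (KB used according to df) minus (KB visible to du) for the
# filesystem containing $1. A large positive gap suggests space held by
# files hidden under a mount point or by deleted-but-open files.
used_gap_kb() {
    mnt=$1
    df_used=$(df -kP "$mnt" | awk 'NR == 2 { print $3 }')
    du_used=$(du -xsk "$mnt" 2>/dev/null | awk '{ print $1 }')
    echo $(( df_used - du_used ))
}
```

For example, `used_gap_kb /` reports how much used space in the root filesystem du cannot account for.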
* Check the /etc/security/failedlogin file
Use the following command to read the contents of the file.
# who /etc/security/failedlogin
TTYs that respawn too rapidly will create failed login entries. To clear the file after reading or saving the output, execute the following command:
# cp /dev/null /etc/security/failedlogin
* Check the /dev directory
If a device name is typed incorrectly, as in rmto instead of rmt0, a file will be created in /dev called rmto. The command will normally proceed until the entire root file system is filled before failing. /dev is part of the root (/) file system. Look for entries that are not devices (that do not have a major or minor number).
Execute the following:
# cd /dev
# ls -l | more
Where an ordinary file shows a file size, a device file shows two numbers separated by a comma (the major and minor numbers).
crw-rw-rw- 1 root system 12, 0 Oct 25 10:19 rmt0
If the output looks like the following, the file should be removed.
-rw-rw-rw- 1 root system 9375473 Oct 25 10:19 rmto
NOTE: The /dev directory contains some valid ordinary files. Look only for files that have a large size (larger than 500 bytes).
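Instead of paging through the ls output, find can pick out ordinary files above a size threshold. The following is a minimal sketch, assuming a POSIX find; the function name large_regular_files is hypothetical. Because -size counts 512-byte blocks rounded up, "+1" matches files larger than 512 bytes, which is close to the 500-byte guideline above.

```shell
# List ordinary (non-device) files larger than 512 bytes under $1
# (default /dev), without crossing into other filesystems.
large_regular_files() {
    find "${1:-/dev}" -xdev -type f -size +1 -print
}
```

Run `large_regular_files /dev` and inspect each result before removing anything.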
* If system auditing is running, the /audit directory (default) may rapidly fill up and require attention.
* Check for very large files in / with the find commands above.
If /var is full
* In /var/tmp, check for old leftover files.
* Check for a large wtmp file
/var/adm/wtmp is a file that is used to log all logins, rlogins and telnet sessions. If it is not monitored it will grow indefinitely unless system accounting is running. System accounting will clear it out nightly. /var/adm/wtmp can either be cleared out or edited to remove old and unwanted information.
To clear /var/adm/wtmp, execute the following:
# cp /dev/null /var/adm/wtmp
To edit the file and remove unwanted entries, execute the following:
# /usr/sbin/acct/fwtmp < /var/adm/wtmp >/tmp/out
Edit the /tmp/out file to remove unwanted entries then put the edited version back in wtmp by executing the following command:
# /usr/sbin/acct/fwtmp -ic < /tmp/out > /var/adm/wtmp
* In the /var/adm/ras directory, clear the error log
This directory contains the error log, errlog. It is never cleared unless it is manually cleared. DO NOT cp /dev/null to it or it will disable the error logging functions of the system. In that case a zero (0) length errlog file must be replaced from a backup tape.
First, stop the error daemon by entering:
# /usr/lib/errstop
Second, remove the following file, or move it to a different filesystem:
/var/adm/ras/errlog
NOTE: The historical error data is deleted if you remove the errlog file.
Third, restart the error daemon by entering:
# /usr/lib/errdemon
* Check for any trace files
There may be a trace file in /var/adm/ras. The trcfile file in this directory may be large because of a previous trace run. The file can be removed by executing the following:
# rm /var/adm/ras/trcfile
* Check for vmcore files
You may also have vmcore* files in the /var/adm/ras directory if your dump device is set to hd6 (which is the default). If these files are old, or you do not wish to pursue them, you may remove them.
* Check for spool files
The /var/spool directory contains the queueing subsystem files. Clear the queueing subsystem by executing the following commands:
# stopsrc -s qdaemon
# rm /var/spool/lpd/qdir/*
# rm /var/spool/lpd/stat/*
# rm /var/spool/qdaemon/*
# startsrc -s qdaemon
* Check for accounting files
The /var/adm/acct directory contains accounting records. If accounting is running, this directory may contain several large files. Information on how to manage these files can be found in System Management Guide Chapter 14 (SC23-2457-01).
* Terminated vi session files
The /var/preserve directory contains files saved from terminated vi sessions.
These files can be used to recover edits from sessions that ended abnormally. Old ones can be deleted, but consider keeping some of the newer ones in case users want to recover files. To recover a file, execute the following:
$ vi -r <filename>
Running vi -r with no filename will list all available files that are recoverable.
* Large /var/adm/sulog
This file tracks the number of attempted uses of su and whether they are successful or not. This is a flat file and can be viewed and modified with a favorite editor. If it is removed it will be recreated by the next attempted su.
* Large /var/tmp/snmpd.log
This is used by the snmpd daemon as a log. If the file is removed it will be recreated by the snmpd daemon.
To keep this file from growing indefinitely, limit its size by editing the size setting in the /etc/snmpd.conf file. The value is in bytes.
* Check for large mailboxes
Files in /var/spool/mail/ are flat text files that serve as the users' mailboxes. You can move them out of the way or zero them out, if you are sure that the mail is not needed by the user.
Use the skulker utility
AIX provides a general system cleanup script called skulker located in the /usr/sbin directory. Before attempting to run the skulker command, look at the skulker entry in the product documentation. Read the script for details to determine what files it will delete and what time frame it will allow files to exist before deletion.
skulker may be run as a cron job using the following crontab entry:
0 3 * * * /usr/sbin/skulker
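Consider limiting the size of the errlog by running these entries from cron. The first removes software and operator entries older than 30 days; the second removes hardware entries older than 90 days: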
0 11 * * * /usr/bin/errclear -d S,O 30
0 12 * * * /usr/bin/errclear -d H 90
CAUTION: Before removing any files, the user should check to see if the file is currently in use by an active user process. Execute the following command to view the process ID of a process with this file open:
# fuser -f <filename>
Software version: Version Independent
Operating system(s): AIX
Reference #: T1011082
Modified date: 01 September 2010