IBM Support

Removing and Replacing a Fixed Disk

Question & Answer


Question

Removing and Replacing a Fixed Disk

Answer


This document describes the procedures to remove and replace a fixed disk in a volume group. These procedures DO NOT apply in the following environments:

  1. The disk is in a shared volume group. This applies to environments that use HACMP, RVSD, or any other software that manages shared disks. Refer to the documentation for that product for the correct disk replacement procedures. The SSA User's Guide explains the procedures for changing disks in a RAID or hot-swap environment. It is available for download at:
     http://www.ibm.com/support/docview.wss?rs=505&uid=ssg1S1002348

  2. The disk is in rootvg and contains any of the following logical volumes that are not mirrored:
       hd2, hd3, hd4, hd6, hd9var, hd8

     In this case, you need to replace the disk and restore from a system backup, specifying the correct disks to restore.

  3. The system is a /usr, dataless, or diskless client.

This document applies to AIX Versions 4.3.3 and 5L.

Read the entire document before proceeding. Before using these procedures, make sure that all relevant fixes are installed, including those mentioned in this document.

Removing a physical volume from a volume group
How to proceed if the volume group has just one disk
Checking to see what quorum is set to
Deallocating physical partitions from the disk
Deleting the disk from the volume group
Removing the disk definition from the system
Adding a new drive to an existing volume group
Recommended fixes
Related documentation

Removing a physical volume from a volume group

The basic steps for replacing a disk drive are as follows:

  1. Deallocate all the physical partitions on the physical volume that belong to logical volumes in its volume group.
  2. Remove the physical volume from the volume group.
  3. Remove the definition for the disk from the device configuration database.

These steps are outlined in more detail in subsequent sections.

If there is just one disk in the volume group, proceed to the next section, "How to proceed if the volume group has just one disk." Otherwise, proceed to the section entitled "Deallocating physical partitions from the disk."
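As a rough illustration, the branching above can be sketched as a dry-run shell function that only prints the commands in order and runs nothing. The name plan_disk_removal is hypothetical. The function assumes that you already know the volume group name, the disk name, and how many disks the volume group has (for example, from lsvg -p). It prints varyoffvg before exportvg because exportvg only works on a volume group that is varied off.

```shell
#!/bin/sh
# Dry-run sketch: prints (does not run) the AIX commands this document
# describes for removing one disk. Arguments: VG DISK NDISKS.
plan_disk_removal() {
    vg=$1; disk=$2; ndisks=$3
    if [ "$ndisks" -eq 1 ]; then
        # Only disk in the volume group: export the VG instead of reducing it.
        echo "varyoffvg $vg"
        echo "exportvg $vg"
    else
        # Step 1: list what is allocated on the disk; deallocate it with
        # migratepv, rmlvcopy, or rmfs/rmps/rmlv as described later.
        echo "lspv -l $disk"
        # Step 2: remove the physical volume from the volume group.
        echo "reducevg -f $vg $disk"
    fi
    # Step 3: remove the disk definition from the device configuration database.
    echo "rmdev -dl $disk"
}
```

For example, plan_disk_removal testvg hdisk3 2 prints the lspv, reducevg, and rmdev lines for a two-disk volume group.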


How to proceed if the volume group has just one disk

If the drive to be replaced is the only drive in the volume group, unmount its file systems, vary off the volume group, and then remove the volume group definition with:

     varyoffvg <VGname>
     exportvg <VGname>

At this point, remove the disk definition using the rmdev command. Details are included in the section "Removing the disk definition from the system" in this document.


Checking to see what quorum is set to

If your volume group is mirrored, you may have disabled quorum. The purpose of quorum is explained in the IBM Redbook: LVM: Introduction and Concepts. Here is an excerpt:

A quorum is a vote of the number of Volume Group Descriptor Areas and Volume Group Status Areas (VGDA/VGSA) that are active. A quorum ensures data integrity in the event of a disk failure. Each physical disk in a volume group has at least one VGDA/VGSA. When a volume group is created onto a single disk, it initially has two VGDA/VGSA areas residing on the disk. If a volume group consists of two disks, one disk still has two VGDA/VGSA areas, but the other disk has one VGDA/VGSA. When the volume group is made up of three or more disks, then each disk is allocated one VGDA/VGSA.

A quorum is lost when enough disks and their VGDA/VGSA areas are unreachable so that a 51% majority of VGDA/VGSA areas no longer exists. In a two-disk volume group, if the disk with only one VGDA/VGSA is lost, a quorum still exists because two of the three VGDA/VGSA areas still are reachable. If the disk with two VGDA/VGSA areas is lost, this statement is no longer true. The more disks that make up a volume group, the lower the chances of quorum being lost when one disk fails.
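The quorum arithmetic in the excerpt can be checked with a small shell sketch. This is a simplified model and not LVM code. It assumes the VGDA/VGSA layout described above and treats the "51% majority" as strictly more than half of all VGDA/VGSA areas. The function names are hypothetical.

```shell
#!/bin/sh
# Model of the VGDA/VGSA layout described in the excerpt above.
# Disk indexes are 0-based; in a two-disk VG, disk 0 is assumed to be
# the disk that holds two VGDA/VGSA areas.

# vgdas_on_disk N I: VGDA/VGSA areas on disk I of an N-disk volume group
vgdas_on_disk() {
    if [ "$1" -eq 1 ]; then
        echo 2
    elif [ "$1" -eq 2 ] && [ "$2" -eq 0 ]; then
        echo 2
    else
        echo 1
    fi
}

# total_vgdas N: VGDA/VGSA areas in the whole N-disk volume group
total_vgdas() {
    if [ "$1" -le 2 ]; then echo $(($1 + 1)); else echo "$1"; fi
}

# quorum_after_loss N I: report whether quorum survives losing disk I.
# A majority is modelled as strictly more than half of all areas.
quorum_after_loss() {
    total=$(total_vgdas "$1")
    lost=$(vgdas_on_disk "$1" "$2")
    left=$((total - lost))
    if [ $((left * 2)) -gt "$total" ]; then
        echo "quorum kept"
    else
        echo "quorum lost"
    fi
}
```

For example, quorum_after_loss 2 1 prints "quorum kept" and quorum_after_loss 2 0 prints "quorum lost". These results match the two-disk cases in the excerpt.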

Check the QUORUM field in the lsvg output for the volume group. A value of 1 indicates that quorum is turned off. Enter:

     lsvg testvg

     VOLUME GROUP:   testvg                 VG IDENTIFIER:  00097c7f00004c00000000f4a163bc7b
     VG STATE:       active                 PP SIZE:        16 megabyte(s)
     VG PERMISSION:  read/write             TOTAL PPs:      1084 (17344 megabytes)
     MAX LVs:        256                    FREE PPs:       998 (15968 megabytes)
     LVs:            12                     USED PPs:       86 (1376 megabytes)
     OPEN LVs:       9                      QUORUM:         1
     TOTAL PVs:      2                      VG DESCRIPTORS: 3
     STALE PVs:      1                      STALE PPs:      1
     ACTIVE PVs:     2                      AUTO ON:        yes
     MAX PPs per PV: 2032                   MAX PVs:        16
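If you are scripting this check, a small awk-based helper can extract a single field from lsvg output. This sketch assumes the two-column "LABEL: value" layout shown above. The name lsvg_field is hypothetical.

```shell
#!/bin/sh
# lsvg_field LABEL: print the first word after "LABEL:" in lsvg output on stdin.
# Pass the full label (e.g. "QUORUM", "VG DESCRIPTORS"); a short label such
# as "LVs" would also match inside "MAX LVs:".
lsvg_field() {
    awk -v label="$1:" '{
        p = index($0, label)
        if (p > 0) {
            split(substr($0, p + length(label)), w, " ")
            print w[1]
            exit
        }
    }'
}
```

On AIX, lsvg testvg | lsvg_field QUORUM would print 1 for the output above.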

If you are replacing the disk drive immediately, you may not need to turn quorum back on. The mirrorvg command automatically turns quorum off, and the unmirrorvg command automatically turns it back on. For rootvg, the only way to make a quorum change take effect is to reboot. For a non-rootvg volume group, a quorum change takes effect after you vary the volume group off and then back on. To change quorum from the command line, run one of the following:

     chvg -Q n vgname --> turns quorum off
     chvg -Q y vgname --> turns quorum on

To vary off a volume group, unmount all of its file systems and make sure that all of its logical volumes are closed. To check, enter:

     lsvg -l testvg
     LV NAME     TYPE      LPs   PPs   PVs   LV STATE          MOUNT POINT
     testlv      jfs2      1     1     1     closed/sync'd     /testfs
     loglv00     jfs2log   1     1     1     closed/sync'd     N/A

Once all the logical volumes are closed, run the following:

     varyoffvg vgname
     varyonvg vgname

The quorum change is now in effect.
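Before you run varyoffvg, you can automate the closed-state check with a sketch like the following. It assumes the lsvg -l column layout shown above, where LV STATE is the sixth column. Any state that does not begin with "closed" is treated as open. The name all_lvs_closed is hypothetical.

```shell
#!/bin/sh
# all_lvs_closed: read `lsvg -l <VGname>` output on stdin; print any LV whose
# LV STATE (column 6) does not begin with "closed", and return non-zero if so.
all_lvs_closed() {
    awk '
        NF < 6 || $1 == "LV" { next }      # skip the "vgname:" line and header
        $6 !~ /^closed/ { print "still open: " $1; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage sketch on AIX (not executed here):
#   lsvg -l testvg | all_lvs_closed && varyoffvg testvg && varyonvg testvg
```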


Deallocating physical partitions from the disk

Every physical partition (PP) on the disk allocated to any logical volume (LV), including file systems or paging spaces, must be deallocated, either by moving the contents of those PPs to another disk or by removing them.

To determine what logical volumes have PPs allocated to that disk, run:

     lspv -l <hdisk#>

If the hdisk name no longer exists and the disk can be identified only by its 16-digit PVID (which you might see in the output of lsvg -p <VGname>), substitute the PVID for the disk name. For example:

     lspv -l 0123456789abcdef

You may receive the following error:

     0516-320 : Physical volume 00001165a97b10c6 is not assigned to 
     a volume group.

If so, run the following command:

     putlvodm -p `getlvodm -v <VGname>` <PVID>

VGname is your volume group and PVID is the 16-digit physical volume identifier. The characters around the getlvodm command are grave accents (backquotes), not single quotes. The lspv -l <PVID> command should now run successfully.
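When scripting these steps, it can help to distinguish a PVID from an hdisk name before you choose which lspv or putlvodm form to use. The sketch below checks only the format (16 hexadecimal digits) and looks nothing up. The name is_pvid is hypothetical. In ksh and other POSIX shells, $(...) is equivalent to the grave-mark form used above.

```shell
#!/bin/sh
# is_pvid ARG: succeed if ARG looks like a 16-digit hexadecimal PVID
# rather than an hdisk name (format check only).
is_pvid() {
    case $1 in
        ''|*[!0-9a-fA-F]*) return 1 ;;
    esac
    [ "${#1}" -eq 16 ]
}

# Usage sketch on AIX (not executed here); the putlvodm line is only
# needed if lspv -l reports error 0516-320:
#   target=0123456789abcdef
#   if is_pvid "$target"; then
#       putlvodm -p "$(getlvodm -v testvg)" "$target"
#   fi
#   lspv -l "$target"
```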

If another disk in the volume group has enough free space to hold the partitions on this disk, and the disk to be replaced has not failed, you can use the migratepv command to move the used PPs off this disk. See the migratepv man page for the steps.
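A rough pre-check for migratepv compares the used PPs on the disk being replaced with the free PPs on the target disk. Both values come from lspv <hdisk#> output. Because PP size is fixed for a volume group, counting partitions is enough for this check. The sketch assumes lspv prints "USED PPs:" and "FREE PPs:" in the same "LABEL: value" layout as lsvg. It only counts partitions, so migratepv can still fail because of logical volume allocation policies. The function names are hypothetical.

```shell
#!/bin/sh
# pv_ppcount LABEL: print the number after "LABEL:" in `lspv <hdisk#>`
# output on stdin (for example LABEL = "FREE PPs" or "USED PPs").
pv_ppcount() {
    awk -v label="$1:" '{
        p = index($0, label)
        if (p > 0) { split(substr($0, p + length(label)), w, " "); print w[1]; exit }
    }'
}

# migrate_plan SRC DST USED_ON_SRC FREE_ON_DST: print a migratepv command if
# DST has at least as many free PPs as SRC has used PPs; otherwise fail.
migrate_plan() {
    if [ "$3" -le "$4" ]; then
        echo "migratepv $1 $2"
    else
        echo "not enough free PPs on $2 ($4 free, $3 needed)"
        return 1
    fi
}

# Usage sketch on AIX (not executed here):
#   used=$(lspv hdisk3 | pv_ppcount 'USED PPs')
#   free=$(lspv hdisk4 | pv_ppcount 'FREE PPs')
#   migrate_plan hdisk3 hdisk4 "$used" "$free"
```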

If the partitions cannot be migrated, they must be removed. The output of the lspv -l <hdisk#> (or lspv -l <PVID>) command shows which logical volumes are affected. Run the following command on each LV:

     lslv <LVname>

The COPIES field shows whether the LV is mirrored. If it is, remove the copy on the failed disk with:

     rmlvcopy <LVname> 1 <hdisk#>

In this command, 1 is the number of copies that should remain. hdisk# is the failed disk plus any other disks that hold the same copy; separate multiple disk names with spaces. Use the lslv -m <LVname> command to see which other disks need to be listed in the rmlvcopy command. If you used the disk's PVID with the lspv command earlier, specify that PVID in the list of disks given to rmlvcopy. On AIX Version 4.2.1 or later, you can use the unmirrorvg command instead of rmlvcopy. See the man pages for rmlvcopy and unmirrorvg, or other documentation, for additional information.
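The mirrored/unmirrored decision can be sketched as follows. The numeric argument to rmlvcopy is the number of copies to keep, so the 1 above suits a two-copy LV. This sketch generalizes that value to COPIES minus one. It assumes lslv prints a "COPIES:" field, and it only prints what to do. The function names are hypothetical.

```shell
#!/bin/sh
# lv_copies: print the COPIES value from `lslv <LVname>` output on stdin.
lv_copies() {
    awk '{
        p = index($0, "COPIES:")
        if (p > 0) { split(substr($0, p + 7), w, " "); print w[1]; exit }
    }'
}

# removal_command LV DISK COPIES: for a mirrored LV, print an rmlvcopy
# command that keeps COPIES-1 copies and drops the one on DISK; for an
# unmirrored LV, report that the whole LV must be removed.
removal_command() {
    lv=$1; disk=$2; copies=$3
    if [ "$copies" -gt 1 ]; then
        echo "rmlvcopy $lv $((copies - 1)) $disk"
    else
        echo "unmirrored: remove $lv entirely"
    fi
}

# Usage sketch on AIX (not executed here):
#   removal_command datalv hdisk3 "$(lslv datalv | lv_copies)"
```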

If the LV is not mirrored, the entire logical volume must be removed, even if just one physical partition resides on the drive to be replaced and cannot be migrated to another disk. If the unmirrored LV is a JFS file system, unmount the file system and remove it. Enter:

     umount /<fsname>
     rmfs /<fsname>

If the unmirrored logical volume is a paging space, see if it is active. Enter:

     lsps -a

If it is active, set it so that it is not activated at the next reboot. Enter:

     chps -a n <LVname>

After you reboot, remove it by entering:

     rmps <LVname>

Remove any other unmirrored logical volume with the following command:

     rmlv <LVname>
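For unmirrored logical volumes, the choice between rmfs, rmps, and rmlv depends on the TYPE field from lslv <LVname>. The following dry-run sketch assumes the type values jfs, jfs2, and paging; any other type falls through to rmlv. The name unmirrored_removal is hypothetical.

```shell
#!/bin/sh
# unmirrored_removal LV TYPE [MOUNTPOINT]: print the removal commands this
# section gives for an unmirrored LV, chosen by its lslv TYPE value.
unmirrored_removal() {
    lv=$1; type=$2; mnt=$3
    case $type in
        jfs|jfs2)
            # File system: unmount, then rmfs (which also removes its LV).
            echo "umount $mnt"
            echo "rmfs $mnt"
            ;;
        paging)
            # Paging space: keep it inactive at next boot, reboot, then remove.
            echo "chps -a n $lv"
            echo "# reboot the system before the next command"
            echo "rmps $lv"
            ;;
        *)
            echo "rmlv $lv"
            ;;
    esac
}
```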

NOTE: If the LV is serving as a dump device, the dump pointer must first be reassigned. The same is true if the LV was mirrored and the copy is being removed. Check the dump pointers by entering:

     sysdumpdev -l

Reassign the dump pointers. Enter:

     sysdumpdev -Pp /dev/sysdumpnull   (for the primary device)
     sysdumpdev -Ps /dev/sysdumpnull   (for the secondary device)

The pointers can be reassigned to the appropriate logical volume after it is recreated.
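To check whether a given LV is currently a dump target, you can filter the sysdumpdev -l output with a script. This sketch assumes that sysdumpdev -l prints lines beginning with "primary" and "secondary", each followed by a device path. The function names are hypothetical, and so is the LV name lg_dumplv in the usage note.

```shell
#!/bin/sh
# dump_refs LV: read `sysdumpdev -l` output on stdin and print "primary"
# and/or "secondary" for each dump pointer that names /dev/LV.
dump_refs() {
    awk -v dev="/dev/$1" '($1 == "primary" || $1 == "secondary") && $2 == dev { print $1 }'
}

# dump_reassign_cmds: turn each "primary"/"secondary" line on stdin into
# the matching sysdumpdev command from this section.
dump_reassign_cmds() {
    while read -r which; do
        case $which in
            primary)   echo "sysdumpdev -Pp /dev/sysdumpnull" ;;
            secondary) echo "sysdumpdev -Ps /dev/sysdumpnull" ;;
        esac
    done
}
```

On AIX, sysdumpdev -l | dump_refs lg_dumplv | dump_reassign_cmds would print the reassignment commands needed before lg_dumplv is removed.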


Deleting the disk from the volume group

Using either the PVID or the hdisk name, whichever you used with the lspv -l command earlier, run one of the following:

     reducevg -f <VGname> <hdisk#>
     reducevg -f <VGname> <PVID>

If you used the PVID and the reducevg command reports that the PVID is not in the device configuration database, run the following command to check whether the disk was actually removed from the volume group:

     lsvg -p <VGname>

If the PVID or disk is no longer listed, ignore the errors from the reducevg command.
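The lsvg -p check can be scripted as shown below. This sketch assumes the usual lsvg -p layout, with the PV name or PVID in the first column. The name pv_in_vg is hypothetical.

```shell
#!/bin/sh
# pv_in_vg NAME: read `lsvg -p <VGname>` output on stdin and succeed if
# NAME (an hdisk name or PVID) appears in the PV_NAME column.
pv_in_vg() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage sketch on AIX (not executed here):
#   if lsvg -p testvg | pv_in_vg 0123456789abcdef; then
#       echo "still in the volume group"
#   else
#       echo "removed; reducevg errors can be ignored"
#   fi
```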


Removing the disk definition from the system

If the disk was an SSA disk, first determine which pdisk number corresponds to the hdisk. Do this before you remove the hdisk definition, because the commands below look up the hdisk. The easiest way to do this is with the ssaxlate command. Enter the following:

     ssaxlate -l hdisk# --> shows the pdisk#(s) definition
     ssaxlate -l pdisk# --> shows the hdisk# definition

Another way to do this is with the following commands:

     lsdev -Cc disk -F name' 'connwhere
     lsdev -Cc pdisk -F name' 'connwhere

See which SSA disk serial number matches the hdisk to remove. If the hdisk does not appear, or if you have been working with a PVID value up to this point, the pdisk whose serial number does not match any of the hdisks is likely to be the disk to remove. Other SSA commands might provide additional information. Consult the SSA documentation.

Remove the hdisk. Enter:

     rmdev -dl <hdisk#>

If the disk was an SSA disk, also delete the pdisk. Enter:

     rmdev -dl <pdisk#>

If you have been working with a PVID value rather than an hdisk name, make sure the PVID is removed from the ODM with the following command. The 32-digit value consists of the 16-digit PVID followed by 16 zeros. For example:

     odmdelete -q value=0073659c2c6d26f10000000000000000 -o CuAt
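Building the 32-digit value by hand is error-prone. A helper can append the 16 zeros and reject anything that is not a 16-digit hexadecimal PVID. The name cuat_pvid_value is hypothetical. Running odmget with the same criteria first shows which CuAt entries the odmdelete command would remove.

```shell
#!/bin/sh
# cuat_pvid_value PVID: print the 32-digit CuAt value (the PVID followed by
# 16 zeros) used in the odmdelete example above; rejects malformed input.
cuat_pvid_value() {
    case $1 in
        ''|*[!0-9a-fA-F]*) echo "not a hex PVID: $1" >&2; return 1 ;;
    esac
    if [ "${#1}" -ne 16 ]; then
        echo "PVID must be 16 digits: $1" >&2
        return 1
    fi
    printf '%s0000000000000000\n' "$1"
}

# Usage sketch on AIX (not executed here):
#   odmget -q "value=$(cuat_pvid_value 0073659c2c6d26f1)" CuAt      # inspect first
#   odmdelete -q "value=$(cuat_pvid_value 0073659c2c6d26f1)" -o CuAt
```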

To physically remove the hard disk, consult the documentation for that device or contact the vendor's hardware service organization.


Adding a new drive to an existing volume group

Once the new drive has been configured, ensure that a proper PVID has been written to the drive by running:

     chdev -l <hdisk#> -a pv=clear
     chdev -l <hdisk#> -a pv=yes

NOTE: On SSA drives, the first chdev command may be omitted.

Add the drive to the volume group with:

     extendvg <VGname> <hdisk#>

You can also use the mkvg command to create a new volume group on the new drive.

Logical volumes, paging spaces, file systems, and logical volume copies can be re-created with the mklv, mkps, crfs, and mklvcopy (or mirrorvg) commands, respectively, or by using SMIT.
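You can print the steps in this section as a dry run before you execute them. The name plan_add_disk is hypothetical. Passing ssa as the third argument skips the pv=clear step, following the NOTE for SSA drives.

```shell
#!/bin/sh
# plan_add_disk VG DISK [ssa]: print (not run) the commands for writing a
# fresh PVID to DISK and adding it to VG.
plan_add_disk() {
    vg=$1; disk=$2
    # On SSA drives the pv=clear step may be omitted.
    if [ "$3" != "ssa" ]; then
        echo "chdev -l $disk -a pv=clear"
    fi
    echo "chdev -l $disk -a pv=yes"
    echo "extendvg $vg $disk"
}
```

For example, plan_add_disk testvg hdisk4 prints the two chdev lines followed by extendvg testvg hdisk4.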


Recommended fixes

NOTE: In AIX Versions 4 and 5, use the command instfix -ik <APAR> to determine if a particular fix is installed.

There are numerous AIX fixes related to mirroring, PVIDs, and synchronization of volume groups. IBM recommends that systems be at the latest maintenance level (ML) for their environment. The latest fixes and MLs can be downloaded from "Quick links to AIX fixes" (http://www.ibm.com/servers/eserver/support/unixservers/aixfixes.html).

The filesets related to mirroring and synchronization of disk drives are bos.rte.lvm and bos.rte.filesystem.

Call your IBM technical support specialist if you have specific questions concerning what fixes apply to your specific situation.


Related documentation

For more in-depth coverage of this subject, the following IBM publications are recommended:

  • AIX Version 4.3 System Management Guide: Operating System and Devices Chapter 5, "Recovering from Disk Drive Problems"
  • AIX Version 4.3 Commands Reference
  • AIX Version 5L Commands Reference

In addition, similar documents can be accessed through the following URL:
http://publib16.boulder.ibm.com/pseries/


Historical Number

isg1pTechnote0736

Document Information

Modified date:
17 June 2018

UID

isg3T1000426