The Rebalance command in BigInsights balances data only across the DataNodes. How can I rebalance the data across all of the disk drives within each DataNode of my cluster?
To rebalance the data across the disk drives, you can use the HDFS replication factor. Increasing the replication factor creates additional copies of the blocks of either a single file or all of your data files. When you then reduce the replication factor back to its original value (the default is 3), HDFS removes the excess block replicas in a balanced manner across all of the disk drives on the DataNodes of the cluster.
In other words, if blocks are not evenly distributed across your DataNodes, you can temporarily increase the replication factor and then bring it back down.
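Because every extra replica is a full copy of a block, the temporary increase needs free raw capacity in proportion to the data being re-replicated. As a back-of-the-envelope check (an illustration, not a BigInsights tool), raw usage is the logical data size multiplied by the replication factor, so going from 3 to 4 replicas adds one third to the raw footprint. The helper name `replication_capacity` is hypothetical, and it uses shell integer arithmetic, so pick a unit (GB or TB) that keeps the numbers whole.

```shell
# Hypothetical helper: raw HDFS capacity before and during the temporary
# replication increase. Arguments: logical data size, normal replication
# factor, temporary replication factor. Integer arithmetic only, so use
# one unit consistently (for example GB or TB).
replication_capacity() {
  size="$1" base_rep="$2" temp_rep="$3"
  echo "raw_now=$(( size * base_rep )) raw_peak=$(( size * temp_rep )) extra=$(( size * (temp_rep - base_rep) ))"
}

# Example: 10 TB of logical data at replication 3, raised temporarily to 4
replication_capacity 10 3 4
# prints: raw_now=30 raw_peak=40 extra=10
```

If the extra amount does not fit in the free space of the cluster, raising the replication factor of all of HDFS at once is likely to fail; working on one directory at a time keeps the peak lower.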
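To set the replication factor of an individual file to 4:
./bin/hadoop fs -setrep -w 4 /path/to/file
You can also do this recursively. To change the replication factor of all files in HDFS to 4:
./bin/hadoop fs -setrep -R -w 4 /
The -w option makes each command wait until the new replication factor has been reached, which can take a long time on a large cluster. After the increase is complete, set the replication factor back to its original value.
To set the replication factor of an individual file back to 3:
./bin/hadoop fs -setrep -w 3 /path/to/file
You can also do this recursively. To change the replication factor of all files in HDFS back to 3:
./bin/hadoop fs -setrep -R -w 3 /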
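The two steps above can be wrapped in a small script so that the replication factor is lowered only after the increase has finished. This is a minimal sketch, assuming the `hadoop` launcher lives at `./bin/hadoop` as in the commands above; the variable `HADOOP_BIN` and the function names are hypothetical, chosen for this example. It calls `hadoop fs -setrep` with the same `-R -w` options shown above.

```shell
# Sketch: temporarily raise the HDFS replication factor, then restore it.
# HADOOP_BIN and the function names are hypothetical; point HADOOP_BIN
# at the hadoop launcher of your installation.
HADOOP_BIN="${HADOOP_BIN:-./bin/hadoop}"

# set_replication REP PATH
# Sets the replication factor of PATH (recursively for directories) and
# waits (-w) until the new replication factor has been reached.
set_replication() {
  "$HADOOP_BIN" fs -setrep -R -w "$1" "$2"
}

# rebalance_via_replication [PATH] [NORMAL_REP] [TEMP_REP]
# Defaults: all of HDFS ("/"), normal factor 3, temporary factor 4.
rebalance_via_replication() {
  path="${1:-/}"
  normal_rep="${2:-3}"
  temp_rep="${3:-4}"

  echo "Raising replication of $path to $temp_rep ..."
  if ! set_replication "$temp_rep" "$path"; then
    # Do not restore automatically; the increase may be partly applied.
    echo "Raising replication failed; not restoring. Check the cluster before retrying." >&2
    return 1
  fi

  echo "Restoring replication of $path to $normal_rep ..."
  set_replication "$normal_rep" "$path"
}
```

For example, `rebalance_via_replication /path/to/file 3 4` handles a single file, and `rebalance_via_replication / 3 4` covers all of HDFS. The script stops without lowering the replication factor if the increase fails, so that you can inspect the cluster before deciding how to proceed.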
This should even out disk usage to a similar percentage across all of the disk drives (within reason).
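One way to check the result is to compare how full each data disk is on a DataNode. This is a sketch, assuming standard `df -P` output (where the fifth column is the capacity percentage) and that you pass the mount points used for HDFS data on that node; the mount points in the example and the function name `usage_spread` are hypothetical.

```shell
# usage_spread: reads `df -P` output on stdin (the first line is the header)
# and prints the lowest and highest Capacity (Use%) values and their
# difference. Assumes file system names contain no spaces.
usage_spread() {
  awk 'NR > 1 {
         pct = $5
         sub(/%$/, "", pct)
         pct += 0
         if (n == 0 || pct < min) min = pct
         if (n == 0 || pct > max) max = pct
         n++
       }
       END {
         if (n > 0) printf "min=%d%% max=%d%% spread=%d%%\n", min, max, max - min
       }'
}

# Example on a DataNode (hypothetical mount points):
#   df -P /hadoop/disk1 /hadoop/disk2 /hadoop/disk3 | usage_spread
```

A small spread suggests the drives are filled to a similar percentage. Run it on each DataNode, since it only looks at that node's local disks.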