How can I rebalance my HDFS data?

Technote (FAQ)


Question

The Rebalance command in BigInsights balances data only across the DataNodes. How can I rebalance the data across all of the disk drives within the DataNodes of my cluster?

Answer

To rebalance the data, you can use the replication factor in Hadoop. Increasing the replication factor creates more copies of the blocks of either a single file or all of your data files. When you then reduce the replication factor back to what it was (the default is 3), HDFS removes the excess block replicas in a balanced manner across the disk drives of the DataNodes in the cluster.

If your blocks are not evenly distributed across your DataNodes, you can temporarily increase the replication factor and then bring it back down.
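Because raising the replication factor stores an extra copy of every affected block, the cluster needs enough free raw capacity for those copies before you start. The shell sketch below estimates that amount. It assumes every affected file currently has the same replication factor; the function name `extra_raw_needed` and the 30 TB figure are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate the extra raw HDFS capacity needed to raise
# the replication factor. Raw usage scales by new_rep / old_rep, so the
# extra space is raw_used * (new_rep - old_rep) / old_rep.
# Arguments: current raw usage (any whole unit, e.g. TB),
#            current replication factor, temporary replication factor.
extra_raw_needed() {
    raw_used=$1
    old_rep=$2
    new_rep=$3
    # Integer arithmetic, so the result is truncated toward zero.
    echo $(( raw_used * (new_rep - old_rep) / old_rep ))
}

extra_raw_needed 30 3 4   # prints 10
```

In this example, 30 TB of raw usage at replication 3 (10 TB of logical data) needs roughly 10 TB of additional free raw space while the replication factor is temporarily 4.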

To set replication of an individual file to 4:

./bin/hadoop dfs -setrep -w 4 /path/to/file

You can also do this recursively. To change the replication of the entire HDFS namespace to 4:

./bin/hadoop dfs -setrep -R -w 4 /

After the increase has completed, set the replication factor back to its original value (3 in this example).

To set replication of an individual file to 3:

./bin/hadoop dfs -setrep -w 3 /path/to/file

You can also do this recursively. To change the replication of the entire HDFS namespace to 3:

./bin/hadoop dfs -setrep -R -w 3 /
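The two steps above can be combined into one shell function, sketched below under a few assumptions: the function name `rebalance_by_setrep` and the `HADOOP_CMD` variable are hypothetical, and `HADOOP_CMD` defaults to `./bin/hadoop` as in the commands above. It always passes `-R`, which also works when the target is a single file.

```shell
#!/bin/sh
# Hypothetical wrapper around the two setrep steps shown above.
# HADOOP_CMD can be overridden if the hadoop script lives elsewhere.
HADOOP_CMD=${HADOOP_CMD:-./bin/hadoop}

rebalance_by_setrep() {
    target=$1      # HDFS path, e.g. /path/to/file or /
    orig_rep=$2    # replication factor to restore afterwards (HDFS default is 3)
    temp_rep=$3    # temporary, higher replication factor (e.g. 4)

    # Step 1: raise replication; -w waits until the extra replicas exist.
    # Stop here if this step fails, so replication is not lowered early.
    $HADOOP_CMD dfs -setrep -R -w "$temp_rep" "$target" || return 1

    # Step 2: restore the original replication; HDFS deletes the excess replicas.
    $HADOOP_CMD dfs -setrep -R -w "$orig_rep" "$target"
}
```

For example, `rebalance_by_setrep / 3 4` runs the whole-namespace commands shown above in order. Because `-w` waits for replication to finish, running this against `/` on a large cluster can take a very long time.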

Raising and then restoring the replication factor this way should even out disk usage to a roughly equal percentage across all disk drives, within reason.
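To check the result, you can compare the used percentage of each DataNode disk before and after the procedure. The sketch below assumes the data directories are passed as arguments; the function name `disk_usage_pct` is hypothetical, and the directories should match the `dfs.data.dir` property (`dfs.datanode.data.dir` in Hadoop 2.x) configured for each DataNode.

```shell
#!/bin/sh
# Hypothetical helper: print each data directory followed by the used
# percentage of the filesystem that holds it.
disk_usage_pct() {
    for dir in "$@"; do
        # df -P prints a header plus one line per filesystem;
        # field 5 is the "Capacity" (used percentage) column, e.g. "42%".
        pct=$(df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
        echo "$dir $pct"
    done
}
```

Run it on each DataNode with that node's data directories as arguments; after rebalancing, the percentages should be closer to one another than before.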


Document information


More support for:

InfoSphere BigInsights

Software version:

1.2.0, 1.3.0, 1.4.0, 2.0.0, 2.1.0

Operating system(s):

Linux Red Hat - pSeries, Linux Red Hat - xSeries, Linux SUSE - pSeries, Linux SUSE - xSeries

Software edition:

Basic Edition, Enterprise Edition

Reference #:

1578484

Modified date:

2014-12-11
