
Using IBM Big Replicate to implement an active / standby disaster recovery solution for IBM BigInsights Big SQL

Question & Answer


Question

How do I use IBM BigInsights Big SQL with IBM Big Replicate to provide an active/standby disaster recovery solution?

Answer


This document describes how Big SQL and IBM Big Replicate can be used together to provide an active/standby disaster recovery strategy for Big SQL users.

This strategy supports a multi-cluster configuration with one cluster acting as a typical active Big SQL cluster and one operating as a standby Big SQL cluster (with Big SQL service stopped until needed). In this way, IBM Big Replicate and Big SQL backup and restore work together to ensure all necessary data is replicated and available on each cluster. This setup provides a Big SQL disaster recovery solution that can be utilized in the event of critical failures of the Big SQL Active Cluster. 

Requirements and Prerequisites 

  • Big SQL  
    Big SQL must be installed on both clusters with a consistent configuration (a verification sketch follows this list):
    • Same bigsql_db_path 
    • Same bigsql_data_directories
    • Same number of Big SQL worker nodes
    • Consistent numbering for each Big SQL worker node in the db2nodes.cfg file
  • Big Replicate   
    Big Replicate should be installed and configured as outlined in the Big Replicate documentation.
  • Big SQL High Availability (HA) and Backup / Restore  
    This solution is compatible with Big SQL HA. Big SQL HA can be enabled, and remain enabled, during BIGSQL backup operations on the Big SQL Active Cluster; however, Big SQL HA currently must be disabled during a restore on the Big SQL Standby Cluster. This does not impact the Big SQL with Big Replicate active/standby solution: the Big SQL service on the Big SQL Standby Cluster is stopped during normal operation and is only started during an actual failover or when performing a restore. After an actual failover to the Big SQL Standby Cluster, HA can then be enabled if necessary.
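
    One way to spot-check this consistency is to compare the db2nodes.cfg file on the two primary Head Nodes. The sketch below is illustrative only: it assumes the bigsql instance home is /home/bigsql and that standby-headnode.domain.com is the Big SQL Standby Cluster's primary Head Node; the bigsql_db_path and bigsql_data_directories settings should be compared in each cluster's Big SQL service configuration.

      # Run as the bigsql user on the Big SQL Active Cluster's primary Head Node.
      # Host name and paths are assumptions; adjust for your environment.
      ssh standby-headnode.domain.com cat /home/bigsql/sqllib/db2nodes.cfg | diff /home/bigsql/sqllib/db2nodes.cfg -
      # No output means the worker node count and numbering match on both clusters.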

Solution Architecture
This section outlines the core architecture of the Big SQL active/standby disaster recovery solution.
    Big Replicate Data Replication 
    IBM Big Replicate allows you to migrate Hadoop data across multiple environments, distributions, and hybrid deployments. For more information, refer to the IBM Big Replicate documentation.

    Big SQL Database Replication (Backup and Restore) 
    The process for Big SQL data replication uses a database backup and restore method. Backup and restore operations keep the BIGSQL database data safe and readily available on multiple clusters within the Big SQL active/standby disaster recovery solution.

    With IBM Big SQL and IBM Big Replicate installed and configured appropriately on both clusters, it is then necessary to schedule regular backups of the BIGSQL database on the Big SQL Active Cluster's primary Head Node and to transfer the generated backup images to the Big SQL Standby Cluster's primary Head Node. On the Big SQL Standby Cluster, these backup images are then used either to regularly restore the BIGSQL database, so that the Big SQL Standby Cluster is consistently up-to-date and ready for a failover, or to perform a single restore of the latest backup image only when necessary as part of a user-defined failover procedure.

    The bigsql_bar.py utility is provided to assist with this process, automating all necessary backup and restore operations.   
    Note: An initial offline backup must be taken on the Big SQL Active Cluster, and the offline image restored on the Big SQL Standby Cluster, before proceeding with regular online backups of the BIGSQL database.
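
    As a sketch of that initial one-time sequence (the host name is a placeholder, and the bigsql_bar.py modes and options used here are described later in this document):

      # On the Big SQL Active Cluster's primary Head Node, as the bigsql user:
      # take the initial offline backup and transfer the image to the standby Head Node
      python bigsql_bar.py -m backup --offline -r standby-headnode.domain.com

      # On the Big SQL Standby Cluster's primary Head Node, as the bigsql user:
      # restore the transferred offline image
      python bigsql_bar.py -m restore --offline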

    For more background information on backup and restore, refer to the existing documentation on the BACKUP DATABASE and RESTORE DATABASE commands.

Typical Big SQL Backup & Restore Usage Scenario (as part of Disaster Recovery Solution)
This section describes the typical usage scenario.
Big SQL Active Cluster  
  • Take the necessary initial offline backup  (using bigsql_bar.py) 
  • Perform subsequent online backups regularly (using bigsql_bar.py, possibly as a cron job; see the crontab sketch below)
  • Transfer the backup images to the Big SQL Standby Cluster (using bigsql_bar.py) 
  
Big SQL Standby Cluster 
 
  • Start Big SQL service
  • Restore initial offline backup  (using bigsql_bar.py) 
  • Perform subsequent restores of online backups (using bigsql_bar.py - possibly as a cron job)  
  • Execute HCAT_SYNC_OBJECTS (using bigsql_bar.py, or manually as necessary) 
  • Enable HA as part of failover scenario, if necessary
  • Stop Big SQL service OR start using cluster as Big SQL Active Cluster (failover)

Note: It is the user's responsibility to ensure that any jars containing custom UDFs or SerDes are synchronized between the Big SQL Active Cluster and the Big SQL Standby Cluster as the backup and restore solution does not manage these.
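
For example, the regular online backup and restore steps above could be scheduled with cron. The times, script location, and host name below are illustrative assumptions only:

  # crontab for the bigsql user on the Big SQL Active Cluster's primary Head Node:
  # nightly online backup at 01:00, transferred to the Standby Cluster's primary Head Node
  0 1 * * * python /home/bigsql/bigsql_bar.py -m backup -r standby-headnode.domain.com

  # crontab for the bigsql user on the Big SQL Standby Cluster's primary Head Node:
  # nightly restore of the latest transferred online image at 03:00, skipping the interactive check
  0 3 * * * python /home/bigsql/bigsql_bar.py -m restore -s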

Big SQL Backup & Restore Utility (bigsql_bar.py)  
The bigsql_bar.py utility is provided to assist with the process of taking regular database backups and restoring on a second cluster.  The script works by automating the backup, image transfer and restore operations. 

Requirements and Prerequisites  
  • User 
    The bigsql_bar.py utility must be executed as the configured bigsql service user.  
  • Passwordless SSH 
    To implement a Big SQL active/standby disaster recovery solution, the generated backup images must be regularly transferred from the Big SQL Active Cluster to the Big SQL Standby Cluster.

    To automatically transfer backup images from the Big SQL Active Cluster to the Big SQL Standby Cluster when using the bigsql_bar.py script, passwordless ssh must be configured between the Big SQL Active Cluster's primary Head Node and the Big SQL Standby Cluster's primary Head Node (a setup sketch follows this list).

    If passwordless ssh is not configured, it will be necessary to use your own preferred mechanism, outside of the bigsql_bar.py utility, to transfer backup images to the Big SQL Standby Cluster's primary Head Node.  
  • Where to Run the Big SQL Backup and Restore Utility
    The bigsql_bar.py utility must be executed on the Big SQL Head Node; it can be run from any directory. In cases where Big SQL HA is enabled, the utility must be executed on the Big SQL primary Head Node.
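
    A minimal passwordless ssh setup sketch, assuming the bigsql user on both Head Nodes and standby-headnode.domain.com as a placeholder host name for the Standby Cluster's primary Head Node:

      # Run as the bigsql user on the Big SQL Active Cluster's primary Head Node
      ssh-keygen -t rsa                                   # accept the defaults; leave the passphrase empty
      ssh-copy-id bigsql@standby-headnode.domain.com      # copy the public key to the standby Head Node
      ssh bigsql@standby-headnode.domain.com hostname     # verify: should return the host name with no password prompt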

Usage 
The bigsql_bar.py utility can be executed in one of three modes:   
  • Backup:    Take a backup of the BIGSQL DB  
  • Restore:    Restore BIGSQL DB from an existing backup image 
  • Transfer:   Transfer an existing backup image to a remote machine 
One of these modes must be specified for execution to continue.

IMPORTANT NOTE: Because the Big SQL Backup & Restore Utility was developed to be used as part of the IBM Big SQL with IBM Big Replicate Active/Standby Disaster Recovery Solution, the restore operation (online and offline) starts the Big SQL service before performing the restore and stops the Big SQL service after a successful restore has completed.

Options 
There are a number of options available when executing the bigsql_bar.py utility.  All options except '-m' (mode) are optional and default values will be used if not explicitly specified. 
    The script accepts the following options:     
    -m mode 
    Required - Valid values are: 'Backup', 'Restore' or 'Transfer' 

    -r remote machine 
    Optional - Machine to transfer the backup image to (e.g.: Big SQL Standby Cluster's primary Head Node)

    -d backup directory 
    Optional - Directory for backup images (Default: /var/ibm/bigsql/backups) 

    Note: Ensure there is sufficient space available in the backup directory to store the generated backup images.

    -t timestamp 
    Optional - Timestamp of Backup Image (Default: timestamp for latest available image)  
    This option is valid with restore or transfer modes only. 

    -c, --hcat 
    Optional - Run HCAT_SYNC_OBJECTS after a restore 

    -o, --offline 
    Optional - Force offline option 
    This option is valid with restore, backup or transfer modes. 

    -l, --local 
    Optional - Use local backup images for restore (i.e., images in '<backup directory>/local_online_backups')

    -s, --skipcheck 
    Optional - Skips the interactive check during a restore.  
    Enables restore operations to be run as a cron job by not prompting for user verification when deactivating the BIGSQL database.

    -h, --help 
    Optional - Display the help menu

    Default Values
    Option                 Default
    -m mode                No default (execution will be aborted)
    -r remote machine      No default (image will not be transferred)
    -d backup directory    '/var/ibm/bigsql/backups'
    -t timestamp           The most recent backup available (online / offline)
    -c, --hcat             HCAT_SYNC_OBJECTS will not be executed
    -o, --offline          Online assumed
    -l, --local            Remote backup images used
    -s, --skipcheck        User verification sought when deactivating the BIGSQL database
    -h, --help             N/A

Logging & Troubleshooting 
A complete log of the script execution is located at /var/ibm/bigsql/logs/bigsql_bar.log on the Big SQL primary Head Node (where the script is executed).  All debug, information and error messages are written to this file.   

Backup & Restore Directory Structure  
Backup images will be stored in and fetched from either the default backup directory (/var/ibm/bigsql/backups) or a backup directory specified using the '-d' option. This backup directory is the parent directory for all backup images. When executed, bigsql_bar.py will store offline and online backup images in sub-directories under this parent backup directory, as shown below.

This directory structure is mirrored on both the Big SQL Active Cluster and the Big SQL Standby Cluster.

   <parent backup-directory>  ('/var/ibm/bigsql/backups') 
           <offline_backups>  
           <local_online_backups>  
           <remote_online_backups> 

Note: It is the user's responsibility to ensure that there is sufficient disk space available in the backup directory and to manage any necessary housekeeping of these directories.
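
For example, standard operating system tools can be used to monitor the backup directory (the default path is shown; use your '-d' directory if different):

  # Free space on the file system holding the parent backup directory
  df -h /var/ibm/bigsql/backups
  # Size of the offline and online image sub-directories, for housekeeping decisions
  du -sh /var/ibm/bigsql/backups/*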

Backup Directory for BIGSQL DB Logs  
Any existing database logs are backed-up before a restore is performed.  The directory used for this is: <backup directory>/logstream_backup/  (by default: /var/ibm/bigsql/backups/logstream_backup/)

bigsql_bar.py Utility Location 
Currently the bigsql_bar.py utility is attached to this tech note and can be downloaded directly.

bigsql_bar.py Utility and Timestamps  
When a BIGSQL backup is performed, the output will include something like the following:

Part  Result  
----  -------------------------------------------------------------- 
0000  DB20000I  The BACKUP DATABASE command completed successfully.  
  
Backup successful. The timestamp for this backup image is : 20161011093130 


This output includes the timestamp associated with the backup image. This is also part of the backup image file name. For Example:  BIGSQL.0.bigsql.DBPART000.20161011093130.001  
  • When using bigsql_bar.py restore mode, you can specify the timestamp of a particular backup image to restore.  
  • When using bigsql_bar.py transfer mode, you can specify the timestamp of a particular backup image to transfer.  

If a timestamp is not specified during a restore or transfer operation, the most recent backup image available in the backup directory (e.g.: '/var/ibm/bigsql/backups/...' by default) is used.
By default this will be an online backup image; however, an offline image can be used by specifying the '-o, --offline' option.


HCAT_SYNC_OBJECTS
The HCAT_SYNC_OBJECTS stored procedure imports the definitions of Hive objects into the local database catalogs, making the objects available for use within queries. For more details on this, see HCAT_SYNC_OBJECTS.

The Backup and Restore utility, bigsql_bar.py, provides the option to automatically run HCAT_SYNC_OBJECTS after a successful restore using the '-c, --hcat' option.
While this is off by default, it is important to note that if there have been any DDL changes since the most recent available backup, HCAT_SYNC_OBJECTS should be executed to ensure a fully up-to-date cluster (necessary as part of a failover). This can be performed as part of bigsql_bar.py execution using the '-c, --hcat' option or by running HCAT_SYNC_OBJECTS manually after a restore has completed, as shown in the sketch below.
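
A minimal sketch of running HCAT_SYNC_OBJECTS manually after a restore, assuming it is run as the bigsql user on the Big SQL primary Head Node and that MYSCHEMA is an illustrative schema name; refer to the HCAT_SYNC_OBJECTS documentation referenced above for the full argument list:

  # Connect to the BIGSQL database and import the Hive definitions for all objects
  # in the (illustrative) schema MYSCHEMA into the local catalogs
  db2 connect to bigsql
  db2 "CALL SYSHADOOP.HCAT_SYNC_OBJECTS('MYSCHEMA', '.*', 'a', 'REPLACE', 'CONTINUE')"
  db2 connect reset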

Special consideration must be given to HCAT_SYNC_OBJECTS if impersonation is enabled for Big SQL. The HCAT_SYNC_OBJECTS execution as part of the bigsql_bar.py script does not currently support Big SQL impersonation. For more details on the use of HCAT_SYNC_OBJECTS with impersonation, see Impersonation in Big SQL.

Log File Management
In order to use an active/standby disaster recovery solution for Big SQL, the database must be made recoverable and database logging must be updated to use archive logging. That is, the logarchmeth1 database configuration parameter is changed to LOGRETAIN during the initial offline backup. As database activity occurs, log files may accumulate in the directory pointed to by the log file path database configuration parameter (usually set to: /var/ibm/bigsql/database/bigdb/bigsql/NODE0000/SQL00001/LOGSTREAM0000/), and it may be necessary to occasionally remove the archived log files that are no longer needed.
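
For example, the current logging configuration for the BIGSQL database can be inspected as the bigsql user:

  # Show the log archive method and the active log path
  db2 get db cfg for bigsql | grep -iE "logarchmeth1|path to log files"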

In order to remove the archived log files not needed for Big SQL disaster recovery, you can do the following:

1. Discover the value of the "First active log file" database configuration parameter. Log files with a sequence number lower than this file's can be removed. For example:


  db2 get db cfg for bigsql | grep -i "first active log file"
  First active log file = S0000245.LOG

This returns the name of the first active log file, S0000245.LOG in the example above.

2. Issue the 'db2 prune logfile...' command as follows:

  db2 prune logfile prior to <first active log file name> 
 

For the example above:


   db2 prune logfile prior to S0000245.LOG
   DB20000I  The PRUNE command completed successfully.

Note: The prune logfile command will NOT delete any active log files, even if the file name passed to it is more recent than the first active log file.
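
As a sketch, the two steps above can be combined into a small script (run as the bigsql user on the Big SQL primary Head Node) so that pruning can be scripted or scheduled:

  # Look up the first active log file and prune archived log files older than it
  FIRST_ACTIVE=$(db2 get db cfg for bigsql | grep -i "first active log file" | awk '{print $NF}')
  db2 connect to bigsql
  db2 "prune logfile prior to $FIRST_ACTIVE"
  db2 connect reset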


Example invocation of the bigsql_bar.py Utility 
Backup  
  • To perform an offline backup using all defaults, invoke bigsql_bar.py in the following way: 
    • python bigsql_bar.py -m backup --offline  
      OR
      python bigsql_bar.py -m backup -o 
     
  • To perform an online backup using all defaults and without transferring the backup image to another cluster, invoke bigsql_bar.py in the following way: 
    • python bigsql_bar.py -m backup  
     
  • To perform an online backup, specifying a user-defined backup directory to use while using all other defaults, invoke bigsql_bar.py in the following way:

    • python bigsql_bar.py -m backup -d /home/bigsql/my_backups  
     
  • To perform an offline backup using all defaults and specifying a remote machine to transfer the backup image to (e.g.: primary Head Node of Big SQL Standby Cluster), invoke bigsql_bar.py in the following way:

    • python bigsql_bar.py  -m backup --offline -r <machine.domain.com>  
      OR
      python bigsql_bar.py  -m backup -o -r <machine.domain.com>  
     
Transfer
  • To transfer the latest available offline image to the remote machine specified,  invoke bigsql_bar.py in the following way: 
    • python bigsql_bar.py -m transfer -r <machine.domain.com>  --offline  OR  
      python bigsql_bar.py -m transfer -r <machine.domain.com>  -o    
     
  • To transfer the latest available online image to the remote machine specified,  invoke bigsql_bar.py in the following way: 
    • python bigsql_bar.py -m transfer -r <machine.domain.com>  
     
  • To transfer the online image associated with the timestamp specified to the remote machine specified,  invoke the utility in the following way: 
    • python bigsql_bar.py -m transfer -r machine.domain.com -t 20161011093130  
     
  • To transfer the latest available offline image in the specified backup directory (/home/bigsql/mybackups22 in this case) to the remote machine specified, invoke the bigsql_bar.py utility in the following way:

    • python bigsql_bar.py -m transfer -r machine.domain.com  --offline  -d /home/bigsql/mybackups22  
     
Restore
  • To perform a restore  of the latest available offline backup image using all defaults, invoke the bigsql_bar.py utility in the following way: 
    • python bigsql_bar.py -m restore --offline 
      OR
      python bigsql_bar.py -m restore -o 
     
  • To perform a restore  of the latest available online backup image using all defaults, invoke the bigsql_bar.py utility in the following way: 

    • python bigsql_bar.py -m restore 
     
  • To perform a restore  of the latest available online backup image using all defaults and skipping the user confirmation when deactivating the BIGSQL database, invoke the utility in the following way: 
    • python bigsql_bar.py -m restore --skipcheck  
      OR
      python bigsql_bar.py -m restore -s 
     
  • To perform a restore of the latest online backup image available in the local backup subdirectory ('/var/ibm/bigsql/backups/local_online_backups/' by default ) using all defaults, invoke the bigsql_bar.py utility in the following way: 
    • python bigsql_bar.py -m restore --local  
      OR
      python bigsql_bar.py -m restore -l 
     
  • To perform a restore of the latest available online backup image using all defaults and executing HCAT_SYNC_OBJECTS after, invoke bigsql_bar.py in the following way: 
    • python bigsql_bar.py -m restore --hcat  
      OR
      python bigsql_bar.py -m restore -c  
     
  • To perform a restore of the online backup image associated with a specified timestamp and from a specified parent backup-directory (/home/bigsql/my_backups in this case) and execute HCAT_SYNC_OBJECTS after the restore completes, invoke the bigsql_bar.py utility in the following way: 
    • python bigsql_bar.py -m restore -t 20161011093130 -d /home/bigsql/my_backups --hcat

    [{"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":"Big SQL","Platform":[{"code":"PF016","label":"Linux"}],"Version":"4.2.0","Edition":"Enterprise Edition","Line of Business":{"code":"LOB10","label":"Data and AI"}},{"Product":{"code":"SSVPQ7","label":"IBM Big Replicate"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":" ","Platform":[{"code":"","label":""}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}},{"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":" ","Platform":[{"code":"","label":""}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
18 July 2020

UID
swg21994020