IBM Support

Safe method to restart the TM1 servers (to ensure that it does not break the Controller FAP publish)

Troubleshooting


Problem

The customer wants to reboot their FAP-related servers (Controller application server and TM1 server). However, if the servers/services are restarted in the wrong order, the FAP publish stops or fails.

=> Question - How can the customer reboot FAP-related servers, without breaking the FAP publish?

In other words, how can customers reboot servers without needing to perform a fresh/new Initial Publish (which can take hours for large data selections)?

Symptom

Example:

Imagine a scenario where:

  1. Customer stops both the TM1 instance (sometimes known as a 'TM1 server') and the Windows service "IBM Cognos FAP Service" (also known as the 'FAP Service')
  2. Customer then starts the TM1 instance. However, the TM1 instance has a task to perform first (for example, reloading its saved cube data from disk), so it will be some time (for example, 10 minutes) before it is ready
  3. Without waiting for the TM1 instance to be ready (in other words, while the TM1 service is still busy reloading/processing a cube), the customer starts the Windows service 'IBM Cognos FAP Service'

 

The above actions will cause the FAP service not to start properly.

  • For example, there may be the following error (when the user launches the FAP Client):  "It seems that no FAPService is running or FAPDb is started/restarted after FAPService".
  • Alternatively, customers may find that the trickle updates no longer occur as expected (the FAP cube fails to be updated when the source Controller data changes).

Cause

When the Controller "FAP" service starts up, it will attempt to contact the TM1 server.

  • If this fails three times, then the publish status will be set to error.


When the TM1 service starts up, it will take a few minutes until the TM1 data is available.

  • During this time it will not be usable by the Controller FAP service.
  • This also means that the problem cannot be solved by registering a Windows service dependency (FAP service dependent on the TM1 service), because the FAP service starts much more quickly than the TM1 data comes online.


This means that care has to be taken when rebooting FAP / TM1 services, since they are interdependent.
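
The timing problem described above can be illustrated with a small simulation. This is only a sketch: the three-attempt limit comes from the text, but the spacing between attempts (retry_interval_s) and the function name are assumptions made for illustration, not documented FAP service behaviour.

```python
def publish_status_after_startup(tm1_ready_at_s, fap_started_at_s,
                                 retry_interval_s=60, max_attempts=3):
    """Model the FAP service contacting TM1 after the FAP service starts.

    Returns "running" if any attempt happens once TM1 is ready,
    otherwise "error" after max_attempts failed attempts.
    retry_interval_s is a hypothetical value, not a documented default.
    """
    for attempt in range(max_attempts):
        attempt_time_s = fap_started_at_s + attempt * retry_interval_s
        if attempt_time_s >= tm1_ready_at_s:
            return "running"
    return "error"


# TM1 needs 10 minutes (600 s) to reload its cubes.
print(publish_status_after_startup(600, fap_started_at_s=30))   # error
print(publish_status_after_startup(600, fap_started_at_s=660))  # running
```

In this model, starting the FAP service 30 seconds after TM1 uses up all three attempts long before the data is online, whereas starting it after the 10-minute reload succeeds on the first attempt.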

Environment

Although it is possible to have a huge variety of configurations, it is fairly typical for customers to have two separate servers:

(1) TM1 (or PA/Planning Analytics) server

(2) Controller application server.

  • By default, the FAP service is installed/running on this server
  • However, some customers may choose to deliberately install it on the TM1 server instead.

Resolving The Problem

When restarting TM1/Controller-related servers/services:

  • Make sure that the relevant steps are done in the correct order.
    • Most importantly, do not start the FAP service until the TM1 server is ready (finished restarting, and all TM1 restore/recovery processes completed).
  • Also, if you are using a recent version of Controller, use the parameter:    connection_polling_timeout

==========================================================

Prerequisites

(1) Configure the ‘IBM Cognos FAP Service’ service to have its startup type as:   Manual

(2) To make the Controller FAP service more resilient (less likely to error, if the TM1 server is temporarily unavailable) perform both of the following:

(a) Make sure you are using a modern version of Controller, which includes the new 'connection_polling_timeout' feature:

  • Controller 10.3.0 FP1 IF6 (10.3.1.64) or later 10.3.0.x version.
  • Controller 10.3.1 IF1 (10.3.1100.159) or later.

(b) Add a parameter connection_polling_timeout into the file 'FAPService.properties'.

  • For more details, see separate IBM Technote #2012681.
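
Prerequisite (1) can also be scripted rather than set through the Services console. The following is a minimal sketch, assuming Python is available on the server and that the command is run from an elevated prompt. It uses the standard Windows command `sc config <name> start= demand`, where "demand" corresponds to the Manual startup type. The service name passed in is a placeholder: sc.exe expects the service key name, which may differ from the display name 'IBM Cognos FAP Service'.

```python
import subprocess


def manual_startup_command(service_name):
    """Build the sc.exe command that sets a service's startup type to Manual.

    service_name must be the service key name (see `sc getkeyname`),
    which is not necessarily the display name shown in the Services console.
    """
    # "start=" and "demand" are separate arguments, so the joined command
    # line reads: sc.exe config <name> start= demand
    return ["sc.exe", "config", service_name, "start=", "demand"]


def set_manual_startup(service_name):
    """Run the command (Windows only, needs administrator rights)."""
    return subprocess.run(manual_startup_command(service_name),
                          check=True, capture_output=True, text=True)


# Placeholder name for illustration only; look up the real key name first.
print(" ".join(manual_startup_command("<FAP service key name>")))
```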

==========================================================


Steps

~~~~~~~~~~~~~~~

[Stopping the server]

~~~~~~~~~~~~~~~

1. Log in to the Controller application server (or wherever you run the FAP client from)
2. Launch "IBM Cognos FAP" (the FAP client)

3. Log in (to 'FAP Connect')


4. Click on "Data Marts" tab
5. Verify that the 'status' of the relevant Data Mart is set to 'running'
6. Close the FAP client.

7. Stop the Windows service ‘IBM Cognos FAP Service’


8. Launch TM1 'Architect'

9. Log in to the TM1 instance (or instances)


10. Right-click the TM1 server name, then click ‘Save Data’


11. Wait until the Save Data operation is complete

12. Stop the relevant TM1 Windows service.

  • The name of this service will (typically) start with ‘IBM Cognos TM1 Server’, followed by the instance name, for example:   IBM Cognos TM1 Server - FAP

 

NOTE:

  • By stopping the TM1 service, it should automatically save data to disk (in other words, it should automatically perform the task that we previously manually performed in step 10). However, best practice is to manually save data (by performing step 10) as a 'belt and braces' approach.
  • After stopping the TM1 service, wait for an appropriate amount of time (this will vary - for example 1 to 5 minutes) to allow it time to save all the data to disk.
    • If you do not wait for this 'saving' process to complete (and instead simply reboot the TM1 host server machine, for example) then it can cause problems
    • In other words, if the TM1 Windows service is not stopped before restarting the server, then the 'save data' process may be terminated prematurely. This will cause a transaction log recovery to take place (when the TM1 service next starts, after the server has completed its reboot), which will add to the time needed for the TM1 instance to be ready.


13. At this point, the server can now be rebooted (if desired).

  • Wanting/needing to reboot the server is the most common reason why customers are performing the entire process described in this Technote!
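
The wait described in the NOTE for step 12 can be partly automated by polling the service state before rebooting. The sketch below is an illustration under stated assumptions: it parses the STATE line that the standard Windows command `sc query <name>` prints, and it treats the state STOPPED as the signal to proceed. Whether STOPPED guarantees that all data has been written is not confirmed by this Technote, so the waiting guidance above still applies. The function names are hypothetical.

```python
import re


def parse_sc_state(sc_query_output):
    """Return the state name (for example 'RUNNING') from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_query_output)
    return match.group(1) if match else None


def wait_until_stopped(query_state, sleep, poll_s=5, timeout_s=600):
    """Poll query_state() until it returns 'STOPPED'.

    Returns True once stopped, False if timeout_s elapses first.
    query_state and sleep are passed in so the logic can be tried
    without touching real services.
    """
    waited = 0
    while waited <= timeout_s:
        if query_state() == "STOPPED":
            return True
        sleep(poll_s)
        waited += poll_s
    return False


SAMPLE = """
SERVICE_NAME: ExampleService
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 3  STOP_PENDING
"""
print(parse_sc_state(SAMPLE))  # STOP_PENDING
```

On a real server, query_state could wrap a call to `sc.exe query` with the TM1 service key name, and sleep could be time.sleep.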

 

~~~~~~~~~~~~~~~
[Starting the services]

~~~~~~~~~~~~~~~
1. Log on to the application server.
2. Ideally, check to see that the relevant TM1 Windows service ("IBM Cognos TM1 Server - FAP") has started successfully

3. Wait an appropriate amount of time (perhaps 1-20 minutes) for the TM1 instance to rebuild itself in memory.

-----------------------------------------------------------

Explanation: When the TM1 service starts, the Windows service itself starts very quickly. However, the TM1 data must then be loaded into memory, and this process will take a few minutes (for a large FAP cube).

  • You must wait for this TM1 instance/cube-building process to finish before starting the Controller FAP service
  • If the Controller FAP service is started before the TM1 cube has finished rebuilding, then the FAP service may fail (because it will not be allowed to log on to the TM1 instance).

-----------------------------------------------------------

4. Check to see if the TM1 instance has finished (rebuilding the cube) by looking inside Task Manager.

  • Inside "Processes" locate the following: tm1sd.exe
  • Check the CPU % usage for this process.
    • TIP: This process is typically single-threaded.

-----------------------------------------------------------

Example: If the TM1 server is quad-core (4 CPU cores) then tm1sd.exe will show a usage of approximately 25% total CPU during the rebuild.

- It will fall to approximately 0% CPU usage when the rebuild has finished.

-----------------------------------------------------------
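
The arithmetic behind the example above is simply 100% divided by the number of cores, since a single fully busy thread occupies one core. The sketch below works that out and adds one possible way to decide that the rebuild has finished from a series of CPU readings. The threshold and the number of consecutive idle readings are assumptions chosen for illustration, and the readings themselves would have to be collected separately (for example, by noting the values shown in Task Manager).

```python
def single_thread_cpu_percent(cpu_cores):
    """Total-CPU percentage shown when one thread fully occupies one core."""
    return 100.0 / cpu_cores


def rebuild_looks_finished(cpu_samples_percent,
                           idle_threshold_percent=1.0, consecutive=3):
    """True if the last `consecutive` samples are all at or below the threshold.

    Requiring several idle samples in a row avoids reacting to one brief dip.
    """
    recent = cpu_samples_percent[-consecutive:]
    return (len(recent) == consecutive
            and all(s <= idle_threshold_percent for s in recent))


print(single_thread_cpu_percent(4))                          # 25.0
print(rebuild_looks_finished([25.1, 24.8, 0.3, 0.0, 0.1]))   # True
print(rebuild_looks_finished([25.1, 24.8, 25.0, 0.2, 0.0]))  # False
```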

 

5. Start the Windows service ‘IBM Cognos FAP Service’.
6. Launch the FAP client, and log in (in the 'FAP Connect' screen)

 

7. Click on the 'Data Marts' tab, and verify that the Data Mart is still in ‘Running’ status
8. Click on the 'Logs' tab, and verify that there are no errors.
 

-----------------------------------------------------------

Explanation: If the Data Mart is automatically set to 'running', and there are no errors in the logs, then this means that the FAP publish has survived the server reboot process, and there is no need to run a fresh/new Initial Publish!

-----------------------------------------------------------
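
The key rule of the start sequence (do not start the FAP service until the TM1 instance is ready) can be sketched as a small piece of orchestration logic. This is an illustration, not an IBM-supplied script: the readiness check and the service start are passed in as functions, because how readiness is detected (CPU usage of tm1sd.exe, a manual check, or something else) is left to the administrator. All names and default timings here are hypothetical.

```python
def start_services_in_safe_order(tm1_is_ready, start_fap_service, sleep,
                                 poll_s=30, timeout_s=1800):
    """Start the FAP service only once tm1_is_ready() returns True.

    Returns True if the FAP service was started, or False if TM1 was not
    ready within timeout_s (the FAP service is then left stopped).
    """
    waited = 0
    while not tm1_is_ready():
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
    start_fap_service()
    return True


# Dry run with stand-ins: TM1 reports ready on the third check.
ready_answers = iter([False, False, True])
started = []
ok = start_services_in_safe_order(lambda: next(ready_answers),
                                  lambda: started.append("FAP"),
                                  sleep=lambda s: None)
print(ok, started)  # True ['FAP']
```

Leaving the FAP service stopped on timeout is a deliberate choice in this sketch: according to the text above, a premature start risks setting the publish status to error, which is the outcome the whole procedure is meant to avoid.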


Document Information

Modified date:
21 November 2018

UID

swg21585881