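How to diagnose FortiGate Cluster HA sync and checksum issues

This guide walks through diagnosing HA sync and checksum issues on a FortiGate cluster, from confirming that the cluster is out of sync to finding the section of the configuration that differs.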

1. Check that the cluster is in sync

In the output below, FGT2 is reported as out-of-sync.

FW01-MASTER # get system ha status
Master selected using:
HA Health Status: OK
Model: FortiGate-501E
Mode: HA A-P

#### Lines omitted for brevity #### 

Configuration Status:
    FGT1XXXXXXXX (updated 2 seconds ago): in-sync
    FGT2XXXXXXXX (updated 2 seconds ago): out-of-sync

You can also check this in the GUI, where a red X next to the Slave signifies that it is out of sync with the Master.
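
If you collect the CLI output from several clusters, the Configuration Status check can be scripted. The following is a minimal Python sketch, assuming the output of get system ha status has already been captured as text and that the status lines look like the ones shown above. The function name out_of_sync_members is made up for this example and is not part of any Fortinet tool.

```python
import re

# Matches lines such as "FGT1XXXXXXXX (updated 2 seconds ago): in-sync"
STATUS_LINE = re.compile(r"^\s*(\S+)\s+\(updated .*?\):\s*(in-sync|out-of-sync)\s*$")


def out_of_sync_members(ha_status_text):
    """Return the serial numbers listed as out-of-sync under 'Configuration Status:'."""
    members = []
    in_section = False
    for line in ha_status_text.splitlines():
        if line.strip() == "Configuration Status:":
            in_section = True
            continue
        if in_section:
            match = STATUS_LINE.match(line)
            if match is None:
                # The first line that is not a status line ends the section.
                in_section = False
                continue
            serial, state = match.groups()
            if state == "out-of-sync":
                members.append(serial)
    return members


# Sample text copied from the output shown in this step.
sample = """Configuration Status:
    FGT1XXXXXXXX (updated 2 seconds ago): in-sync
    FGT2XXXXXXXX (updated 2 seconds ago): out-of-sync
"""

print(out_of_sync_members(sample))  # ['FGT2XXXXXXXX']
```

An empty list means every member listed in that section is in-sync.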

2. Force the Slave to re-sync with the Master

execute ha synchronize start

To jump across to the Slave device, run "execute ha manage" followed by the <id> number.

execute ha synchronize
start/stop      start/stop HA sync between master and slave

3. Diagnose sys ha checksum recalculate

This will attempt to recalculate the checksum between the Slave and the Master. Run it from the Slave device.

diagnose sys ha checksum
recalculate    Re-calculate HA checksum.

4. Diagnose sys ha checksum cluster

Diagnose sys ha checksum cluster will calculate the checksum for both devices in the cluster.

Alternatively, you can run a checksum on each device individually by running diagnose sys ha checksum.

The following are two examples of checksum output: one that does not match and one that does.

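4.1 Checksum does not match

Here the global checksum is the same on both devices, but the root and all checksums differ, so the mismatch is in the root VDOM rather than in the global configuration.

================== FG100D3G13xxxxxx ==================
is_manage_master()=0, is_root_master()=0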

debugzone
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: cf 85 55 fe a7 e5 7c 6f a6 88 e5 a9 ea 26 e6 92
all: f4 62 b2 ce 81 9a c9 04 8f 67 07 ec a7 44 60 1f

checksum
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: cf 85 55 fe a7 e5 7c 6f a6 88 e5 a9 ea 26 e6 92
all: f4 62 b2 ce 81 9a c9 04 8f 67 07 ec a7 44 60 1f

================== FG100D3G12xxxxxx ==================
is_manage_master()=1, is_root_master()=1

debugzone
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: d8 f5 57 46 f0 b8 45 1e 00 be 45 92 a2 07 14 90
all: a7 8d cc c7 32 b5 81 a2 55 49 52 21 57 f9 3c 3b

checksum
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: d8 f5 57 46 f0 b8 45 1e 00 be 45 92 a2 07 14 90
all: a7 8d cc c7 32 b5 81 a2 55 49 52 21 57 f9 3c 3b

4.2 Checksum that matches

================== FG100D3G13xxxxxx ==================
 
is_manage_master()=0, is_root_master()=0
debugzone
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: cf 85 55 fe a7 e5 7c 6f a6 88 e5 a9 ea 26 e6 92
all: f4 62 b2 ce 81 9a c9 04 8f 67 07 ec a7 44 60 1f
 
checksum
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: cf 85 55 fe a7 e5 7c 6f a6 88 e5 a9 ea 26 e6 92
all: f4 62 b2 ce 81 9a c9 04 8f 67 07 ec a7 44 60 1f
 
================== FG100D3G12xxxxxx ==================
 
is_manage_master()=1, is_root_master()=1
debugzone
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: cf 85 55 fe a7 e5 7c 6f a6 88 e5 a9 ea 26 e6 92
all: f4 62 b2 ce 81 9a c9 04 8f 67 07 ec a7 44 60 1f
 
checksum
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: cf 85 55 fe a7 e5 7c 6f a6 88 e5 a9 ea 26 e6 92
all: f4 62 b2 ce 81 9a c9 04 8f 67 07 ec a7 44 60 1f
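
Comparing the hex strings by eye is error-prone, so the comparison can be scripted. The following is a minimal Python sketch, assuming the output of diagnose sys ha checksum cluster has been saved as text in the layout shown above. The function names parse_cluster_checksums and mismatched_scopes are made up for this example, and only the "checksum" block of each device is compared.

```python
import re

# Matches separator lines such as "====== FG100D3G13xxxxxx ======"
HEADER = re.compile(r"^=+\s*(\S+)\s*=+$")


def parse_cluster_checksums(text):
    """Map each serial number to {scope: checksum} taken from its 'checksum' block."""
    result = {}
    serial = None
    block = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            serial = header.group(1)
            result[serial] = {}
            block = None
        elif line in ("debugzone", "checksum"):
            block = line
        elif serial and block == "checksum" and ":" in line:
            scope, value = line.split(":", 1)
            result[serial][scope.strip()] = value.strip()
    return result


def mismatched_scopes(checksums):
    """Return the scopes whose checksum is not identical on every device."""
    scopes = set()
    for per_member in checksums.values():
        scopes.update(per_member)
    return sorted(
        scope for scope in scopes
        if len({member.get(scope) for member in checksums.values()}) > 1
    )


# Values copied from the "does not match" example (checksum blocks only).
sample = """
================== FG100D3G13xxxxxx ==================
is_manage_master()=0, is_root_master()=0

checksum
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: cf 85 55 fe a7 e5 7c 6f a6 88 e5 a9 ea 26 e6 92
all: f4 62 b2 ce 81 9a c9 04 8f 67 07 ec a7 44 60 1f

================== FG100D3G12xxxxxx ==================
is_manage_master()=1, is_root_master()=1

checksum
global: 89 f2 f0 0b e8 eb 0d ee f8 55 8b 47 27 7a 27 1e
root: d8 f5 57 46 f0 b8 45 1e 00 be 45 92 a2 07 14 90
all: a7 8d cc c7 32 b5 81 a2 55 49 52 21 57 f9 3c 3b
"""

print(mismatched_scopes(parse_cluster_checksums(sample)))  # ['all', 'root']
```

An empty list means the checksum blocks match on every device, as in example 4.2.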

5. Reboot the Slave

At this stage I would normally reboot the Slave. Nine times out of ten this brings the devices back into sync.

6. Diagnose sys ha checksum show global

6.1 Run this on both the Master and the Slave.

diagnose sys ha checksum show global
system.global: fdac3a7077f4205919e7cd3ee36d203e
system.accprofile: b6712ba9d705d1ccaf9ac8e2d014d35c
system.npu: 00000000000000000000000000000000
system.np6: 80585c5bff0110a4a5c60ada8729321f
system.vdom-link: 00000000000000000000000000000000
wireless-controller.inter-controller: 00000000000000000000000000000000
wireless-controller.global: 00000000000000000000000000000000

6.2 Save the output and compare differences

Save the output from each device to its own file, then compare the two files using software like ExamDiff.

This will tell you which section of the configuration is out of sync. In this case it is system.admin.

Master 

system.admin: 7f6ac897f4f1fa20d4c49b61f0a3d663

Slave 

system.admin: 4b0914cc08515f48c08afc0fcaa110c6
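
If you do not have a diff tool to hand, the same comparison can be done with a short script. The following is a minimal Python sketch, assuming the two outputs of diagnose sys ha checksum show global have been captured as text with one "name: checksum" pair per line. The function names parse_show_output and differing_objects are made up for this example.

```python
def parse_show_output(text):
    """Turn 'name: checksum' lines into a dict, ignoring lines without a colon."""
    table = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, value = line.rsplit(":", 1)
        table[name.strip()] = value.strip()
    return table


def differing_objects(master_text, slave_text):
    """Return {name: (master_checksum, slave_checksum)} for every entry that differs."""
    master = parse_show_output(master_text)
    slave = parse_show_output(slave_text)
    return {
        name: (master.get(name), slave.get(name))
        for name in sorted(set(master) | set(slave))
        if master.get(name) != slave.get(name)
    }


# Shortened samples built from the values shown in this step.
master_output = """system.global: fdac3a7077f4205919e7cd3ee36d203e
system.admin: 7f6ac897f4f1fa20d4c49b61f0a3d663
"""
slave_output = """system.global: fdac3a7077f4205919e7cd3ee36d203e
system.admin: 4b0914cc08515f48c08afc0fcaa110c6
"""

for name, (on_master, on_slave) in differing_objects(master_output, slave_output).items():
    print(name, on_master, on_slave)
```

An entry that exists on only one device is reported with None for the other side.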

7. Diagnose sys ha checksum show global <object-fullpath>

To check what the differences are within a specific section of the configuration, run diagnose sys ha checksum show global followed by the specific section or object full path. The following are a few examples:

diagnose sys ha checksum show global system.admin
diagnose sys ha checksum show global system.interface
diagnose sys ha checksum show global system.snmp.community

The following is an example for system.admin:

diagnose sys ha checksum show global system.admin
admin: 2d93dfa5654d439c3e682ef7b4094111

You will need to run this on both the Master and the Slave and then compare the differences.

This will allow you to investigate the section of configuration causing the sync and checksum issue.


Thank you for reading and please feel free to leave any feedback.
