I came across an issue for a client where a few of their users would log in to Jabber (ver 10.5) and see their presence status as ‘offline’. They could not change it either. Other Jabber users also saw them as ‘offline’ even though they were logged in and should have been online. That was one issue. A few other users showed a ‘false’ presence status: if they were ‘away’ they were shown as online, and vice versa. The client was on IM&P 10.5(2), which is a fairly new release, so I was sure it could not be due to a bug (in the past I have seen these kinds of issues in older versions of CUPS due to bugs).
I checked the configuration of the users in CUCM and IM&P, which was all good. Both nodes were in balanced mode and that looked OK as well. I checked the services and they were all up and running. I then ran utils dbreplication runtimestate and utils dbreplication status and found the following:
admin:utils dbreplication runtimestate
Server Time: Thu Aug 6 01:37:15 MST 2015
Cluster Replication State: Replication status command started at: 2015-08-06-01-27
Replication status command COMPLETED 305 tables checked out of 305
Last Completed Table: batfileinfojobmap
Errors or Mismatches Were Found!!!
Use ‘file view activelog cm/trace/dbl/sdi/ReplicationStatus.2015_08_06_01_27_31.out’ to see the details
DB Version: ccm10_5_2_21900_4
Repltimeout set to: 300s
PROCESS option set to: 1
Cluster Detailed View from lxus3tp04 (2 Servers):
                             PING    DB/RPC/  REPL.  Replication  REPLICATION SETUP
SERVER-NAME  IP ADDRESS      (msec)  DbMon?   QUEUE  Group ID     (RTMT) & Details
-----------  ----------      ------  -------  -----  -----------  ------------------
lxus3tp04    192.168.10.4    0.022   Y/Y/Y    0      (g_5)        (2) Setup Completed
lxus3tp05    192.168.10.5    0.248   Y/Y/Y    0      (g_6)        (2) Setup Completed
.
.
.
Errors found from the ‘utils dbreplication status’ command.
Please review the below logs for details.
options: q=quit, n=next, p=prev, b=begin, e=end (lines 1 – 7 of 7) :
Thu Aug 6 01:26:43 2015 main() DEBUG: –>
Thu Aug 6 01:26:48 2015 main() DEBUG: Replication cluster summary:
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_5_ccm10_5_2_21900_4   5 Active   Local          0
g_6_ccm10_5_2_21900_4   6 Active   Connected      0  Jul 20 06:32:24
Thu Aug 6 01:26:56 2015 main() DEBUG: <–
———- Suspect Replication Summary ———-
For table: ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid
replication is suspect for node(s):
g_6_ccm10_5_2_21900_4
————————————————-
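When there are more than a couple of tables in the summary, it can help to pull the suspect entries out with a script instead of reading them by eye. Below is a minimal Python sketch, assuming the ReplicationStatus output has been saved as text and that the summary always uses the "For table:" / "replication is suspect for node(s):" layout shown above. The function names are my own, and taking the text after the last underscore as the table name is an assumption that happens to hold for this replicate name.

```python
import re
from typing import Dict, List


def suspect_tables(report: str) -> Dict[str, List[str]]:
    """Map each suspect replicate name to the nodes listed under it."""
    result: Dict[str, List[str]] = {}
    current = None
    collecting = False
    for raw in report.splitlines():
        line = raw.strip()
        match = re.match(r"For table:\s*(\S+)", line)
        if match:
            current = match.group(1)
            result[current] = []
            collecting = False
        elif current and line.startswith("replication is suspect for node(s):"):
            collecting = True
        elif collecting and re.fullmatch(r"g_\d+_\S+", line):
            # Node names in this output look like g_<id>_<db version>.
            result[current].append(line)
        else:
            # Separator or unrelated line ends the node list.
            collecting = False
    return result


def short_table_name(replicate: str) -> str:
    """Assumption: the table name is the text after the last underscore."""
    return replicate.rsplit("_", 1)[-1]


sample = """\
---------- Suspect Replication Summary ----------
For table: ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid
replication is suspect for node(s):
g_6_ccm10_5_2_21900_4
-------------------------------------------------
"""

for replicate, nodes in suspect_tables(sample).items():
    print(short_table_name(replicate), nodes)
```

The short table name is what gets passed to utils dbreplication repairtable later on.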
I noticed that replication was showing errors for the peuritoiuid table. I was a bit curious, so I jumped on the Bug Toolkit and searched for all bugs related to the Presence server and Jabber offline status. After a bit of digging I came across a new bug for this kind of behavior, where Jabber users show offline status (at random) and dbreplication shows a ‘bad’ peuritoiuid table. It is documented under bug ID CSCuu65533.
To fix this I repaired the peuritoiuid table from the first IM&P node. After that, replication was good and the issue with ‘offline’ and ‘false’ presence status was resolved.
.
admin:utils dbreplication repairtable peuritoiuid
Repairing of replication for table peuritoiuid is in progress.
repairing replicate
replicatename: ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid
Output is in file cm/trace/dbl/sdi/ReplicationTblRepair.2015_08_06_01_54_02.out
Please use “file view activelog cm/trace/dbl/sdi/ReplicationTblRepair.2015_08_06_01_54_02.out” command to see the output
admin:file view activelog cm/trace/dbl/sdi/ReplicationTblRepair.2015_08_06_01_54_02.out
utils dbreplication repairtable tablename [nodename]|all output
To determine if replication is suspect, look for the following:
(1) Number of rows in a table do not match on all nodes.
(2) Non-zero values occur in any of the other output columns for a table
First, a cdr list of the replication status of the servers
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_5_ccm10_5_2_21900_4   5 Active   Local          0
g_6_ccm10_5_2_21900_4   6 Active   Connected      0  Jul 20 06:32:24
Thu Aug 6 01:54:07 2015 dbllib.getReplServerName DEBUG: –>
Thu Aug 6 01:54:12 2015 dbllib.getReplServerName DEBUG: replservername: g_5_ccm10_5_2_21900_4
Thu Aug 6 01:54:12 2015 dbllib.getReplServerName DEBUG: <–
Aug 06 2015 01:54:13 —— Table scan for ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid start ——–
options: q=quit, n=next, p=prev, b=begin, e=end (lines 1 – 20 of 44) :
Node                       Rows     Extra   Missing  Mismatch Processed
--------------------- --------- --------- --------- --------- ---------
g_5_ccm10_5_2_21900_4      3270         0         0         0       905
g_6_ccm10_5_2_21900_4      3240         0        30         0         0
The repair operation completed. Validating the repaired rows …
Validation completed successfully.
Aug 06 2015 01:54:15 —— Table scan for ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid end ———
Running a cdr check after repair to insure table is in sync
Aug 06 2015 01:54:16 —— Table scan for ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid start ——–
Node                       Rows     Extra   Missing  Mismatch Processed
--------------------- --------- --------- --------- --------- ---------
g_5_ccm10_5_2_21900_4      3270         0         0         0         0
options: q=quit, n=next, p=prev, b=begin, e=end (lines 21 – 40 of 44) :
g_6_ccm10_5_2_21900_4      3270         0         0         0         0
Aug 06 2015 01:54:16 —— Table scan for ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid end ———
end of the file reached
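The repair output itself states the two checks for suspect replication: row counts that differ between nodes, and non-zero values in any of the other columns. As a rough illustration, here is a small Python sketch that applies those two rules to the table scan rows. It assumes each data row is a node name followed by five integers, as in the output above. ScanRow, parse_scan and is_suspect are names I made up for this example, and counting Processed among the "other columns" is my literal reading of the rule.

```python
from typing import Iterable, List, NamedTuple


class ScanRow(NamedTuple):
    node: str
    rows: int
    extra: int
    missing: int
    mismatch: int
    processed: int


def parse_scan(lines: Iterable[str]) -> List[ScanRow]:
    """Keep only lines shaped like: <node> <rows> <extra> <missing> <mismatch> <processed>."""
    parsed = []
    for raw in lines:
        parts = raw.split()
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            parsed.append(ScanRow(parts[0], *map(int, parts[1:])))
    return parsed


def is_suspect(rows: List[ScanRow]) -> bool:
    """Rule 1: row counts differ across nodes. Rule 2: any other column is non-zero."""
    if not rows:
        return False
    counts_differ = len({r.rows for r in rows}) > 1
    nonzero = any(r.extra or r.missing or r.mismatch or r.processed for r in rows)
    return counts_differ or nonzero


before_repair = [
    "Node Rows Extra Missing Mismatch Processed",
    "g_5_ccm10_5_2_21900_4 3270 0 0 0 905",
    "g_6_ccm10_5_2_21900_4 3240 0 30 0 0",
]
after_repair = [
    "g_5_ccm10_5_2_21900_4 3270 0 0 0 0",
    "options: q=quit, n=next, p=prev, b=begin, e=end (lines 21 - 40 of 44) :",
    "g_6_ccm10_5_2_21900_4 3270 0 0 0 0",
]

print("before repair suspect:", is_suspect(parse_scan(before_repair)))
print("after repair suspect:", is_suspect(parse_scan(after_repair)))
```

With the numbers from this case, the first scan is flagged (3270 vs 3240 rows, 30 missing on the second node) and the check after the repair comes back clean. Header, separator and pager lines are skipped because they do not match the expected row shape.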
