Manatee rebuild may hang on spinner during zfs receive

Elizabeth -

Triton on-premises operators who run a manatee rebuild on a deposed peer may see the rebuild appear to hang at the spinner during the zfs receive step. This is a known issue that is being addressed under MANATEE-262.

Check Overall Health

Before taking any action, check the health of the manatee cluster by logging in to one of the other manatee peers and running manatee-adm pg-status.

Verify that all peers show as OK (similar to the example output shown below):

ROLE     PEER      PG  REPL   SENT        FLUSH       REPLAY      LAG
primary  292ef78o  ok  sync   0/55DCA3F0  0/55DCA3F0  0/55DCA0D8  -
sync     4367h8ui  ok  async  0/55DCA3F0  0/55DCA3F0  0/55DCA0D8  -
async    8y78b901  ok  -      -           -           -           0m00s

In addition, check the health of the overall Triton stack by running sdc-healthcheck from the headnode (HN).

If both manatee-adm pg-status and sdc-healthcheck report a healthy state (every peer shows ok and every service shows online), you can safely press Ctrl-C to exit the seemingly stuck manatee-adm rebuild.
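The peer check can also be scripted. The following is a minimal sketch, not part of the manatee tooling: the function name all_peers_ok is hypothetical, and it assumes the pg-status output has the same column layout as the example above (PG in the third column). Any extra line in the output, such as a warning, is deliberately treated as a failure so that the check errs on the side of caution.

```shell
# Hypothetical helper: reads `manatee-adm pg-status` output on stdin.
# Returns 0 only if at least one peer row is present and every
# non-header, non-blank line has "ok" in the third (PG) column.
all_peers_ok() {
    awk '
        NF == 0        { next }    # skip blank lines
        $1 == "ROLE"   { next }    # skip the header row
        { rows++; if ($3 != "ok") bad++ }
        END { exit (rows > 0 && bad == 0) ? 0 : 1 }
    '
}
```

From inside a manatee peer, it could be used as: manatee-adm pg-status | all_peers_ok && echo "all peers ok". It does not replace reading the output yourself or running sdc-healthcheck.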

When to Engage Support

If manatee-adm pg-status returns any errors, please open a support ticket and provide Joyent Support with the following information:

  • Output of manatee-adm pg-status
  • Output of sdc-healthcheck
  • Output of svcs -xv from the global zone on the HN and from within each of the manatee peers
  • Manatee versions (if possible): sdcadm insts manatee
  • Datacenter census (if possible to run):
    cd /var/tmp && curl -k https://us-east.manta.joyent.com/joyentsup/public/census.sh | bash
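To make the ticket easier to assemble, the headnode commands listed above can be captured to files in one pass. This is only a sketch under stated assumptions: the function names save_output and collect_headnode_diagnostics and the default directory /var/tmp/manatee-diag are hypothetical, and the script covers only the commands run from the global zone on the HN. The manatee-adm pg-status and in-zone svcs -xv output still need to be gathered from within the manatee peers.

```shell
# Hypothetical helper: save_output DIR LABEL COMMAND [ARGS...]
# Runs COMMAND and writes its stdout and stderr to DIR/LABEL.txt.
# A failing or missing command is recorded in the file, not treated as fatal.
save_output() {
    so_dir=$1
    so_label=$2
    shift 2
    mkdir -p "$so_dir" || return 1
    so_status=0
    "$@" > "$so_dir/$so_label.txt" 2>&1 || so_status=$?
    echo "exit status: $so_status" >> "$so_dir/$so_label.txt"
}

# Hypothetical wrapper: collects the headnode (global zone) outputs
# named in the list above. The default directory is an assumption.
collect_headnode_diagnostics() {
    diag_dir=${1:-/var/tmp/manatee-diag}
    save_output "$diag_dir" sdc-healthcheck sdc-healthcheck
    save_output "$diag_dir" svcs-xv-global svcs -xv
    save_output "$diag_dir" sdcadm-insts-manatee sdcadm insts manatee
}
```

Running collect_headnode_diagnostics from the headnode would leave one text file per command in the chosen directory, ready to attach to the support ticket.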