Class HealthMonitor

  • All Implemented Interfaces:
    MessageListener, java.lang.Runnable, java.util.EventListener

    public class HealthMonitor
    extends java.lang.Object
    implements MessageListener, java.lang.Runnable
    HealthMonitor uses MasterNode to determine its own designation. All nodes cache the other nodes' states, and any node can act as the master node at any point in time. The designation exists so that only the master node determines the collective state and communicates it to group members.

    TODO: Convert the InDoubt Peer Determination and Failure Verification into Callable FutureTask using java.util.concurrent
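
    As a rough illustration of the direction this TODO describes, failure verification could be wrapped in a Callable and run as a FutureTask with a bounded wait. This is a minimal sketch, not the class's actual code: PeerProbe, verifyFailure and the member tokens are hypothetical names, and treating a timeout or a probe error as a failure is an assumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from HealthMonitor itself.
class FailureVerificationSketch {

    /** Hypothetical probe: returns true if the in-doubt peer answered. */
    interface PeerProbe {
        boolean isReachable(String memberToken) throws Exception;
    }

    /**
     * Runs the probe as a FutureTask and waits at most verifyTimeoutMillis.
     * Returns true when the peer should be treated as failed.
     */
    static boolean verifyFailure(PeerProbe probe, String memberToken, long verifyTimeoutMillis) {
        Callable<Boolean> check = () -> probe.isReachable(memberToken);
        FutureTask<Boolean> task = new FutureTask<>(check);
        Thread worker = new Thread(task, "failure-verifier-" + memberToken);
        worker.setDaemon(true); // never keep the JVM alive for a stuck probe
        worker.start();
        try {
            // A peer that answers within the window is not failed.
            return !task.get(verifyTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            task.cancel(true);  // assumption: no answer in time counts as failed
            return true;
        } catch (ExecutionException e) {
            return true;        // assumption: a probe error counts as failed
        } catch (InterruptedException e) {
            task.cancel(true);
            Thread.currentThread().interrupt();
            return false;       // interrupted: no verdict, do not declare the peer dead
        }
    }

    public static void main(String[] args) {
        // Responsive peer: prints "false"
        System.out.println(verifyFailure(token -> true, "instance1", 500));
        // Peer that does not answer within 100 ms: prints "true"
        System.out.println(verifyFailure(token -> {
            Thread.sleep(5_000);
            return true;
        }, "instance2", 100));
    }
}
```

    The bounded get call is what a Callable-based design would add here: the caller decides how long verification may take, independently of how the probe itself behaves.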

    • Constructor Detail

      • HealthMonitor

        public HealthMonitor(ClusterManager manager,
                             long timeout,
                             int maxMissedBeats,
                             long verifyTimeout,
                             long failureDetectionTCPTimeout,
                             int failureDetectionTCPPort)
        Constructs a HealthMonitor.
        Parameters:
        manager - the ClusterManager
        maxMissedBeats - maximum number of retries before failure
        verifyTimeout - time in milliseconds that the health monitor waits before concluding that the in-doubt peer is dead
        timeout - time in milliseconds that the health monitor waits before retrying an in-doubt peer's availability
        failureDetectionTCPPort - the TCP port used for failure detection
        failureDetectionTCPTimeout - the timeout used to detect the failure
    • Method Detail

      • getIndoubtDuration

        public long getIndoubtDuration()
        A member is considered INDOUBT when no heartbeat has been received from it for this amount of time (in milliseconds).
        Returns:
        the in-doubt duration in milliseconds
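
        The in-doubt check can be pictured with a small sketch. This page does not say how the duration is computed; the sketch assumes it is the heartbeat timeout multiplied by maxMissedBeats, and the class name, enum and classify method are hypothetical.

```java
// Hypothetical sketch; the formula below is an assumption, not documented behavior.
class IndoubtWindowSketch {

    enum Liveness { ALIVE, INDOUBT }

    private final long timeoutMillis;
    private final int maxMissedBeats;

    IndoubtWindowSketch(long timeoutMillis, int maxMissedBeats) {
        this.timeoutMillis = timeoutMillis;
        this.maxMissedBeats = maxMissedBeats;
    }

    /** Assumed formula: one timeout per tolerated missed heartbeat. */
    long getIndoubtDuration() {
        return timeoutMillis * maxMissedBeats;
    }

    /** A member is in doubt once its last heartbeat is older than the window. */
    Liveness classify(long lastHeartbeatMillis, long nowMillis) {
        long silence = nowMillis - lastHeartbeatMillis;
        return silence > getIndoubtDuration() ? Liveness.INDOUBT : Liveness.ALIVE;
    }

    public static void main(String[] args) {
        IndoubtWindowSketch window = new IndoubtWindowSketch(2_000, 3);
        System.out.println(window.getIndoubtDuration()); // prints 6000
        System.out.println(window.classify(0, 5_000));   // prints ALIVE
        System.out.println(window.classify(0, 6_001));   // prints INDOUBT
    }
}
```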
      • run

        public void run()
        Main processing method for the HealthMonitor object
        Specified by:
        run in interface java.lang.Runnable
      • getMemberState

        public java.lang.String getMemberState(PeerID peerID,
                                               long threshold,
                                               long timeout)
        Parameters:
        peerID - the peer ID
        threshold - a positive value if the caller wants the state looked up in the caller's local cache
        timeout - a positive value if the caller wants a network call made directly to the member whose state is requested. If both threshold and timeout are specified, the first attempt is to get the state from the local cache. If that comes back as UNKNOWN, another attempt is made via LWR multicast to get the state directly from the member concerned.
        Returns:
        state
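
        The lookup order described for threshold and timeout can be sketched as follows. This is an illustration under stated assumptions, not the method's actual implementation: StateSource, the two lookups passed in, and the string peer IDs are hypothetical stand-ins, and returning UNKNOWN when neither parameter is positive is an assumption.

```java
// Hypothetical sketch of the cache-then-network order described above.
class MemberStateLookupSketch {

    static final String UNKNOWN = "UNKNOWN";

    /** Hypothetical stand-in for either the local cache or the network query. */
    interface StateSource {
        String lookup(String peerId, long limitMillis);
    }

    static String getMemberState(String peerId, long threshold, long timeout,
                                 StateSource localCache, StateSource network) {
        String state = UNKNOWN; // assumption: result when no lookup is requested
        if (threshold > 0) {
            // First attempt: the caller's local cache.
            state = localCache.lookup(peerId, threshold);
        }
        if (UNKNOWN.equals(state) && timeout > 0) {
            // Second attempt: ask the member directly over the network.
            state = network.lookup(peerId, timeout);
        }
        return state;
    }

    public static void main(String[] args) {
        StateSource emptyCache = (id, limit) -> UNKNOWN;
        StateSource warmCache = (id, limit) -> "ALIVE";
        StateSource network = (id, limit) -> "READY";

        // Cache hit, network never consulted: prints ALIVE
        System.out.println(getMemberState("peer-1", 3_000, 1_000, warmCache, network));
        // Cache miss falls through to the network: prints READY
        System.out.println(getMemberState("peer-1", 3_000, 1_000, emptyCache, network));
        // Cache miss with no network timeout: prints UNKNOWN
        System.out.println(getMemberState("peer-1", 3_000, 0, emptyCache, network));
    }
}
```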
      • getMemberStateFromHeartBeat

        public java.lang.String getMemberStateFromHeartBeat(PeerID peerID,
                                                            long threshold)
      • getMemberStateViaLWR

        public java.lang.String getMemberStateViaLWR(PeerID peerID,
                                                     long timeout)
      • addHealthEntryIfMissing

        public boolean addHealthEntryIfMissing(SystemAdvertisement adv)
      • reportJoinedAndReadyState

        public void reportJoinedAndReadyState()
      • announceWatchdogObservedFailure

        public void announceWatchdogObservedFailure(java.lang.String failedMemberToken)
      • setJoinedAndReadyReceived

        public void setJoinedAndReadyReceived()
      • getMsgSendStats

        public com.sun.enterprise.mgmt.HealthMonitor.MsgSendStats getMsgSendStats(java.lang.String memberName)