Friday, August 10, 2012

Error while starting cluster: java.lang.RuntimeException: Failed to start Service "Cluster"

The managed servers in a SOA cluster fail to start correctly: SOAInfra is in a failed state, and the log file shows the errors below.

Oracle Coherence GE 3.6.0.4  (thread=Cluster, member=n/a): Failure to join a cluster for 300 seconds; stopping cluster service.
Oracle Coherence GE 3.6.0.4  (thread=[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Error while starting cluster: java.lang.RuntimeException: Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_ANNOUNCE)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:38)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
        at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:637)
        at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)

This may be caused by another cluster on the same subnet using the same cluster name. Listing the cluster hosts explicitly as Coherence well-known addresses (WKA) makes each server look for its peers only on those hosts. To apply the fix, go to the admin console, click on each managed server, and open the Server Start tab. In the Arguments field, set the following for each server in the cluster.

Let's say the two nodes are host1.com and host2.com.
In the startup arguments for Node1, set:

-Dtangosol.coherence.wka1=host1.com -Dtangosol.coherence.wka2=host2.com -Dtangosol.coherence.localhost=host1.com -Xmanagement:ssl=false,authenticate=false,autodiscovery=true

In the startup arguments for Node2, set:

-Dtangosol.coherence.wka1=host1.com -Dtangosol.coherence.wka2=host2.com -Dtangosol.coherence.localhost=host2.com -Xmanagement:ssl=false,authenticate=false,autodiscovery=true

Once done, save the changes and restart the managed servers. The error should no longer appear.
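
The per-node argument strings differ only in the tangosol.coherence.localhost value, so for larger clusters it can help to generate them rather than type them by hand. The following is a minimal sketch, not part of any Oracle tooling: the function name coherence_wka_args is hypothetical, and the -Xmanagement option from the examples above is left out on purpose because it is unrelated to the WKA list.

```python
def coherence_wka_args(hosts, local_host):
    """Build the Coherence WKA startup arguments for one managed server.

    hosts      -- every host in the cluster, in a fixed order
    local_host -- the host this particular managed server runs on
    """
    if local_host not in hosts:
        raise ValueError(f"{local_host} is not in the WKA list {hosts}")
    # One -Dtangosol.coherence.wkaN entry per cluster host, numbered from 1.
    wka = [f"-Dtangosol.coherence.wka{i}={h}" for i, h in enumerate(hosts, start=1)]
    # localhost is the only value that changes from node to node.
    local = f"-Dtangosol.coherence.localhost={local_host}"
    return " ".join(wka + [local])


if __name__ == "__main__":
    cluster_hosts = ["host1.com", "host2.com"]
    for host in cluster_hosts:
        print(host, "->", coherence_wka_args(cluster_hosts, host))
```

Paste each printed string into the Arguments field of the matching managed server, alongside any other JVM options already set there.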
