Clicking the Manage cluster Nodes icon on the admin panel opens the node management panel. From here the administrator has an overview of all the nodes in the cluster.
For each node the page displays:
The best practice for adding new nodes is to use Karamel/Chef. However, if the administrator wants to include an already provisioned node in the cluster, an Add new node button is available.
Clicking the button opens a pop-up in which to enter the hostname and the private IP of the node to include.
Once the node has been added in Hopsworks, the administrator should instruct Kagent to register. To do so, they should edit Kagent's configuration file (by default /srv/hops/kagent/etc/config.ini) and run the /srv/hops/kagent/host-certs/run_csr.sh script. The script registers the host with Hopsworks and obtains a valid X.509 certificate for it. When the process is done, the administrator can start the remaining services.
Note: remember to copy the Anaconda environments to the newly provisioned host, as the Add new node process does not do this.
By default, host certificates have a 10-year validity period. However, we encourage administrators to rotate the certificates periodically. To rotate the certificates of one or more hosts, administrators can select the desired hosts in the UI and click the Rotate host keys button.
This generates a new certificate for each selected host and signs it with the Hopsworks CA. Hopsworks services will pick up the new certificate automatically.
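To decide when a rotation is due, it can help to check how long a host certificate remains valid. The following is a minimal sketch, not part of Hopsworks: it assumes the openssl command-line tool is installed and that you know where the host certificate is stored in PEM format (the cert_path argument is a placeholder, and the function names are made up for this example).

```python
import ssl
import subprocess
import time


def parse_not_after(openssl_line):
    """Convert a line such as 'notAfter=Jan  1 00:00:00 2030 GMT'
    (as printed by `openssl x509 -enddate`) to seconds since the epoch."""
    _, _, stamp = openssl_line.strip().partition("=")
    return ssl.cert_time_to_seconds(stamp)


def days_until_expiry(cert_path, now=None):
    """Return the number of days until the PEM certificate at cert_path expires.

    cert_path is a placeholder: point it at the host certificate on your node.
    """
    output = subprocess.run(
        ["openssl", "x509", "-noout", "-enddate", "-in", cert_path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    now = time.time() if now is None else now
    return (parse_not_after(output) - now) / 86400
```

A negative result means the certificate has already expired. A scheduled job could call days_until_expiry and remind the administrator to use the Rotate host keys button when the value drops below a limit of their choosing.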
If the Kagent process crashes or becomes unresponsive on a node, it can be restarted from the node management panel. Administrators can select the nodes on which they want to restart Kagent and click the Restart Kagent button. Please be aware that this feature relies on SSH: Hopsworks needs to be able to SSH into the selected node and restart the process. SSH keys are set up automatically by Karamel/Chef during deployment. However, your infrastructure might not allow key-based SSH, or might limit the list of users that can SSH into the nodes. If Hopsworks cannot SSH into the hosts, you will have to restart the Kagent service manually on each host using the command systemctl restart kagent.
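When the manual restart has to be repeated on several hosts, it can be scripted from an account that is allowed to SSH into them. The sketch below is an illustration only and rests on several assumptions: the hostnames are examples, the use of sudo is a guess about how privileges are granted on your nodes, and the function names are hypothetical. It runs in dry-run mode by default and only prints the commands.

```python
import subprocess


def restart_kagent_command(host, user=None):
    """Build the ssh command that restarts the Kagent service on one host.

    The use of sudo is an assumption; drop it if you log in as root.
    """
    target = f"{user}@{host}" if user else host
    return ["ssh", target, "sudo", "systemctl", "restart", "kagent"]


def restart_kagent(hosts, user=None, dry_run=True):
    """Restart Kagent on each host.

    Returns a dict mapping host to True/False (command succeeded or failed),
    or to None when dry_run is set and nothing was executed.
    """
    results = {}
    for host in hosts:
        command = restart_kagent_command(host, user)
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
            results[host] = None
            continue
        completed = subprocess.run(command, capture_output=True, text=True)
        results[host] = completed.returncode == 0
    return results
```

For example, restart_kagent(["node1.example.com", "node2.example.com"]) prints the two ssh commands without running them. Passing dry_run=False executes them and reports which hosts succeeded.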
Hopsworks can also be configured to automatically monitor the state of the Kagent processes and restart them in case they fail to heartbeat for a configurable period of time. Administrators can enable this feature by specifying the following attributes in the cluster definition:
hopsworks:
  kagent_liveness:
    enabled: true
    threshold: "40s"
As with the start, stop and restart buttons, this feature relies on Hopsworks being able to SSH into the nodes.
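The idea behind the liveness threshold can be sketched as follows. This is not the Hopsworks implementation; it only illustrates the rule "restart Kagent on hosts whose last heartbeat is older than the threshold". The function names are hypothetical, and only the "s" suffix appears in the example above, so support for "m" and "h" here is an assumption.

```python
import re
import time

# Seconds per unit. Only "s" is shown in the cluster definition example;
# "m" and "h" are assumed for illustration.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_threshold(value):
    """Convert a threshold string such as "40s" to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported threshold: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def hosts_to_restart(last_heartbeat, threshold, now=None):
    """Return the hosts whose last heartbeat is older than the threshold.

    last_heartbeat maps a hostname to the time (seconds since the epoch)
    at which its Kagent last reported in.
    """
    now = time.time() if now is None else now
    limit = parse_threshold(threshold)
    return sorted(
        host for host, seen in last_heartbeat.items() if now - seen > limit
    )
```

With a threshold of "40s", a host that last reported 60 seconds ago would be selected for a restart, while one that reported 10 seconds ago would be left alone.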