We have noticed that Azure Virtual Machines sometimes shut down or stop unexpectedly, either in the middle of a session or during startup. There are several possible causes. Resource exhaustion is a common one, but because the VM runs on Azure’s managed infrastructure, platform-side events and configuration settings can also be responsible. In this post, we will look at these causes and see what you can do to resolve the issue.
Why does my Azure VM shut down randomly?
Your Azure VM may shut down unexpectedly for several reasons. If it’s a Spot VM, Azure can evict it at short notice to reclaim capacity. High CPU, memory, or disk usage may overload the VM, so it’s essential to monitor performance metrics. An auto-shutdown schedule that was enabled by accident in the Azure settings could also cause this issue. Problems with the Azure agent may lead to unexpected stops as well; restarting it might help. Third-party tools, such as antivirus software or scripts, can trigger shutdowns. Azure itself may initiate a shutdown due to hardware failures or storage issues, which can be checked in the health logs. Lastly, group policies for Windows VMs or exceeding storage limits may also be factors.
Fix: Azure Virtual Machine (VM) is shutting down or stopping unexpectedly
If an Azure Virtual Machine (VM) is shutting down or stopping unexpectedly, you need to follow the steps mentioned below.
- Restart Azure Linux Agent
- Check for automated shutdowns
- Adjust VM Size
- Check third-party triggers
- Investigate Azure-Initiated Shutdowns via Root Cause Analysis (RCA)
Let us talk about them in detail.
1] Restart Azure Linux Agent
One of the common reasons for the unexpected termination of Azure VMs is a malfunctioning Azure Linux Agent. The agent is tightly integrated with the environment, so even a small glitch in it can bring the whole VM to a halt. We will connect to the VM remotely and then restart the agent.
Before you can troubleshoot or manage any services on your VM, you must first log in to it via Secure Shell (SSH).
Once done, run the following command.
systemctl status waagent
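This command queries the status of the waagent service using the systemd service manager. It will show you whether the service is active (running) or if it has encountered any errors. On Ubuntu images, the service is usually named walinuxagent instead, so substitute that name if waagent is not found.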
Now, to restart it, run the following command with root privileges.
sudo systemctl restart waagent
The command stops the waagent service and then starts it again. Restarting the service can often clear temporary glitches or states that might have caused it to malfunction.
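The status check and the restart can be combined into a small check-and-restart helper. The following is only a minimal sketch, assuming a systemd-based distribution and sudo privileges; the function names needs_restart and restart_agent_if_needed are hypothetical, and the unit name is passed as a parameter because it differs between images.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the Azure Linux Agent itself.
# Assumption: the unit is called waagent (walinuxagent on Ubuntu images).

# Return success (0) when the reported state means the agent should be restarted.
needs_restart() {
    case "$1" in
        active|activating|reloading) return 1 ;;  # running or starting, leave it alone
        *)                           return 0 ;;  # inactive, failed, unknown, empty
    esac
}

# Ask systemd for the unit state and restart the unit only if needed.
restart_agent_if_needed() {
    unit="${1:-waagent}"
    state="$(systemctl is-active "$unit" 2>/dev/null)"
    if needs_restart "$state"; then
        echo "Agent state is '${state:-unknown}', restarting $unit"
        sudo systemctl restart "$unit"
    else
        echo "$unit is $state, nothing to do"
    fi
}

# Usage (not run automatically):
#   restart_agent_if_needed waagent
```

Nothing runs until you call restart_agent_if_needed yourself, so the file is safe to source in an SSH session before deciding whether a restart is needed.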
2] Check for automated shutdowns
Next up, we need to make sure that no automated shutdowns are in place. Someone may have configured a schedule or a threshold-based trigger that shuts down the system, so this is worth investigating. Start by checking the Azure Portal under the VM’s Operations settings to see if an Auto-shutdown configuration is active. Next, review your Automation Accounts to verify that no scheduled tasks or runbook scripts are inadvertently set to shut down the VM at specific times. If you identify an auto-shutdown policy that isn’t required, disable it or adjust its schedule as appropriate.
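If you prefer the command line to the portal, the same check can be sketched with the Azure CLI. This assumes the az CLI is installed and logged in, and that the schedule was created through the portal’s auto-shutdown feature, which stores it as a Microsoft.DevTestLab/schedules resource named shutdown-computevm-<VM name>. The function names and the resource group and VM names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper names; requires the Azure CLI (az) and a logged-in session.

# Name of the schedule resource that the portal's auto-shutdown feature creates for a VM.
shutdown_schedule_name() {
    printf 'shutdown-computevm-%s\n' "$1"
}

# Show whether an auto-shutdown schedule exists and when it fires.
# az reports an error if no such schedule resource exists.
show_auto_shutdown() {
    az resource show \
        --resource-group "$1" \
        --resource-type Microsoft.DevTestLab/schedules \
        --name "$(shutdown_schedule_name "$2")" \
        --query '{status: properties.status, time: properties.dailyRecurrence.time, timeZone: properties.timeZoneId}' \
        --output table
}

# Turn the auto-shutdown schedule off for the VM.
disable_auto_shutdown() {
    az vm auto-shutdown --resource-group "$1" --name "$2" --off
}

# Usage (placeholders, not run automatically):
#   show_auto_shutdown MyResourceGroup MyVm
#   disable_auto_shutdown MyResourceGroup MyVm
```

This only covers the built-in auto-shutdown schedule. Runbooks in Automation Accounts still need to be reviewed separately.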
3] Adjust VM Size
If your virtual machine is consuming a significant amount of resources, such as CPU, memory, and I/O load, it can result in resource exhaustion, which may cause your virtual machine to shut down.
You can use Azure Monitor and Metrics to check resource utilization graphs for CPU, memory, and disk metrics. If you notice that your VM’s configuration cannot keep up with the workload, consider scaling up to a more powerful SKU. Also, evaluate the running applications, and optimize or offload intensive processes where possible. Hopefully, by doing so, you will be able to stop your machine from getting overwhelmed and terminating unexpectedly.
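The utilization check can also be scripted. The sketch below, which assumes a logged-in Azure CLI, pulls the hourly average of the Percentage CPU metric for the last 24 hours and compares the mean against a threshold. The function names, the 90 percent limit, and the names in the usage comment are illustrative assumptions, not recommended values.

```shell
#!/bin/sh
# Hypothetical helper names; requires the Azure CLI (az) for the functions that call it.

# Read one number per line on stdin and print the mean with one decimal place.
# Non-numeric lines (for example empty data points) are skipped.
mean() {
    awk '$1 ~ /^[0-9]+(\.[0-9]+)?$/ { sum += $1; n++ }
         END { if (n) printf "%.1f\n", sum / n }'
}

# Succeed when the average ($1) is at or above the limit ($2, default 90).
cpu_is_saturated() {
    awk -v avg="$1" -v limit="${2:-90}" 'BEGIN { exit !(avg >= limit) }'
}

# Print the mean of the hourly "Percentage CPU" averages over the last 24 hours.
average_cpu() {
    vm_id="$(az vm show --resource-group "$1" --name "$2" --query id --output tsv)"
    az monitor metrics list \
        --resource "$vm_id" \
        --metric "Percentage CPU" \
        --offset 24h --interval PT1H --aggregation Average \
        --query 'value[0].timeseries[0].data[].average' \
        --output tsv | mean
}

# Usage (placeholders, not run automatically):
#   avg="$(average_cpu MyResourceGroup MyVm)"
#   if cpu_is_saturated "$avg" 90; then
#       az vm list-vm-resize-options --resource-group MyResourceGroup --name MyVm --output table
#       # then: az vm resize --resource-group MyResourceGroup --name MyVm --size <chosen size>
#   fi
```

Keep in mind that resizing a VM can restart it, so plan the change for a maintenance window.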
4] Check third-party triggers
External factors, such as third-party management tools, antivirus software, or improperly applied Group Policy settings, can inadvertently cause a VM to shut down. To address this, first check your installed software to identify any monitoring or security applications that might be conflicting with normal operations. Then, examine Group Policy settings on Windows VMs to ensure that no policies are enforcing shutdown commands. Finally, check for any recent maintenance notifications from Azure that could have initiated a shutdown. Ruling out these external triggers helps you isolate the issue and prevent unintended interruptions to the VM’s operation.
5] Investigate Azure-Initiated Shutdowns via Root Cause Analysis (RCA)
To identify platform-related shutdowns, such as hardware failures or storage problems, use Azure’s Root Cause Analysis (RCA) tools. Start with the VM’s Resource Health section in the Azure Portal and look for events such as Unexpected shutdown or Platform-initiated shutdown, which could point to issues like node failures or storage timeouts. Next, review the Activity Logs, filter for shutdown events, and check the Event Initiated By column for reasons like hardware auto-recovery or connectivity loss. If the issue is linked to a host node failure, use the Redeploy feature to move the VM to a healthy node, which takes it off the faulty hardware.
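The Activity Log review can be approximated from the command line as well. The sketch below assumes a logged-in Azure CLI and lists recent deallocate, power-off, and restart operations in a resource group together with their caller, which corresponds to the Event Initiated By column. The function names and the seven-day window are illustrative assumptions, and Resource Health events themselves are not covered here.

```shell
#!/bin/sh
# Hypothetical helper names; requires the Azure CLI (az) for the functions that call it.

# Succeed when the operation name is a VM stop, deallocate, or restart action.
is_stop_operation() {
    op_lower="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$op_lower" in
        microsoft.compute/virtualmachines/deallocate/action) return 0 ;;
        microsoft.compute/virtualmachines/poweroff/action)   return 0 ;;
        microsoft.compute/virtualmachines/restart/action)    return 0 ;;
        *)                                                   return 1 ;;
    esac
}

# Read lines of "<operation> <timestamp> <status> <caller>" on stdin
# and keep only the stop-related ones.
filter_stop_events() {
    while read -r op rest; do
        if is_stop_operation "$op"; then
            printf '%s\t%s\n' "$op" "$rest"
        fi
    done
}

# List stop-related Activity Log entries for a resource group ($1)
# within a time window ($2, default 7d).
recent_stop_events() {
    az monitor activity-log list \
        --resource-group "$1" \
        --offset "${2:-7d}" \
        --query '[].[operationName.value, eventTimestamp, status.value, caller]' \
        --output tsv | filter_stop_events
}

# Usage (placeholders, not run automatically):
#   recent_stop_events MyResourceGroup 7d
#   az vm redeploy --resource-group MyResourceGroup --name MyVm   # move the VM to another host node
```

Redeploying restarts the VM, and data on the temporary disk is lost, so only use it once the logs point to a host-side problem.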
That’s it!
Why does my virtual machine shut down unexpectedly?
One of the reasons why a virtual machine shuts down unexpectedly could be overconsumption of resources. You can restart the host machine or end resource-heavy processes running on it to keep usage in check. Additionally, make sure that you have allocated enough resources to the VM.