Apparently kill -9 always works… except when it doesn’t. It also seems that 99% of the time, when it doesn’t work, it is because the process is trying to access an NFS mount. The retry is handled inside the Linux kernel, and unless you mount NFS with the soft option, it will retry forever.
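Before trying any workaround, it can help to confirm that the process really is blocked on NFS. The following is only a rough sketch, not part of the original tip: the helper names list_d_state and list_hard_nfs are made up for illustration, and they simply filter the output of ps and the contents of /proc/mounts. A process blocked this way normally shows state D (uninterruptible sleep), which is why the kill signal is not acted upon.

```shell
# Print "PID COMMAND" for processes in uninterruptible sleep (state D).
# Reads "PID STAT COMMAND" lines on stdin; only the first word of the
# command name is printed.
list_d_state() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Print the mount points of NFS mounts whose options do not include "soft".
# Reads /proc/mounts-style lines on stdin:
#   device mountpoint fstype options dump pass
list_hard_nfs() {
    awk '$3 ~ /^nfs4?$/ && ("," $4 ",") !~ /,soft,/ { print $2 }'
}

# Usage on a live system:
#   ps -eo pid=,stat=,comm= | list_d_state
#   list_hard_nfs < /proc/mounts
```

If the stuck process shows up in the first list and its mount point in the second, the situation described above is the likely cause.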
If you can reboot your Linux machine, just do that. It’s easier. If not, you can trick the process into timing out. To do that, you need to add the IP of the NFS server locally and temporarily run an NFS server. If your NFS server is at IP 1.2.3.4, you would do:
ifconfig eth0:nfstmp 1.2.3.4 netmask 255.255.255.255
apt-get install nfs-kernel-server
Now wait a moment and your stuck process should exit. Then you can clean up:
ifconfig eth0:nfstmp down
apt-get --purge remove nfs-kernel-server
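On systems where ifconfig is no longer installed, the same steps can probably be done with the ip tool from iproute2. The sketch below only prints the commands so they can be reviewed before being run as root. The function name nfs_trick_plan, the default interface eth0 and the -y flag are assumptions added here, not part of the original recipe.

```shell
# Print (not run) the setup and cleanup commands for the fake NFS server
# trick, using iproute2's `ip` in place of ifconfig.
# A /32 prefix is the same as netmask 255.255.255.255.
nfs_trick_plan() {
    nfs_ip=$1            # address of the unreachable NFS server
    iface=${2:-eth0}     # assumed interface name
    cat <<EOF
ip addr add ${nfs_ip}/32 dev ${iface} label ${iface}:nfstmp
apt-get install -y nfs-kernel-server
# ...wait for the stuck process to exit, then clean up:
ip addr del ${nfs_ip}/32 dev ${iface}
apt-get --purge remove -y nfs-kernel-server
EOF
}

nfs_trick_plan 1.2.3.4
```

Printing first rather than executing is a deliberate choice: adding another machine's IP address to a live host is worth a second look before committing.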
An easier way would be to kill the active NFS session using tcpkill. This sends RST packets to the server, forcing the connection to be re-established.
# tcpkill -i eth0 port 2049
This will cause the NFS connection to be restarted within a matter of seconds. It’s a good way to make the kernel react if it gets stuck somehow.
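Since tcpkill keeps running until it is interrupted, and the plain port filter matches every NFS connection on the interface, a slightly narrower variant may be safer. This is a sketch under assumptions: the function name nfs_tcpkill_cmd, the 10 second limit and the host filter are additions for illustration, and 1.2.3.4 is the example server address from the article. It relies on timeout from coreutils and on tcpkill accepting a tcpdump-style filter expression.

```shell
# Build a tcpkill command line limited to one NFS server and bounded in time.
# The command is printed for review rather than executed.
nfs_tcpkill_cmd() {
    server_ip=$1          # NFS server address
    iface=${2:-eth0}      # assumed interface name
    seconds=${3:-10}      # assumed duration before tcpkill is stopped
    echo "timeout ${seconds} tcpkill -i ${iface} host ${server_ip} and port 2049"
}

nfs_tcpkill_cmd 1.2.3.4
```

Run the printed command as root once it looks right for your interface and server.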
We have seen this a few times; tcpkill works best.
Great tip 🙂 Thanks Dag
Amazing, this helped a lot
P.S. I would suggest allocating the IP to a virtual NIC.