I recently added a Raspberry Pi 4B to my ever-expanding homelab. To get the most out of the Pi's new gigabit Ethernet port while still saving power, I added a PoE HAT so that a single Ethernet cable carries both power and data. Everything worked fine, except that the Pi occasionally stopped responding without warning. Initially I assumed it was crashing and just restarted it manually. I also tried switching from manjaro-arm to Raspbian, but that showed the same symptoms, even after updating to the latest bootloader and firmware.
Then I noticed that I could hear the fans on the PoE HAT whirring up and settling down even while the Pi was unreachable. The fans speed up and slow down with the CPU temperature, so this pattern meant the CPU was still doing work, which made me suspect that the Pi hadn't actually crashed. A trip to the kernel logs via journalctl -xe confirmed the suspicion and also showed a few odd messages about the Ethernet interface.
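If the journal is noisy, filtering the kernel messages for the Ethernet interface can make the relevant lines easier to spot. Here is a minimal sketch, assuming the interface is eth0 and that the Pi 4's onboard Ethernet driver logs under the name bcmgenet. filter_eth_lines is a made-up helper name, not part of any tool.

```shell
# Keep only log lines that mention the interface or its driver.
# Both patterns are assumptions; adjust them to match your own logs.
filter_eth_lines() {
    grep -iE 'eth0|bcmgenet'
}

# Usage: kernel messages from the current boot, filtered.
#   journalctl -k -b --no-pager | filter_eth_lines
```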
I tried several things: updating the kernel, picking up several patches, and trying different distros, but nothing helped. I also found a few similar issues reported on the Raspberry Pi forums and elsewhere, both with and without the PoE HAT, and mostly without any resolution. Ultimately, I tried the simple approach of resetting the Ethernet link via ethtool whenever the issue happens, and it has worked fine as a workaround. If you see this issue as well and want to use this workaround, it's easy to set up by creating the following 3 files. (Note: I've used systemd timers here. You could do the same with cron instead.)
File 1: monitor.sh
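This file has the code to check the network state and reset the Ethernet link.
#!/usr/bin/bash
# Directory containing this script; the log file is written next to it.
BASEDIR=$(dirname "$0")
now=$(date)

# A single ping with a 1-second timeout decides whether the network is up.
if ping -q -c 1 -W 1 google.com >/dev/null; then
    echo "$now : The network is up" >> "$BASEDIR/eth.log"
else
    # Restart auto-negotiation on eth0 to bring the link back.
    ethtool -r eth0
    echo "$now : Network down. Trying to restart eth0" >> "$BASEDIR/eth.log"
fi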
File 2: monitor.timer
[Unit]
Description=Monitor

[Timer]
OnUnitActiveSec=200s
OnBootSec=100s

[Install]
WantedBy=timers.target
File 3: monitor.service
[Unit]
Description=Monitor

[Service]
Type=oneshot
ExecStart=/bin/bash /home/shantanu/monitor.sh

[Install]
WantedBy=timers.target
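To put these files to use, the two unit files need to be somewhere systemd looks for them, and the timer needs to be enabled. Here is a minimal sketch, assuming the units go in /etc/systemd/system and that monitor.timer and monitor.service are in the current directory. The run and install_monitor helper names are made up for this example.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_monitor() {
    # Assumed location for locally written unit files.
    local unit_dir=/etc/systemd/system
    run sudo cp monitor.timer monitor.service "$unit_dir/"
    run sudo systemctl daemon-reload
    # Enabling the timer (not the service) is what schedules the checks.
    run sudo systemctl enable --now monitor.timer
    # Show when the timer will fire next.
    run systemctl list-timers monitor.timer
}

# Usage: preview first, then run for real.
#   DRY_RUN=1 install_monitor
#   install_monitor
```

Since monitor.sh calls ethtool, the service has to run as root, which is the default for system units installed this way.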
Now the Pi checks every 200 seconds whether it still has a network connection, and if it doesn't, it restarts eth0 to recover the connection. This workaround has been working pretty well for me so far.
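Since the script appends a line to eth.log on every check, the log doubles as a rough record of how often the link dropped. Here is a small sketch for counting the reset attempts. count_resets is a hypothetical helper, and the log path in the usage line assumes the script lives in /home/shantanu, as in the service file.

```shell
# Count the checks that found the network down, based on the
# "Network down" message that monitor.sh writes to its log.
count_resets() {
    local log_file=$1
    # grep -c prints the number of matching lines (0 if none).
    grep -c 'Network down' "$log_file"
}

# Usage (path assumed from the ExecStart line in monitor.service):
#   count_resets /home/shantanu/eth.log
```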