In our system, there are three hosts all connected to the same ethernet switch, which is illustrated below:
A (192.168.0.21, WIN10_1809) <-> Switch <-> B (192.168.0.100, Debian Linux 9) ^ | C (192.168.0.201, WIN10_1809)
Between any two of these hosts, there are periodically network communications, both lower-level ping operations and upper-level business messages (based on TCP or UDP).
Occasionally host B and host C would find out host A is inaccessible by ping operations (which would last around 7 seconds) while host A would have no problem as of pinging host B and host C. At the same time, the upper-level TCP or UDP communications related with host A would also fail while the communications between host B and host C are totally normal.
By checking the network traffic in the system using Wireshark, we find out that host A would initiate an ARP probe and ARP announcement process when the error happens.
2838 1.501535 SuperMic_78:e0:f1 Broadcast ARP 60 Who has 192.168.0.100? Tell 192.168.0.21 2841 1.501831 JUMPINDU_64:8b:23 SuperMic_78:e0:f1 ARP 60 192.168.0.100 is at 00:e0:4b:64:8b:23 2876 1.516569 SuperMic_78:e0:f1 Broadcast ARP 60 Who has 192.168.0.201? Tell 192.168.0.21 2879 1.516654 SuperMic_8d:2f:67 SuperMic_78:e0:f1 ARP 60 192.168.0.201 is at ac:1f:6b:8d:2f:67 3234 1.817465 SuperMic_78:e0:f1 Broadcast ARP 60 Who has 192.168.0.21? (ARP Probe) 4179 2.817637 SuperMic_78:e0:f1 Broadcast ARP 60 Who has 192.168.0.21? (ARP Probe) 5043 3.817780 SuperMic_78:e0:f1 Broadcast ARP 60 Who has 192.168.0.21? (ARP Probe) 5897 4.817833 SuperMic_78:e0:f1 Broadcast ARP 60 ARP Announcement for 192.168.0.21 In which, SuperMic_78:e0:f1 is host A, JUMPINDU_64:8b:23 is host B and SuperMic_8d:2f:67 is host C.
According to RFC 5227:
Before beginning to use an IPv4 address (whether received from manual configuration, DHCP, or some other means), a host implementing this specification MUST test to see if the address is already in use, by broadcasting ARP Probe packets. This also applies when a network interface transitions from an inactive to an active state, when a computer awakes from sleep, when a link-state change signals that an Ethernet cable has been connected, when an 802.11 wireless interface associates with a new base station, or when any other change in connectivity occurs where a host becomes actively connected to a logical link.
But from the windows event log on host A, there is no evidence for any event listed above. Also, the problem happens on multiple systems in our company and it looks like networking hardware (have replaced the switch and the connecting cables) and network traffic (problem still happens even when system is idle and is with less than 1% bandwidth usage) don’t give a major contribution to the problem.
We have also checked the log files in fields and have seen no evidence of problem happening there — WIN7 and old releases of SW have been used in the field while WIN10 and new SW are being used at home.
Have investigated for nearly two months and no root cause has been found. Any advice or suggestion would be greatly appreciated. Also, please let me know if there is other places better suited for this kind of problem.