Over the past few months, I’ve found myself troubleshooting a variety of DNS and VPN-related issues that a small subset of our users have encountered. These kinds of issues can have all sorts of causes: network configuration, DNS resolvers, the VPN connection and its configuration, third party clients, macOS updates, etc. Working from home makes it even more difficult to determine what the cause might be.
The goal of this post is partly to help future me, but also to share what I’ve learned about troubleshooting these kinds of networking issues.
The Questions
There is a series of questions you should ask the end user who is having the problem. The list isn’t limited to these, but it should include:
- What version of macOS are you running?
- Have you recently installed any software?
- Are you running any security software?
- If you go to the Apple menu > System Preferences > Network, can you confirm the active connections (with a green dot) and the order in which they appear?
- Are you experiencing the problem on a wired connection?
- Are you experiencing the problem on a wireless connection?
- Are you experiencing the problem on a personal hotspot connection?
- Are you experiencing the problem on a completely different network (e.g. a friend’s house, at home, in the office, etc.)?
- Does the problem occur after you’ve left the computer idle for some time?
- Do you have dates/times for when the problem occurred?
- Are you using any VPN connection when this problem occurs?
- Have you noticed a pattern as to when the problem occurs?
- Does the problem occur if X software is disabled/uninstalled?
Some of those questions you may easily be able to answer yourself through your own management tools. Additionally, some of these questions the user may have already answered when they reached out for assistance.
The questions above give you a bit of context as to how the problem manifests. However, it may not be sufficient to reach a resolution. You need to look at logs.
The Logs
It’s important to gather logs so you can at least start looking at what could be causing the problem. Ideally, the user provides timestamps to make looking at the logs a little easier. However, what are these “logs” that we’re supposed to be looking at?
You will first want to look at logs generated by the OS itself. macOS has a unified log system. I won’t go into the history, but will instead reference a few good resources that have helped me understand things a bit better as it pertains to the unified log system:
If you look at the man page for the log command, it can be quite overwhelming at first glance. It does provide a few examples, which I think are helpful for showing how you can combine different expressions under the --predicate filter option. But the man page alone isn’t sufficient, which is why the above blog posts are really handy.
Once you have an idea of how to use the log command, you then need to know what to look for. For this part of the process, I had some assistance from the third party Cisco Umbrella Diagnostic Tool. It captures a decent amount of information. One of the commands it runs:
/bin/sh -c '/usr/bin/tail -n 15000 /var/log/system.log | /usr/bin/egrep -i "kernel|launchd|vpn|dns|configd|racoon|umbrella"'
This command goes through the last 15000 lines of system.log and only shows lines that match any of the terms separated by “|” (case-insensitively, because of -i).
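The same filter can be wrapped in a small shell function so it works against any text log and any number of trailing lines. This is a minimal sketch: the function name filter_recent_lines is hypothetical, and grep -E is used as the modern equivalent of egrep.

```shell
#!/bin/bash
# Hypothetical helper: filter_recent_lines FILE [LINES]
# Reads the last LINES lines (default 15000) of FILE and keeps only the lines
# that mention any of the terms. In an extended regex, "|" means "or";
# -i makes the match case-insensitive.
filter_recent_lines() {
  local file="$1"
  local lines="${2:-15000}"
  tail -n "$lines" "$file" | grep -Ei 'kernel|launchd|vpn|dns|configd|racoon|umbrella'
}
```

For example, filter_recent_lines /var/log/system.log 15000 reproduces the diagnostic tool’s output.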
One of the first things I noticed is that not as much is written to /var/log/system.log as one would expect. Obviously, this is problematic if we’re looking for information. However, the commands do provide a little clarity as to which processes or terms we want to look for. Combined with the knowledge that you really need to look at the unified log, I went to work on writing a lengthy one-line command that I could run to capture relevant information the system has logged.
The Log Commands
Before I get to the one-line command, I’d like to walk you through the process of how I got there.
First off, it’s important to understand that I’m asking users to run this, and it’s much easier to write the log to a common location, which I’ve simply chosen to be /Users/Shared/system-dns-logstream.log. You can obviously write the log to any other location.
Secondly, the log command can be very verbose and, depending on the predicate filter, the resulting log file might be too large. For that reason, you’ll notice I use --last 8h, which means I only want the last 8 hours of logged information to be shown. You can change it to, for example, 48h (2 days) or 10m (10 minutes), depending on how far back you need to look.
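Since users will be typing the duration, a wrapper that checks the --last value before calling log show can save a round trip. This is a sketch under assumptions: the helpers is_valid_last and collect_logs are hypothetical names, and the check only accepts the number-plus-m/h/d form used in this post.

```shell
#!/bin/bash
# Accept values like 10m, 8h or 2d (the forms used in this post).
is_valid_last() {
  [[ "$1" =~ ^[0-9]+[mhd]$ ]]
}

# Hypothetical helper: collect_logs PREDICATE [DURATION] [OUTFILE]
collect_logs() {
  local predicate="$1"
  local last="${2:-8h}"
  local out="${3:-/Users/Shared/system-dns-logstream.log}"
  if ! is_valid_last "$last"; then
    echo "invalid --last value: $last (use e.g. 10m, 8h or 2d)" >&2
    return 1
  fi
  # Same shape as the commands below: query the unified log, write to a file.
  sudo log show --last "$last" --predicate "$predicate" > "$out"
}
```

For example: collect_logs 'subsystem == "com.apple.networkextension"' 48h.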
To look at anything logged by the Network Extension framework subsystem, run this command. “racoon” is traditionally the VPN process in macOS that you’d want to look for in logs. However, everything it logs will be written to the Network Extension subsystem.
sudo log show --last 8h --predicate 'subsystem == "com.apple.networkextension"' > /Users/Shared/system-dns-logstream.log
To look at anything logged by the System Configuration framework subsystem, run this command:
sudo log show --last 8h --predicate 'subsystem == "com.apple.SystemConfiguration"' > /Users/Shared/system-dns-logstream.log
symptomsd framework subsystem
symptomsd is a Symptom framework service daemon. I don’t have much official documentation I can link to here, except to note that it was mentioned in one of the useful links I provided earlier. What I noticed is that everything networking related fell under the category “netepochs”, which provides more targeted information.
sudo log show --last 8h --predicate 'subsystem == "com.apple.symptomsd" AND category == "netepochs"' > /Users/Shared/system-dns-logstream.log
DNS
To look at any event messages that contain the phrase “DNS” (the [cd] modifier makes the match case- and diacritic-insensitive), run the command:
sudo log show --last 8h --predicate 'eventMessage CONTAINS[cd] "dns"' > /Users/Shared/system-dns-logstream.log
VPN
To look at any event messages that contain the phrase “VPN”, run the command:
sudo log show --last 8h --predicate 'eventMessage CONTAINS[cd] "vpn"' > /Users/Shared/system-dns-logstream.log
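The DNS and VPN filters follow the same pattern, so they can be generated instead of typed. A minimal sketch, assuming a hypothetical helper named contains_any that ORs together one CONTAINS[cd] clause per search term:

```shell
#!/bin/bash
# Hypothetical helper: contains_any TERM...
# Prints: (eventMessage CONTAINS[cd] "t1") || (eventMessage CONTAINS[cd] "t2") ...
contains_any() {
  local term clause out=""
  for term in "$@"; do
    clause=$(printf '(eventMessage CONTAINS[cd] "%s")' "$term")
    if [[ -z "$out" ]]; then
      out="$clause"
    else
      out="$out || $clause"
    fi
  done
  printf '%s' "$out"
}
```

Usage: sudo log show --last 8h --predicate "$(contains_any dns vpn)" > /Users/Shared/system-dns-logstream.log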
To look at anything logged by the Network framework under the “connection” category, run the command:
sudo log show --last 8h --predicate 'subsystem == "com.apple.network" AND category == "connection"' > /Users/Shared/system-dns-logstream.log
configd
configd is the System Configuration daemon that, according to its man page, “is responsible for many of the configuration aspects of the local system”. Although not everything it handles is networking related, a good chunk is. You may also note from one of the earlier links that one of the commands specifically looks at the subsystem “com.apple.IPConfiguration”. This command will cover events logged under that subsystem, since IPConfiguration runs as a plug-in inside configd. To look at any event messages that contain the phrase “configd”, or that were logged by the configd process, run the command:
sudo log show --last 8h --predicate 'eventMessage CONTAINS[cd] "configd" OR process == "configd"' > /Users/Shared/system-dns-logstream.log
CoreUtils
I do not have anything to reference here other than seeing it referenced in one of the links I mentioned above. Rather than filter for specific categories, I’m looking at the entire CoreUtils subsystem. Run the command:
sudo log show --last 8h --predicate 'subsystem == "com.apple.CoreUtils"' > /Users/Shared/system-dns-logstream.log
Umbrella (or your own third party tool)
Cisco Umbrella technically logs under a process called “dns-updater”, but Umbrella may also be mentioned in messages logged by other processes. To look at any event messages that contain the phrase “Umbrella”, run the following command:
sudo log show --last 8h --predicate 'eventMessage CONTAINS[cd] "umbrella"' > /Users/Shared/system-dns-logstream.log
Note: if you’re using another networking tool or service, you can alternatively include the name of that tool instead of “Umbrella”.
dns-updater (or your own third party tool)
dns-updater is the process under which Cisco Umbrella runs and does its logging. To look at any event messages generated by the dns-updater process used by Cisco Umbrella, run the following command:
sudo log show --last 8h --predicate 'process == "dns-updater"' > /Users/Shared/system-dns-logstream.log
Note: if you’re using another networking tool or service, you can replace “dns-updater” with whatever the process name is for that tool.
It may be helpful to look for other potential events related to launchd, which is why this command is included. However, this can be quite noisy, which is why there are additional expressions in the predicate filter to exclude certain event messages. To look at any event messages that contain the phrase “launchd”, run the command:
sudo log show --last 8h --predicate 'eventMessage CONTAINS[cd] "launchd" AND NOT eventMessage CONTAINS[cd] "invoked (by pid 1/launchd)" AND NOT eventMessage CONTAINS[cd] "OSLaunchdJob"' > /Users/Shared/system-dns-logstream.log
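Noise filters like these tend to grow over time. As a sketch (the function name contains_excluding is hypothetical), the include-then-exclude pattern can be generated from a term plus a list of phrases to drop, which produces exactly the launchd predicate used here:

```shell
#!/bin/bash
# Hypothetical helper: contains_excluding TERM [EXCLUDED_PHRASE...]
# Builds: eventMessage CONTAINS[cd] "TERM" AND NOT eventMessage CONTAINS[cd] "X" ...
contains_excluding() {
  local term="$1"
  shift
  local out excl
  out=$(printf 'eventMessage CONTAINS[cd] "%s"' "$term")
  for excl in "$@"; do
    out+=$(printf ' AND NOT eventMessage CONTAINS[cd] "%s"' "$excl")
  done
  printf '%s' "$out"
}
```

Adding another noisy phrase then only means adding one more argument, for example contains_excluding launchd 'invoked (by pid 1/launchd)' OSLaunchdJob.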
Log: Putting It All Together
Obviously, we’re not going to ask someone to run each of those commands individually. Plus, it would be really difficult to line up the times from each command. The log command makes it easy to combine all these expressions, even if the result doesn’t necessarily look pretty.
When putting it all together, I added an additional filter to ignore any references to the log command itself: AND NOT eventMessage MATCHES ".*(/usr/bin/log).*" (MATCHES has to match the entire message, hence the leading and trailing .*).
In the end, the command looks like:
sudo log show --last 8h --predicate '((subsystem == "com.apple.networkextension") || (subsystem == "com.apple.SystemConfiguration") || (subsystem == "com.apple.symptomsd" AND category == "netepochs") || (eventMessage CONTAINS[cd] "dns") || (eventMessage CONTAINS[cd] "vpn") || (subsystem == "com.apple.network" AND category == "connection") || (eventMessage CONTAINS[cd] "configd" OR process == "configd") || (subsystem == "com.apple.CoreUtils") || (eventMessage CONTAINS[cd] "umbrella") || (process == "dns-updater") || (eventMessage CONTAINS[cd] "launchd" AND NOT eventMessage CONTAINS[cd] "invoked (by pid 1/launchd)" AND NOT eventMessage CONTAINS[cd] "OSLaunchdJob")) && (NOT eventMessage MATCHES ".*(/usr/bin/log).*")' > /Users/Shared/system-dns-logstream.log
It’s not pretty, but it is readable. An alternative would have been to make more use of the MATCHES operator with regular expressions. Be mindful that the resulting file may easily be bigger than most users are able to send over email.
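One way to keep the combined filter maintainable is to store each clause on its own line and let the shell assemble the one-liner. This is a sketch, not the post’s method: the clauses array and the build_predicate function are hypothetical names, and the clause list simply mirrors the filters described earlier in the post.

```shell
#!/bin/bash
# One clause per line; add or remove filters here.
clauses=(
  'subsystem == "com.apple.networkextension"'
  'subsystem == "com.apple.SystemConfiguration"'
  'subsystem == "com.apple.symptomsd" AND category == "netepochs"'
  'eventMessage CONTAINS[cd] "dns"'
  'eventMessage CONTAINS[cd] "vpn"'
  'subsystem == "com.apple.network" AND category == "connection"'
  'eventMessage CONTAINS[cd] "configd" OR process == "configd"'
  'subsystem == "com.apple.CoreUtils"'
  'eventMessage CONTAINS[cd] "umbrella"'
  'process == "dns-updater"'
  'eventMessage CONTAINS[cd] "launchd" AND NOT eventMessage CONTAINS[cd] "invoked (by pid 1/launchd)" AND NOT eventMessage CONTAINS[cd] "OSLaunchdJob"'
)

# Wrap each clause in parentheses, join with ||, then exclude the log
# command's own activity. MATCHES must match the whole message, hence .* on both sides.
build_predicate() {
  local joined="" c
  for c in "$@"; do
    if [[ -z "$joined" ]]; then joined="($c)"; else joined="$joined || ($c)"; fi
  done
  printf '(%s) && (NOT eventMessage MATCHES ".*(/usr/bin/log).*")' "$joined"
}

predicate=$(build_predicate "${clauses[@]}")
echo "$predicate"
# On macOS, run:
# sudo log show --last 8h --predicate "$predicate" > /Users/Shared/system-dns-logstream.log
```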
Once you get the log file, make use of either the Console app or your log reader/text editor of choice to navigate its content. It will look overwhelming initially, but you’re essentially looking for references to where network changes may have occurred.
Because the log system can go back so far, you can ask the user to run this command at the very end of any troubleshooting you might be asking them to do.
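Because text logs usually compress well, asking the user to compress the file before sending it can help with the email size limit mentioned earlier. A minimal sketch with a hypothetical compress_log helper that uses gzip and leaves the original file in place:

```shell
#!/bin/bash
# Hypothetical helper: compress_log [FILE]
compress_log() {
  local file="${1:-/Users/Shared/system-dns-logstream.log}"
  # -9: best compression; -c: write to stdout so the original is kept.
  gzip -9 -c "$file" > "$file.gz" || return 1
  # Show both sizes so the user can see whether it is small enough to email.
  ls -lh "$file" "$file.gz"
}
```

The user can then attach /Users/Shared/system-dns-logstream.log.gz instead of the raw log.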
Other Logs
I’d be remiss if I didn’t mention that if you’re dealing with third party tools, there may be specific logs they write to. Places to look may include:
- /var/log/
- /var/logs/
- /Library/Logs/
- ~/Library/Logs
I would also consult the vendor and/or their documentation for where their application(s) may keep logs on the system.
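When you don’t know which third party log is relevant, listing recently modified files in those folders is a quick way to narrow it down. This is a sketch: recent_logs is a hypothetical helper, it checks the locations listed above by default, and it silently skips any folder that doesn’t exist on the Mac.

```shell
#!/bin/bash
# Hypothetical helper: recent_logs [DAYS] [DIR...]
# Lists files modified within the last DAYS days (default 1).
recent_logs() {
  local days="${1:-1}"
  [[ $# -gt 0 ]] && shift
  local -a dirs=("$@")
  if [[ ${#dirs[@]} -eq 0 ]]; then
    dirs=(/var/log /var/logs /Library/Logs "$HOME/Library/Logs")
  fi
  local d
  for d in "${dirs[@]}"; do
    [[ -d "$d" ]] || continue   # skip locations that don't exist
    find "$d" -type f -mtime "-$days" 2>/dev/null
  done
  return 0
}
```

For example, recent_logs 1 lists what changed in the last day; run it with sudo to see files the user account can’t read.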
Network Troubleshooting
In addition to looking at logs, there are some basic network troubleshooting steps you can run through that may be helpful as well. Replace fdqn.server.domain with the fully qualified domain name of the host you’re trying to reach.
ping -c 5 fdqn.server.domain
Use ping to send ICMP ECHO_REQUEST packets to the server that cannot be reached. This will send 5 packets. Keep in mind that some servers do not respond to these kinds of packets. Assuming you get a response, pay attention to how long it takes. Anything over 100 ms may indicate high latency, which can impact communication between the system and the host/service you’re trying to reach.
ping -c 5 <IP Address>
Same as before, except we’re using the IP address to send the request. Comparing the two results can start to fill in the picture as to whether a DNS issue is at play.
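To apply the 100 ms rule of thumb without reading the output by eye, the average can be pulled from ping’s summary line. This sketch assumes the summary has the form “round-trip min/avg/max/stddev = a/b/c/d ms” (macOS) or “rtt min/avg/max/mdev = …” (Linux); ping_avg_ms and check_latency are hypothetical names.

```shell
#!/bin/bash
# Reads ping output on stdin and prints the average round-trip time in ms.
ping_avg_ms() {
  awk -F'= ' '/min\/avg\/max/ { split($2, t, "/"); print t[2] }'
}

# Hypothetical helper: check_latency HOST [THRESHOLD_MS]
check_latency() {
  local host="$1" threshold="${2:-100}" avg
  avg=$(ping -c 5 "$host" | ping_avg_ms)
  if [[ -z "$avg" ]]; then
    echo "$host: no replies (the host may simply ignore ICMP)"
    return 1
  fi
  # awk does the floating point comparison; exit status 0 means "above".
  if awk -v a="$avg" -v t="$threshold" 'BEGIN { exit !(a + 0 > t + 0) }'; then
    echo "$host: average ${avg} ms (above ${threshold} ms)"
  else
    echo "$host: average ${avg} ms"
  fi
}
```

Running check_latency against both the host name and the IP address gives two comparable numbers.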
dig fdqn.server.domain
Use the DNS lookup utility to do a lookup using the default DNS resolver.
dig fdqn.server.domain @1.1.1.1
Use the DNS lookup utility to do a lookup using Cloudflare’s DNS resolver. Other alternatives include 8.8.8.8 (Google), 9.9.9.9 (Quad9) and 208.67.222.222 (OpenDNS).
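Rather than running dig once per resolver, a loop can print the answers side by side, which makes a mismatch between the default resolver and the public ones easy to spot. A sketch under assumptions: compare_resolvers is a hypothetical name, and +time=2 +tries=1 are used so an unreachable resolver doesn’t stall the loop.

```shell
#!/bin/bash
# Hypothetical helper: compare_resolvers NAME [RESOLVER...]
compare_resolvers() {
  local name="$1"
  shift
  local -a resolvers=("$@")
  local r answer
  if [[ ${#resolvers[@]} -eq 0 ]]; then
    resolvers=(1.1.1.1 8.8.8.8 9.9.9.9 208.67.222.222)
  fi
  # System default resolver first (no @server argument).
  answer=$(dig +short +time=2 +tries=1 "$name" | sort | tr '\n' ' ')
  echo "default: ${answer:-<no answer>}"
  for r in "${resolvers[@]}"; do
    answer=$(dig +short +time=2 +tries=1 "$name" "@$r" | sort | tr '\n' ' ')
    echo "$r: ${answer:-<no answer>}"
  done
}
```

For example: compare_resolvers fdqn.server.domain. An internal host name that only resolves through the default (VPN-provided) resolver is expected; a public name that differs between resolvers is worth a closer look.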
/usr/sbin/traceroute -I -w 2 fdqn.server.domain
Use the traceroute tool to determine the route packets are taking to reach a particular host based on the host name.
/usr/sbin/traceroute -I -w 2 <IP Address>
Use the traceroute tool to determine the route packets are taking to reach a particular host based on the IP address.
ifconfig
The network interface configuration command will show you all network interfaces that are currently configured on the system.
scutil --dns
This reports the DNS configuration on the system. It is useful for seeing which domains may be configured to use specific name servers.
netstat -nr
Show the network routing table configured on the system, using network addresses in numeric format. With a VPN connection, you will see additional routes. On macOS Big Sur, you’ll note that the first column, representing destination network ranges, is in CIDR format. The second column is the gateway, i.e. the next hop that traffic for that destination is handed to. Depending on your networking experience, this may or may not make much sense, but it’s important to understand nonetheless. If the address you’re trying to reach falls under one of the listed entries other than default, the traffic will be routed through that entry’s gateway rather than the default route. When several entries match, the most specific one (the longest prefix) wins.
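To check by hand whether an address falls inside one of those CIDR ranges, you compare the address and the network after masking off the host bits. A minimal sketch with hypothetical helpers ip_to_int and ip_in_cidr (IPv4 only); it assumes that an abbreviated network such as 10.8/16 means 10.8.0.0/16, with missing trailing octets treated as 0.

```shell
#!/bin/bash
# Convert a dotted IPv4 address to an integer; missing octets count as 0.
ip_to_int() {
  local IFS=.
  local -a o=($1)
  echo "$(( (10#${o[0]:-0} << 24) | (10#${o[1]:-0} << 16) | (10#${o[2]:-0} << 8) | 10#${o[3]:-0} ))"
}

# Hypothetical helper: ip_in_cidr IP NETWORK/BITS (exit status 0 if inside)
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  # Keep the top BITS bits; a /0 network (default route) matches everything.
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  local ip_n net_n
  ip_n=$(ip_to_int "$ip")
  net_n=$(ip_to_int "$net")
  (( (ip_n & mask) == (net_n & mask) ))
}
```

For example, ip_in_cidr 10.8.1.20 10.8/16 succeeds, so traffic to 10.8.1.20 would follow that route rather than the default one.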
route -n get fdqn.server.domain
This displays the route used to reach a specific host, showing network addresses in numeric format.
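The two lines that matter most in that output are usually the gateway and the interface; a VPN route often points at a utun interface. A sketch, assuming the “gateway:” and “interface:” lines that macOS route -n get prints; route_summary is a hypothetical name, and a route with no gateway line is reported as “none”.

```shell
#!/bin/bash
# Reads `route -n get <host>` output on stdin and prints gateway and interface.
route_summary() {
  awk -F': *' '
    /^ *gateway:/   { gw = $2 }
    /^ *interface:/ { ifc = $2 }
    END { printf "gateway=%s interface=%s\n", (gw == "" ? "none" : gw), ifc }
  '
}
```

Usage: route -n get fdqn.server.domain | route_summary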
Other commands that assist with name lookups include nslookup and host. Apple includes a particularly useful and important note at the bottom of the nslookup man page that applies to all three tools (dig, nslookup and host):
The nslookup command does not use the host name and address resolution or the DNS query routing mechanisms used by other processes running on macOS. The results of name or address queries printed by nslookup may differ from those found by other processes that use the macOS native name and address resolution mechanisms. The results of DNS queries may also differ from queries that use the macOS DNS routing library.
man nslookup (macOS Big Sur)
That does not invalidate the usefulness of these commands. But it is important to keep in mind that applications/processes using the built-in OS resolution and routing mechanisms may get different results from what you see in these command line tools.
Also, keep in mind you may want to run through all these commands under different scenarios. For example, you may want to run these commands with a security tool turned off and turned on. Or maybe when on a specific network vs another network. Or perhaps when connected to VPN vs disconnected from VPN.
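Comparing scenarios is easier when each run lands in its own labelled file. A sketch under assumptions: run_scenario is a hypothetical helper, the label is whatever you choose (for example vpn-on or vpn-off), and it only runs the ping, dig and route commands from this section.

```shell
#!/bin/bash
# Hypothetical helper: run_scenario LABEL HOST [OUTDIR]
run_scenario() {
  local label="$1" host="$2" dir="${3:-/Users/Shared}"
  local out="$dir/netcheck-$label-$(date +%Y%m%d-%H%M%S).txt"
  {
    echo "### scenario: $label ($(date))"
    echo "### ping -c 5 $host";    ping -c 5 "$host"
    echo "### dig $host";          dig "$host"
    echo "### route -n get $host"; route -n get "$host"
  } > "$out" 2>&1
  # Print the file name so it can be collected afterwards.
  echo "$out"
}
```

For example, run run_scenario vpn-off fdqn.server.domain, connect to the VPN, then run run_scenario vpn-on fdqn.server.domain and diff the two files.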
Packet Capture
Packet captures can be really useful for determining what might be going on. I won’t go into how to analyze packet captures, but I found this particular YouTube video useful as a starter. Your network engineer may also be a good resource. And of course, you can search the internet for how to use tcpdump.
To create a capture of all network interfaces on the system:
- Run the command in Terminal:
sudo tcpdump -i any -w /Users/Shared/packet.pcap
- Keep the Terminal window open
- Reproduce the issue as soon as possible.
- Go back to the Terminal window and stop the packet capture by pressing CTRL + C
It’s important to note that packet captures can produce huge files if left running for more than a minute. In fact, one minute may be too long. To avoid large packet captures, you will most likely need to be able to reproduce the problem very quickly. Once you do have the packet capture, you will want to use an application like Wireshark to visually analyze it.
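If the problem can’t be reproduced quickly, putting a hard cap on the capture is a safer alternative to relying on CTRL + C. This sketch uses tcpdump’s -c option (stop after N packets) and an optional capture filter such as "port 53" to keep only DNS traffic; bounded_capture and its 20000-packet default are hypothetical choices, not values from the post.

```shell
#!/bin/bash
# Hypothetical helper: bounded_capture [OUTFILE] [MAX_PACKETS] [FILTER]
bounded_capture() {
  local out="${1:-/Users/Shared/packet.pcap}"
  local max_packets="${2:-20000}"
  local filter="${3:-}"
  if [[ -n "$filter" ]]; then
    # tcpdump exits on its own after max_packets matching packets.
    sudo tcpdump -i any -c "$max_packets" -w "$out" "$filter"
  else
    sudo tcpdump -i any -c "$max_packets" -w "$out"
  fi
}
```

For example, bounded_capture /Users/Shared/packet.pcap 5000 "port 53" captures at most 5000 DNS packets; CTRL + C still stops it earlier.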
It may be useful to have a packet capture from a device (either the same device or another) where the problem cannot be reproduced (things work as expected), to compare against the packet capture from the device where the problem can be reproduced (things do not work as expected).
Other considerations
Wireless connections tend to be less reliable than wired connections. A particularly useful hidden feature in macOS is the ability to get wireless statistics by simply holding OPTION and clicking the Wi-Fi menu. You can then ask the user to take a screenshot with CMD + SHIFT + 4, then Spacebar.
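The RSSI and Noise values are shown in dBm, as negative numbers. The further the RSSI gets from 0, the weaker the signal, and the closer it gets to the Noise value, the noisier the connection. This can impact the reliability of the connection and could result in sporadic networking issues that are hard to reproduce.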
The same menu also lets you open Wireless Diagnostics, which may be worth running through as well.
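A quick way to turn the two numbers from that menu into one figure is the signal-to-noise margin: RSSI minus Noise, both in dBm. For example, an RSSI of -60 dBm with a Noise of -90 dBm leaves a 30 dB margin, while -80 dBm against the same noise leaves only 10 dB. A minimal sketch with a hypothetical snr_db helper:

```shell
#!/bin/bash
# Hypothetical helper: snr_db RSSI NOISE (both in dBm, as shown in the Wi-Fi menu)
# Prints the signal-to-noise margin in dB; larger is better.
snr_db() {
  local rssi="$1" noise="$2"
  echo $(( rssi - noise ))
}
```

Usage: snr_db -60 -90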
That’s all I have for the time being. If there are any particular network troubleshooting tips you find useful in macOS, feel free to share in the comments.