Monday, May 23, 2011

Troubleshooting DNS in regards to AD

While not directly related to your case, I wanted to provide some good information on general DNS troubleshooting, especially with regard to Active Directory. These systematic steps can save time (and possibly a new case) when you are trying to isolate which DNS issue is affecting Active Directory. Roughly 75% of our cases here in DS support have DNS as a root cause, so understanding the pitfalls can be very useful.

Troubleshooting Active Directory—Related DNS Problems - http://www.microsoft.com/technet/prodtechnol/windows2000serv/technologies/activedirectory/maintain/opsguide/part1/adogd10.mspx

This article covers understanding the symptoms of DNS issues in Active Directory and how to track down their root causes step-by-step. It explains the DNS SRV record requirements, includes useful tools with syntax, and goes into likely event log messages that will point you in the right direction for repairing DNS.
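
As a rough illustration of the SRV record requirements mentioned above, the sketch below builds a list of commonly registered AD SRV names and the matching nslookup commands. The function names, domain names, and server IP are hypothetical, and the list is only a subset; the article remains the reference for the full set of records.

```python
from typing import List


def expected_srv_records(domain: str, forest_root: str) -> List[str]:
    """Return a subset of the SRV names registered by DCs in a forest.

    The pdc record is registered only by the PDC emulator, and the gc
    records only by global catalog servers.
    """
    return [
        f"_ldap._tcp.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.pdc._msdcs.{domain}",
        f"_ldap._tcp.gc._msdcs.{forest_root}",
        f"_gc._tcp.{forest_root}",
    ]


def nslookup_commands(domain: str, forest_root: str, dns_server: str) -> List[str]:
    """Build one nslookup command per record, aimed at a specific DNS server."""
    return [
        f"nslookup -type=SRV {name} {dns_server}"
        for name in expected_srv_records(domain, forest_root)
    ]


# Hypothetical names and address, for illustration only.
for command in nslookup_commands("corp.example.com", "example.com", "10.0.0.10"):
    print(command)
```

Running the printed commands against each DNS server that should hold the zone makes a missing record easy to spot.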

Troubleshooting Domain Name System Problems - http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/bca4b6fe-d6a6-45cc-a433-86948b178f2f.mspx

A more general guide to troubleshooting DNS issues, but still very applicable to AD. Sometimes the simplest answers are the best ones, naturally.

DNS Server becomes an island when a domain controller points to itself for the _msdcs.ForestDnsName domain - http://support.microsoft.com/default.aspx?scid=kb;en-us;275278

A very common issue explained; how DNS islanding occurs and how to prevent it from crippling domain controllers.
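
One rough way to spot candidates for islanding is to look for domain controllers whose DNS client settings list no DNS server other than themselves. The sketch below assumes you have already collected each DC's DNS server list by some other means; the function name and data layout are hypothetical, and the KB article should drive the actual fix.

```python
from typing import Dict, List

LOOPBACK = {"127.0.0.1", "::1"}


def islanding_candidates(dc_dns_settings: Dict[str, List[str]]) -> List[str]:
    """Return DC IPs whose configured DNS servers are only themselves.

    dc_dns_settings maps a DC's IP address to the ordered list of DNS
    server IPs configured on that DC (an assumed input format).
    """
    flagged = []
    for dc_ip, servers in dc_dns_settings.items():
        # Anything that is neither the DC's own address nor loopback
        # counts as another DNS server.
        others = [s for s in servers if s != dc_ip and s not in LOOPBACK]
        if servers and not others:
            flagged.append(dc_ip)
    return sorted(flagged)
```

A DC flagged here is not necessarily broken; it is simply worth checking against the guidance in the article.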

DNSLint utility - http://support.microsoft.com/default.aspx?scid=kb;en-us;321045

The simplest use of DNSLint is verifying DNS records for domain controllers. To do this, run:

DNSLINT /AD <IP of the DC you want to check> /S <IP of a DNS server authoritative for the _msdcs subzone>

This generates an HTML report of the AD-related records on that DNS server and confirms that CNAME records match up with A records. The easiest way to interpret the results is to scroll to the very bottom of the page and look at the summary report, where any warnings or errors are noted. All issues in the body of the report are color-coded for easy viewing as well. DNSLint also has a /QL option for checking SRV records and the like.

The tool also has some useful extra features, like /C (used to test ports on email servers), /T (used to create TXT output instead of HTM), and /NO_OPEN, which prevents the HTM file from being loaded. Using these switches together means that you can very simply script a batch file to spot check the overall health of your DNS with regard to AD. Very useful.
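
The same spot check can be sketched in Python instead of a batch file. This is a minimal sketch, assuming dnslint.exe is on the PATH; the /R (report name) and /Y (overwrite without prompting) switches come from the tool's documented syntax rather than from the text above, and the function names and report naming scheme are hypothetical.

```python
import subprocess
from typing import List


def dnslint_ad_command(dc_ip: str, dns_ip: str, report_name: str) -> List[str]:
    """Build a DNSLint /ad command line for one DC.

    /t requests text output, /no_open stops the report from opening,
    /r names the report, and /y overwrites an existing report.
    """
    return ["dnslint", "/ad", dc_ip, "/s", dns_ip,
            "/r", report_name, "/t", "/no_open", "/y"]


def spot_check(dc_ips: List[str], dns_ip: str) -> List[str]:
    """Run DNSLint once per DC and return the report names used."""
    reports = []
    for dc_ip in dc_ips:
        report = "dnslint_" + dc_ip.replace(".", "_")
        subprocess.run(dnslint_ad_command(dc_ip, dns_ip, report))
        reports.append(report)
    return reports
```

Calling spot_check(["10.0.0.5", "10.0.0.6"], "10.0.0.10") on a machine with DNSLint installed would leave one report per DC to review; the addresses are placeholders.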

Some good further info on the tool:

How To Use DNSLint to Troubleshoot Active Directory Replication Issues - http://support.microsoft.com/default.aspx?scid=kb;en-us;321046

Finally, a brief explanation of using DNSLint to determine why replication is failing between multiple DCs.

AD Replication Troubleshooting



While not precisely related to your case, I wanted to provide some further information on general troubleshooting of AD Replication issues in domains and forests. The steps and tools below can be used to detect and repair the most common replication issues, and may save you a support call someday. I hope you find these useful!

Troubleshooting Active Directory Replication Problems - http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/4f504103-1a16-41e1-853a-c68b77bf3f7e.mspx

A good general guide to deciding how to approach replication failures and break the problem down into its component pieces. It also covers the REPADMIN tool, which is instrumental in surfacing the error codes that drive the rest of the troubleshooting.
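
To show how REPADMIN output can drive troubleshooting, here is a minimal sketch that picks the failing replication links out of the CSV produced by "repadmin /showrepl * /csv". The column names used below are an assumption about that CSV's header row, and the function name is hypothetical; check them against the output on your own system.

```python
import csv
import io
from typing import List, Tuple


def failing_links(showrepl_csv: str) -> List[Tuple[str, str, str, int, str]]:
    """Return (source, destination, naming context, failures, status)
    for every row that reports at least one failure."""
    failures = []
    for row in csv.DictReader(io.StringIO(showrepl_csv)):
        # Assumed column names; rows without a numeric count are skipped.
        count = (row.get("Number of Failures") or "").strip()
        if count.isdigit() and int(count) > 0:
            failures.append((
                row["Source DSA"],
                row["Destination DSA"],
                row["Naming Context"],
                int(count),
                row["Last Failure Status"],
            ))
    return failures
```

Feeding it the saved CSV text gives a short list of source and destination pairs, along with the status code to look up in the guide.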

Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088) - http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/43e6f617-fb49-4bb4-8561-53310219f997.mspx

The most common cause of replication failure is DNS lookup issues, where a DC cannot resolve the CNAME record of a replication partner and so the replication ring cannot be completed. This guides an admin through systematically tracking down replication issues caused by DNS, and explains which errors indicate which specific conditions. Windows Server 2003 SP1 mitigates this to some extent, as the article above covers in detail.
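
The CNAME record in question has the form <DSA GUID>._msdcs.<forest root domain>. A minimal sketch of building that name and attempting to resolve it follows; the function names and the example GUID and forest name are hypothetical, and the lookup uses whatever resolver the local machine is configured with.

```python
import socket
from typing import Optional


def dsa_cname(dsa_guid: str, forest_root: str) -> str:
    """Build the GUID-based CNAME a DC registers under _msdcs."""
    guid = dsa_guid.strip("{}").lower()
    return f"{guid}._msdcs.{forest_root}"


def resolve(name: str) -> Optional[str]:
    """Return the IPv4 address for name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


# Hypothetical GUID and forest name, for illustration only.
name = dsa_cname("{1A2B3C4D-0000-1111-2222-333344445555}", "example.com")
print(name)
```

If resolve(name) returns None on the destination DC, the partner's CNAME or the A record behind it is the first thing to check.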

Fixing Replication Connectivity Problems (Event ID 1925) - http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/7fcaa311-bc19-479d-9a4e-179704dfe08f.mspx

A step-by-step guide to diagnosing replication issues caused by network problems (not related to DNS). This covers simple initial tests like PING and PING -L, then moves on to more advanced steps like network tracing.
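
The PING and PING -L tests can be sketched as a small command builder. This assumes the Windows ping syntax; the buffer sizes are arbitrary illustrative values, and the optional -f (do not fragment) switch is an addition not mentioned above.

```python
from typing import List, Sequence


def ping_commands(host: str,
                  sizes: Sequence[int] = (32, 512, 1024, 1472),
                  dont_fragment: bool = False) -> List[List[str]]:
    """Build a plain ping plus one ping -l per buffer size."""
    commands = [["ping", host]]
    for size in sizes:
        cmd = ["ping"]
        if dont_fragment:
            cmd.append("-f")  # Windows ping: set the Don't Fragment flag
        cmd += ["-l", str(size), host]  # Windows ping: send buffer size
        commands.append(cmd)
    return commands
```

If small packets succeed but larger ones fail, that points toward a packet size problem on the path rather than a host that is simply down.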

Fixing Replication Topology Problems (Event ID 1311) - http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/062e8eaa-27e0-4c5e-bc2b-2913ecce24b8.mspx

This article covers replication failures caused by problems in the overall site topology, where there is insufficient physical connectivity to complete the replication ring. In other words, replication would work fine if the DCs' sites and connections were configured more optimally, and there are no other underlying connectivity issues with DNS or the network itself.
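
The idea of insufficient connectivity can be illustrated with a toy model: treat sites as nodes and site links as edges, then look for sites that cannot be reached from a starting site. This is only a sketch with hypothetical names; it ignores schedules, site link bridging, and which naming contexts each site hosts, all of which the real topology calculation takes into account.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def unreachable_sites(sites: Iterable[str],
                      site_links: Iterable[Iterable[str]],
                      start: str) -> List[str]:
    """Return the sites that no chain of site links connects to start."""
    adjacency: Dict[str, Set[str]] = {site: set() for site in sites}
    for link in site_links:
        members = list(link)
        # Every site in a link can reach every other site in that link.
        for a in members:
            for b in members:
                if a != b:
                    adjacency.setdefault(a, set()).add(b)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(set(adjacency) - seen)
```

Any site this returns has no path back to the starting site, which is the kind of gap the article walks you through closing.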

How the Active Directory Replication Model Works - http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/TechRef/1465d773-b763-45ec-b971-c23cdc27400e.mspx

Finally, I wanted to provide more detailed information on how replication actually works. With an understanding of the system and how it operates, it becomes much easier to see where issues are and how to approach troubleshooting them. This guide goes into great detail on USNs, consistency, change notification, scheduling, linked values, and all the other pieces that make up this complex system.
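
As a very simplified picture of how USNs are used, a destination DC remembers the highest USN it has already received from a given source and asks only for changes above that value. The sketch below models just that filter, with hypothetical names and data; the real model in the guide also uses an up-to-dateness vector and per-attribute metadata, which are left out here.

```python
from typing import List, Tuple

Change = Tuple[int, str]  # (USN on the source DC, object name)


def pending_changes(source_changes: List[Change],
                    high_watermark: int) -> Tuple[List[Change], int]:
    """Return the changes above the watermark and the new watermark."""
    newer = sorted(c for c in source_changes if c[0] > high_watermark)
    new_watermark = newer[-1][0] if newer else high_watermark
    return newer, new_watermark
```

After a successful cycle the destination stores the returned watermark, so the next request starts from there instead of from the beginning.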
