We had a customer recently ask us to help with a report that would show information containing both SQL Server and Active Directory data. There are many good articles about querying AD data from SQL Server and I’ve referenced several below. The problem with most of these approaches is that they either don’t return a full result set or they require a change to the AD MaxPageSize value, which is kind of a no-no in AD Land. Read on past the article links for a better approach to returning full result sets from AD-SQL Server joins without selling the farm with the MaxPageSize value.
- Creating a Heterogeneous Join between SQL Server and Active Directory: http://msdn.microsoft.com/en-us/library/aa772367(v=vs.85).aspx
- Querying Active Directory Through SQL Server: http://blog.tech-cats.com/2007/09/querying-active-directory-through-sql.html
- Querying Active Directory from SQL Server: http://blogs.officezealot.com/mauro/archive/2007/10/25/20607.aspx
- Joining Heterogeneous Data: http://msdn.microsoft.com/en-us/library/aa746379.aspx
- Query Active Directory Data from SQL Server using T-SQL: http://www.databasejournal.com/features/mssql/article.php/3849891/Query-Active-Directory-Data-from-SQL-Server-using-T-SQL.htm
But We Don’t Get All the Results
In our customer’s situation—and actually in our internal testing as well—the results of the SQL Server-AD join query were not complete. A number of results would get returned but not the full result set. And we got the wonderfully obtuse message below.
Msg 7330, Level 16, State 2, Line 1 Cannot fetch a row from OLE DB provider "YourProvider" for linked server "YourLinkedServer"
Okay. We checked out the SQL Server query by itself and it worked fine and returned all rows. I get it…something in the join with AD data is causing the issue. But what?
Dang AD MaxPageSize is In the Way!
After some time investigating, we discovered the source of the issue. Active Directory limits the number of results returned in a single search result based on the MaxPageSize setting in Active Directory, which by default is 1000 results. This acts as a protection to ensure domain controllers are not overwhelmed as well as to limit network traffic. LDAP is designed to allow paging of query results and this is the proper way to retrieve more than the MaxPageSize setting. However, SQL Server does not appear to page AD data through a linked server and thus we see fewer than 1000 results returned when we have more than 1000 objects in our search scope.
In many cases, we saw 901 records returned and you could avert the error by using the T-SQL “top” keyword to do something like “SELECT TOP 901…”. However, this is still not the proper solution as we really need to return the entire result set.
Don’t Change the MaxPageSize…There is a Better Solution
Why not just change the AD MaxPageSize value to something that covers the number of our objects in our search scope? Well, you can do this but it’s a poor practice and, well, it’s basically uncool and puts you at risk of decreasing your network and DC performance. Check out the articles below for some justification and then read on for a much better solution.
- Avoid changing the MaxPageSize LDAP query policy: http://jeftek.com/219/avoid-changing-the-maxpagesize-ldap-query-policy/
- Limiting LDAP searches with MaxPageSize: http://searchwindowsserver.techtarget.com/tip/Limiting-LDAP-searches-with-MaxPageSize
The Better Solution that Avoids Changing MaxPageSize
So what is the better solution than changing the MaxPageSize value in AD? Well, it comprises using a stored procedure to perform a proper paged AD query and then joining the data from the AD query to your SQL Server tables. The good news is that we have the stored procedure available for you below as well as a sample of how you can use the procedure in your query.
Many thanks go to the source of most of the information we used to put together this solution. Parikshit Savjani of Microsoft (at least at that time) put together a nice reference and explanation plus the source code for the stored proc that we used in our solution. Check out the article at http://www.sqlserverfaq.net/2010/09/20/how-to-query-active-directory-to-fetch-more-than-1000-rows-using-sql-serverpart-i-2/ to learn more about the background of our solution.
So on to the solution. The solution comprises a stored procedure and a temp table. We use the temp table, @ADTest in this case, to store the results of the stored procedure that runs the AD paged query. We then join the temp table to our SQL Server table just like any other table join.
The Stored Procedure
You can download the stored procedure script (sp_QueryAD.sql) to create the sproc from our website. Once downloaded, run the appropriate portions of the script in your SQL Server environment to create the stored procedure and prepare the OLE Automation Procedures. Pay attention to the OLE Automation Procedures comments at the top of the script as well as the example usage of the stored procedure.
You can get more information about OLE Automation Procedures at the following links.
Join the AD Results with SQL Server Data
Now you have a stored procedure that you can use to get paged AD data beyond the default MaxPageSize value of 1000 results. Use the script below as a model to join the results of the stored procedure with your SQL Server data. You can obviously take many liberties in your naming conventions and this should serve as a sufficient model to get you going.
BEGIN --Create a temp table for the AD results. DECLARE @ADTest TABLE (Column0 varchar(100), Column1 varchar(100), Column2 varchar(100)) --Get the AD data using the stored procedure. INSERT INTO @ADTest EXEC sp_QueryAD 'SELECT adspath, samAccountName, objectGuid FROM ''LDAP://dc=yourdomain,dc=local'' WHERE objectCategory=''person'' AND objectclass=''user''', 1 --Join the temp table and production table from SQL Server to join the AD results to the SQL Server data. SELECT alias.* FROM YourCatalog.Table alias LEFT JOIN @ADTest adresult ON adresult.Column0 = alias.GUID END
The column order is funky as it appears to be reversed. So joining on objectGuid would seem to need to use Column2 but it’s actually using Column0 since the third column in the AD query is the first column in the result. Be aware of this and play with it some as you work through this in your organization.
Also remember if you add columns to the query from AD that you need to add columns to the temp table.
Joining Active Directory and SQL Server data is a nice capability but it has it’s flaws. You can use the procedure here through SQL Server to join AD and SQL data or you might look into programming opportunities with PowerShell or native .NET languages like C# to do the same work by using .NET data objects to encapsulate the AD and SQL data and then processing through code.
Let me know if you have experience with these types of data joins and I’d love to see how you’ve approached a solution, especially for ad hoc reports that may need to be run infrequently.