1) SQL SERVER – How to DROP or DELETE Suspect Database?
As we can see above, once we start the SQL Service, the database goes through recovery and fails there. Due to this, we were not able to drop the database. I had not seen this earlier, as I was always able to drop such databases before a restore. Here are the steps we followed to drop the database and perform the restore:
Since the files are not available, the database startup fails and the database goes into the "Recovery Pending" state. After this, we were able to drop the database. As I mentioned, they were ready to restore it from a backup, and that worked well.
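Those steps can be sketched in T-SQL. A minimal, hedged version, assuming the database name from the error message (MegaMart-AWS) and that its files were already moved away while the service was stopped:

```sql
-- With the physical files unavailable, the database starts up in "Recovery Pending"
-- instead of failing mid-recovery, so the metadata can simply be dropped.
SELECT name, state_desc FROM sys.databases WHERE name = 'MegaMart-AWS';
DROP DATABASE [MegaMart-AWS];
-- After this, restore from the last good backup.
```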
Here is the error message and guidance.
CHECKDB found 0 allocation errors and 39 consistency errors in database ‘MegaMart-AWS’.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (MegaMart-AWS).
Now, what is the correct way to run DBCC CHECKDB with the minimum repair level?
ALTER DATABASE [MegaMart-AWS] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (N'MegaMart-AWS', REPAIR_ALLOW_DATA_LOSS);
ALTER DATABASE [MegaMart-AWS] SET MULTI_USER;
One of my clients, whom I helped configure Always On Availability Groups, came back to me with an interesting situation. They observed blocking of read queries on the secondary replica. Since the database is in read-only mode, they wanted to know what write activity was being performed in the database that could cause blocking.
I knew that this was not user write activity but must be system write activity causing the blocking. When I started troubleshooting, I found the below.
When I looked further, I found that the object name was ifts_comp_fragment_484196875_10739738 and that it was an INTERNAL_TABLE.
It became clear that the redo thread was getting blocked, not a user session. This causes the replica to start lagging because the redo frequently stops. In my lab, I also observed that if a database with a full-text index is in an availability group, we can see the same type of blocking whenever the full-text index is enabled for automatic or manual population and there are read queries running full-text searches.
For my client, we were able to prevent this behavior by disabling change tracking. My client was OK with disabling change tracking on the full-text index temporarily and then setting up an incremental population on a schedule. Here is the T-SQL to change the tracking to manual.
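The statement itself is a one-liner. A sketch, assuming the full-text index lives on a hypothetical table dbo.Documents:

```sql
-- Stop change tracking; the index is then refreshed only by manual/scheduled populations.
ALTER FULLTEXT INDEX ON dbo.Documents SET CHANGE_TRACKING MANUAL;
-- Kick off a manual population when needed (e.g. from a scheduled job):
ALTER FULLTEXT INDEX ON dbo.Documents START UPDATE POPULATION;
```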
Later, I suggested that my client refer to Populate Full-Text Indexes and consider "Incremental population based on a timestamp". This was the long-term solution for them.
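Incremental population can be sketched as follows (the table name is hypothetical; the base table needs a rowversion/timestamp column so SQL Server can detect changed rows):

```sql
-- A rowversion column lets incremental population pick up only changed rows.
ALTER TABLE dbo.Documents ADD RowVer rowversion;
ALTER FULLTEXT INDEX ON dbo.Documents START INCREMENTAL POPULATION;
```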
Answer: This is a very popular question I see in SQL Server Interview Questions and Answers. SQL Server keeps statistics about query execution in DMVs. The DMVs are reset when the SQL Server service restarts. Let us see a quick query which returns the longest-running queries along with their execution plans.
SELECT TOP 10
	s.execution_count AS ExecutionCount,
	s.max_elapsed_time AS MaxElapsedTime,
	ISNULL(s.total_elapsed_time / 1000 / NULLIF(s.execution_count, 0), 0) AS AvgElapsedTime,
	s.creation_time AS LogCreatedOn,
	ISNULL(s.execution_count / 1000 / NULLIF(DATEDIFF(s, s.creation_time, GETDATE()), 0), 0) AS FrequencyPerSec,
	p.query_plan,
	t.text
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) t
ORDER BY MaxElapsedTime DESC;
If you run the above query, it will return a resultset in SSMS. Scroll all the way to the right and you will notice a column called query_plan. This is the column which displays the execution plan of the query. Click on it and it will bring up the SQL Server query plan in SSMS. Once you have access to the query plan, you can also send it via email to your friend.
You can enable the graphical actual execution plan by pressing the keyboard shortcut CTRL+M.
In SSMS you can also enable a graphical execution plan by clicking the icon on the toolbar.
Once you follow any one of the above methods, you can run any query in the SSMS query editor and it will display the actual execution plan in a separate tab in your SSMS resultset.
Once you are looking at your execution plan, you can follow the same method to disable the execution plan.
This is another method to see the execution plan. We will use SET STATISTICS PROFILE to see the actual execution plan.
You can execute the following query, and in the resultset area you will see the execution plan in text format.
You need to scroll down below the query results to see the text-based execution plan, which is displayed in grid format.
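A minimal example of this method (the SELECT against sys.objects is only an illustration; profile whichever query you are tuning):

```sql
SET STATISTICS PROFILE ON;
SELECT name, type_desc FROM sys.objects;
SET STATISTICS PROFILE OFF;
```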
During my recent Comprehensive Database Performance Health Check, we implemented quite a few improvements in the database. After making the improvements, we reached the point where we needed to clear the cache of the server to see how our improvements fared with the changes we had made.
While we were discussing clearing the cache, the senior DBA at the organization rushed to the Query Editor in SQL Server Management Studio (SSMS) and wrote down the following command.
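The command is not reproduced here, but from the description it was presumably the server-wide cache flush:

```sql
-- Clears every cached plan for ALL databases on the instance - avoid on busy servers.
DBCC FREEPROCCACHE;
```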
As soon as I saw him writing the above command, I told him to stop immediately and explained that running it on the server would drop the plan cache for ALL the databases on the server, which is not recommended. If you drop the plan cache for the whole server during business hours, SQL Server will be under pressure to re-create all the plans, and that can negatively impact performance as well.
As we had made improvements in a single database, our need was to clear the cache for that single database only. Here is the command to remove the cached plans for a single database.
DECLARE @dbid INT = DB_ID();
DBCC FLUSHPROCINDB (@dbid);
If you are using SQL Server 2016 or later version, you can also run the following command:
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE
Well, that’s it. It is very simple to remove the cache for a single database. I strongly suggest that you only do it in extreme conditions, as in most cases you do not need it.
While working with a client, I encountered a few errors while using a linked server, and I am going to share my solutions via a few blog posts. In this blog, we will talk about fixing error 15274 – Access to the remote server is denied because the current security context is not trusted.
Before I talk about the problem and the solution, I must share a few environment details.
In my lab, I have two servers, SQLSERVER-0 and SQLSERVER-1. The linked server, named ONE, was created on SQLSERVER-0, connecting to SQLSERVER-1. There are two databases involved here. On the source server (SQLSERVER-0) I have a database called SQLDB0, and on the destination (SQLSERVER-1) a database called SQLDB1.
1) Linked server “test connection” was working just fine.
3) Simple queries were working fine but a stored procedure which was using “execute as user” was failing.
Here is the error which was coming when we executed the stored procedure, along with an oversimplified version of the procedure. The procedure is created in database SQLDB0.
CREATE PROCEDURE usp_fetch_data
AS
EXECUTE AS USER = 'user_no_login';
-- ... query the remote server via the linked server here ...
And here is the error message when I execute it.
Here is the text of the error message.
Msg 15274, Level 16, State 1, Procedure usp_fetch_data,
Line 5 [Batch Start Line 9]
Access to the remote server is denied because the current security context is not trusted.
I captured a Profiler trace but found nothing interesting. Since the error message mentioned "trusted", I recalled the TRUSTWORTHY property of the database.
My hunch was correct. As soon as I changed the database property on the source database, the above error disappeared. Here is the T-SQL to check the property.
If you see is_trustworthy_on set to 0 (ZERO), then run the below command to enable it and make it 1 (ONE).
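Both statements as a sketch, assuming the source database SQLDB0:

```sql
-- Check the property.
SELECT name, is_trustworthy_on FROM sys.databases WHERE name = 'SQLDB0';
-- Enable it (setting TRUSTWORTHY requires sysadmin membership).
ALTER DATABASE SQLDB0 SET TRUSTWORTHY ON;
```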
Question: How to Track Autogrowth of Any Database?
Answer: Just the other day, during a Comprehensive Database Performance Health Check, we identified that the user’s SQL Server performance problem was caused by constant autogrowth of their main database, which handled lots of e-commerce orders.
We put the database under observation and found a high amount of activity during certain periods; we also noticed an ETL job running at a particular time, overlapping some of the busier times of the database.
We finally decided to put a trace in place to identify when and who triggered the autogrowth of the database. Once we identified the issue, we increased the autogrowth value for the database files and eventually resolved the problem. Here is the configuration setting where we can increase the autogrowth settings for database files.
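The same setting can also be changed via T-SQL; a sketch with illustrative database and logical file names:

```sql
-- Grow in fixed 256 MB increments rather than small percentage steps.
ALTER DATABASE SQLDB0
MODIFY FILE (NAME = SQLDB0_data, FILEGROWTH = 256MB);
```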
What happens when you try to copy a large file between different hard disks? The operating system checks whether the destination has enough space for the copy to succeed. If we don’t have enough space, then it is second nature to make space and restart the copy. I am sure most of us who use USB drives have done this. SQL Server does something similar: when there is no space left in a file while writing, SQL Server expands the file based on our configuration. But if you are a seasoned DBA, you would want to know when these expansions happen. This blog is inspired by a question asked by one of my blog readers. Such questions are a major source of my blog content. Recently I received the below email:
We see the database file size increasing automatically every day. We do not have many active users or transactions in the database. I read your blogs and understood that this might be due to the auto grow setting of the database. Is there any way to find “who” is doing it?
The problem is half solved once we understand the reason. The DBA had already nailed it down to auto growth, and to help my reader further, I provided the below script to find the auto growth events happening on the server. This information is pulled from the Default Trace, which runs by default on every SQL Server installation.
Here is the script to get the auto grow events which happened for database “SQLAuth”. Please change it per the database name in your environment.
DECLARE @current_tracefilename VARCHAR(500);
DECLARE @0_tracefilename VARCHAR(500);
DECLARE @indx INT;
DECLARE @database_name SYSNAME;
SET @database_name = 'SQLAuth';
SELECT @current_tracefilename = path
FROM sys.traces
WHERE is_default = 1;
SET @current_tracefilename = REVERSE(@current_tracefilename);
SELECT @indx = PATINDEX('%\%', @current_tracefilename);
SET @current_tracefilename = REVERSE(@current_tracefilename);
SET @0_tracefilename = LEFT(@current_tracefilename, LEN(@current_tracefilename) - @indx) + '\log.trc';
SELECT DatabaseName
	,Filename
	,SPID
	,StartTime
	,EndTime
	,(Duration / 1000) AS 'TimeTaken(ms)'
	,(IntegerData * 8.0 / 1024) AS 'ChangeInSize MB'
FROM ::fn_trace_gettable(@0_tracefilename, DEFAULT) t
LEFT JOIN sys.databases AS d ON (d.name = @database_name)
WHERE EventClass >= 92
	AND EventClass <= 95
	AND ServerName = @@SERVERNAME
	AND DatabaseName = @database_name
	AND (d.create_date < EndTime)
ORDER BY t.StartTime DESC;
To test the script, you can create a dummy database and insert some rows so that auto growth is triggered. Then check the report.
CREATE DATABASE [SQLAuth];
GO
ALTER DATABASE [SQLAuth] SET RECOVERY FULL;
BACKUP DATABASE [SQLAuth] TO DISK = 'NUL';
GO
USE [SQLAuth];
GO
CREATE TABLE PinalDave (Pinal INT, Dave CHAR(8000));
GO
SET NOCOUNT ON;
DECLARE @i INT;
SET @i = 1;
WHILE @i < 10000
BEGIN
	INSERT INTO PinalDave VALUES (1, 'Pinal Dave');
	SET @i = @i + 1;
END
Here is the output of the script which shows the growth event details.
Last week, I had a very interesting situation where a client believed they had an issue with their slow-running server. They sent me a few initial details about their server, and after looking at those, I told them that they had a CPU pressure issue. They were very surprised as to how I knew that in such a short time. I eventually explained to them that it is possible to detect CPU pressure with wait statistics.
Here is the script which can help you identify whether there is CPU pressure, with the help of SQL wait statistics.
-- Signal Waits for instance
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
AS [%signal (cpu) waits],
CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
AS [%resource waits] FROM sys.dm_os_wait_stats OPTION (RECOMPILE);
Higher signal wait stats are not good for the system; a very high value indicates CPU pressure. In my experience, when systems are running smoothly and without any glitches, the signal wait stat is lower than 20%. Again, this number can be debated. In other words, lower is better and higher is not good for the system.
When my client sent me the result of the above statement, the value of signal waits was around 75%. I was very confident that this was a very high value and was leading to SQL Server performance problems.
Once we got on the consulting call, we spent the next 2 hours identifying the CPU bottleneck and removing it from the system. The problem that was preventing the CPU from being used efficiently was very simple; once we fixed it, their system started to run extremely fast.
Many a time, DBAs face the following question: can the CPU handle the current transaction load? I have seen many DBAs get confused about whether they should get a CPU with a higher clock speed or a larger number of CPUs when it is time to upgrade the system. As we all know, the CPU (Central Processing Unit) is the core of any system. The CPU is responsible not only for SQL Server operations but also for all the OS (Operating System) tasks on the server. Let us learn how to measure CPU pressure.
It is quite possible that although we are running very few operations on our SQL Server, we still do not obtain the expected results. This is when the SQL Server CPU has to be checked. Again, please note that I am not suggesting that the CPU is the root cause of poor performance in this case. There are ample chances that the Memory, Input/Output or Tempdb can be responsible for performance degradation. However, to find out the “real” culprit, we must run a test; here, we intend to start running tests on the CPU.
Let us understand the simple meaning of CPU pressure. CPU pressure is a state wherein the CPU is fully occupied with currently assigned tasks and there are more tasks in the queue that have not yet started.
We can run several different types of queries to test the CPU. However, my personal favorite is the following one. The resultset will contain as many rows as there are schedulers being used by SQL Server.
SELECT scheduler_id,
	current_tasks_count,
	runnable_tasks_count,
	current_workers_count,
	pending_disk_io_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;
The above query will give a result something like the following.
Before we go ahead, let us familiarize ourselves with the terms. First of all, we should understand what a “scheduler” refers to in SQL Server. Each SQL Server instance internally behaves like a small OS: it schedules and synchronizes concurrent tasks by itself without going to the OS. Each instance can handle many concurrent requests; however, there is a point at which the instance cannot accept any more. SQL Server has a dynamic management view (DMV), sys.dm_os_schedulers, which keeps track of all the scheduler-related work and also gives a good idea of CPU pressure.
current_tasks_count is the number of currently running tasks. However, we should give the utmost consideration to runnable_tasks_count. If this number is high, it indicates that a large number of queries assigned to the scheduler for processing are waiting for their turn to run. This gives a clear indication of CPU pressure. Additionally, pending_disk_io_count displays the tasks that are yet to be processed in the scheduler. For better processing, this count is expected not to be very high.
When we say that the numbers are high or low, it does not mean much unless we compare them to a standard or another known count. Therefore, these numbers are compared to the worker count (current_workers_count). If current_workers_count is 24 and there are 1000 tasks in the queue, that is not a good scenario. Thus, you can always look at the numbers and make your own judgment here.
While writing my previous blogs about SQL Server Configuration Manager, I encountered another error. In this blog, we would learn one of the possible causes of error 0x80010108 – The object invoked has disconnected from its clients.
Here is the screenshot of the error message.
When I searched for the hex code on the internet, it came up as RPC_E_DISCONNECTED, which looks like the code for the text of the message. When I looked back at the series of actions I performed, the error was reproducible.
Once you close and reopen SQL Server Configuration Manager, it should automatically start the WMI service, and we should be able to see the list of services again.
We simulated the error by stopping the WMI service manually. So, whenever you see such an error with SQL Server Configuration Manager, you should figure out why the WMI service was stopped. You can start with the Event Viewer to check whether there are any other interesting events.
13) When was Table Accessed Last By User?
When you run the following query, it will give you essential information about table seeks, table scans, table lookups, as well as table updates. Looking at the report, we can decide which table is the busiest for your customer and how you can start attacking the busiest table to get maximum performance.
SELECT DB_NAME(ius.[database_id]) AS [Database],
OBJECT_NAME(ius.[object_id]) AS [TableName],
MAX(ius.[last_user_lookup]) AS [last_user_lookup],
MAX(ius.[last_user_scan]) AS [last_user_scan],
MAX(ius.[last_user_seek]) AS [last_user_seek],
MAX(ius.[last_user_update]) AS [last_user_update]
FROM sys.dm_db_index_usage_stats AS ius
WHERE ius.[database_id] = DB_ID()
--AND ius.[object_id] = OBJECT_ID('YourTableName')
GROUP BY ius.[database_id], ius.[object_id]
While playing with my lab cluster, I ran into a situation. In this blog, we will learn about a situation where failover was not working from one node to another. Everything used to work fine on Node1. Let us fix the error related to the failover cluster.
Errors from System Events logs:
01/04/2018 04:02:16 PM
Cluster resource ‘SQL Server (InstA)’ of type ‘SQL Server’ in clustered role ‘Instance_A’ failed.
01/04/2018 04:02:19 PM
Service Control Manager
The SQL Server (InstA) service entered the running state.
01/04/2018 04:04:04 PM
Service Control Manager
The SQL Server (InstA) service entered the stopped state.
The System event log pretty much showed generic errors, so I moved on to the cluster logs and isolated the timeline to when I tested the failover. Here is my earlier blog to learn how to generate cluster logs.
Errors from the cluster log. Remember that by default the time is in the UTC time zone. In the cluster log, ERR and WARN are generally the interesting messages. The above-highlighted error was very interesting, and when I spent more time in the cluster log, I unearthed the below errors:
0000173c.000029f4::2018/04/01-11:32:16.564 ERR [RHS] s_RhsRpcCreateResType: (126)’ because of ‘Error loading resource DLL sqagtres.dll.’
00000ffc.000027d8::2018/04/01-11:32:16.564 INFO [RCM] result of first load attempt for type SQL Server Agent: 126
0000173c.000029f4::2018/04/01-11:32:16.604 ERR [RHS] s_RhsRpcCreateResType: (126)’ because of ‘Error loading resource DLL sqsrvres.dll.’
00000ffc.000027d8::2018/04/01-11:32:16.604 INFO [RCM] result of first load attempt for type SQL Server: 126
Both the errors found above point to possibly faulty resource DLLs for the SQL Server resources. We can see the number 126, which is the error code. As we can see below, error number 126 = The specified module could not be found.
Since the error above talked about the DLL not being loaded, I tried to register the DLL in question and got the below error:
“SQSRVRES.dll” failed to load.
Make sure the binary is stored at the specified path or debug it to check for problems with the binary or dependent .DLL files.
The specified module could not be found.
This is the same as what we saw in the cluster log earlier.
Looking at the various errors so far, I was concerned whether the SQL cluster-related DLLs were actually present. So, I tried to verify the same in the C:\windows\system32 folder. Surprisingly, I was not able to find sqsrvres.dll and sqagtres.dll. Instead, both were present under renamed file names.
I renamed the DLLs back to the required names, sqsrvres.dll and sqagtres.dll. After that, we were able to start the SQL resource from Failover Cluster Manager. I also rebooted the affected node so that all the resources would come online on their own and all the DLLs would load automatically.
One of my clients consulted me about the best and easiest method for migrating cluster storage from old to newly attached disks. They wanted to move/reconfigure the disks mapped to SQL Server, MSDTC, and the quorum.
They were using SQL Server 2014 as a mission-critical database server in a cluster environment. The storage being used was IBM DS 8300, and they wanted to migrate the databases from the current storage (IBM DS 8300) to the new storage (IBM DS 8870).
Current Disk Assignment
Newly Added Storage
Disk Q: Quorum Disk
Disk X: MSDTC Disk
Disk R: SQL Data disk
Disk S: SQL Log disk
Disk T: SQL TempDB disk
Based on my search on the internet combined with my experience, I came up with the below plan for each component.
SQL Server databases can be migrated to new storage in many ways:
The above process is tedious, and the problem gets worse when you want to migrate the storage hosting the system databases, as they need special care while moving to a new storage/location. The overall number of steps needed to accomplish the task also increases. Below are the steps outlined to ease these challenges in a very simple way.
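For user databases, the file-relocation approach can be sketched as below (database, logical file, and path names are illustrative):

```sql
-- Point the catalog at the new storage; takes effect when the database comes back online.
ALTER DATABASE SQLDB0 MODIFY FILE (NAME = SQLDB0_data, FILENAME = 'R:\NewStorage\SQLDB0.mdf');
ALTER DATABASE SQLDB0 MODIFY FILE (NAME = SQLDB0_log, FILENAME = 'S:\NewStorage\SQLDB0_log.ldf');
ALTER DATABASE SQLDB0 SET OFFLINE;
-- Copy the physical files to the new location, then:
ALTER DATABASE SQLDB0 SET ONLINE;
```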
While starting the SQL Server resource in Failover Cluster Manager, you may face this error:
initerrlog: Could not open error log file ‘Drive:\pathname\MSSQL11.MSSQLSERVER\MSSQL\Log\ERRORLOG’. Operating system error = 5(Access is denied.). Refer below to solve this issue.
Change Windows Cluster Quorum Disk
Change MSDTC Disk:
The easiest way I know is to simply reconfigure MSDTC using the new disk.
These steps helped my client migrate with very little downtime. The only downtime for SQL Server was the time taken to copy the files from the old storage to the new.
Almost all the time, whenever there is a wizard, it is human habit to go with the defaults and click Finish. One of my clients sent the below email to me. In this blog post, we are going to learn about a situation where a new node was added to a Windows cluster and the Always On availability databases stopped working.
We are trying to add a new node to the Always On Availability Group, and for that we must add a new node to the Windows cluster. Before doing this in production, we tried it in our test environment and ran into issues. We noticed that as soon as the node was added in Windows, our databases which were part of an availability group went into a "not synchronizing" state. Later I noticed that local disks were added to the cluster under "Available Storage".
Have you seen this issue? What is wrong with our setup?
I asked for any errors in the event log, and they shared the below.
Log Name: System
Event ID: 1069
Task Category: Resource Control Manager
Description: Cluster resource ‘Cluster Disk 2’ of type ‘Physical Disk’ in clustered role ‘Available Storage’ failed. The error code was ‘0x1’ (‘Incorrect function.’)
I told them that they must have followed the wizard and forgotten to "uncheck" the highlighted checkbox ("Add all eligible storage to the cluster").
To work around this problem, we must remove the disk resources from Failover Cluster Manager. Once they are removed, we need to bring these drives online in Disk Management.