Thursday, March 30, 2023

Truncating very large transaction log in full recovery mode

While I always configure transaction log backups for every database in a non-simple recovery model, and to complement that I also have monitoring in place for when a transaction log doesn't get backed up, that does still happen on rare occasions. For instance, in my current organization all applications/databases are third-party apps with nothing developed in-house, and once in a while an app/user may run a huge transaction. It could be that they decided to purge almost all historical data from a 500TB database and delete 450TB of data from it in a single transaction, leaving a huge transaction log and not enough available disk space on the backup drive (a separate issue/topic).

Per best practice, and for almost any conceivable scenario, it's preferred that you take a transaction log backup and then shrink the log file(s). But in an emergency situation you can use the following SQL script to quickly truncate the transaction log. I believe the script is self-explanatory, but if I missed anything or you have questions, please feel free to post a comment here and I will do my best to get back to you.

  

/* CAUTION:
USE THIS ONLY IN EMERGENCY WHERE THE TRANSACTION LOG HAS NOT BEEN BACKED 
IN A VERY LONG TIME AND HAS GROWN TO BE TOO BIG

The script does the following in order to shrink the transaction log file:
- Ensure the db is not running in Simple recovery mode
- Store the database recovery mode into a variable
- Change recovery mode to SIMPLE
- Issue a checkpoint command
- Wait 2 seconds
- Shrink the log file
- Wait 2 seconds
- Change recovery mode back to original value

IMPORTANT: This script does not take a full or differential database backup, so it's crucial that you take a full or differential backup afterwards to restart the transaction log backup chain. Otherwise any attempt to take a transaction log backup will fail, creating more problems.


*/

declare @db nvarchar(100) = 'MyDB'            -- Specify name of the database here
declare @target_log_size_mb nvarchar(20) = 64 -- if DB has multiple log files, each log file will be of this size
declare @recovery_mode nvarchar(100)
declare @sql nvarchar(4000)

select @recovery_mode = recovery_model_desc from sys.databases where name = @db

if @recovery_mode = 'SIMPLE'
    raiserror('Error: Use this script only on database in Full recovery mode.', 16, 1)
else if databasepropertyex(@db, 'Updateability') = 'READ_ONLY'
    raiserror('Error: Database is not updatable. Please ensure that the database is not read only, mirrored or otherwise inaccessible.', 16, 1)
else
BEGIN
    set @sql = '
USE [master]
ALTER DATABASE [' + @db + '] SET RECOVERY SIMPLE WITH NO_WAIT

USE [' + @db + ']
WAITFOR DELAY ''00:00:02''
checkpoint
WAITFOR DELAY ''00:00:02''

-- open a cursor so we can iterate through and shrink all log files
declare @log_file_name sysname
declare c1 cursor for select name from sys.database_files where type = 1
open c1
fetch c1 into @log_file_name
while @@FETCH_STATUS = 0
begin
    print @log_file_name
    DBCC SHRINKFILE (@log_file_name, ' + @target_log_size_mb + ')
    fetch c1 into @log_file_name
end
close c1
deallocate c1

WAITFOR DELAY ''00:00:02''
USE [master]
ALTER DATABASE [' + @db + '] SET RECOVERY ' + @recovery_mode + ' WITH NO_WAIT
'
    print @sql
    exec sp_executesql @sql
END
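And per the IMPORTANT note in the script header, follow the shrink immediately with a full (or differential) backup to restart the log backup chain. A minimal sketch, with a placeholder backup path:

```sql
-- Take a full backup right after the shrink script to restart the
-- transaction log backup chain. The disk path below is a placeholder.
BACKUP DATABASE [MyDB]
TO DISK = N'X:\SQLBackups\MyDB_post_shrink.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;
```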

Monday, January 6, 2020

Getting the most recent transaction log backup time, irrespective of where the transaction log backup is performed, in Always On availability groups

I should mention outright that this post applies to SQL Server version 2016 and up.

Over the years I have relied on the backup history tables in msdb to check if backups are performed regularly and/or if they are falling behind the SLAs. Of course there are other, maybe better ways to monitor your backups too.

But I don't take chances with database backups, not only for DR purposes but for database availability reasons as well. If the transaction log backups are getting skipped due to some oversight, or failing for some reason, the log may fill up the disks and lead to an outage, not just for that database but for other databases sharing the same disks.

So I have a script that checks when the last successful transaction log backup was performed, and if it exceeds a threshold, for example more than 24 hours, I get an automatic email alert.
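That check can be sketched from the msdb backup history like this (a sketch; the 24-hour threshold is just an example, and backupset type 'L' denotes transaction log backups):

```sql
-- Databases in FULL/BULK_LOGGED recovery whose last log backup is missing
-- or older than 24 hours, per this server's msdb backup history.
SELECT d.name AS database_name,
       MAX(b.backup_finish_date) AS last_log_backup
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b
       ON b.database_name = d.name
      AND b.type = 'L'                  -- 'L' = transaction log backup
WHERE d.recovery_model_desc <> 'SIMPLE'
GROUP BY d.name
HAVING MAX(b.backup_finish_date) IS NULL
    OR MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());
```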

Things get a little more interesting in Always On availability groups, where SQL Server allows the log backup to be performed on any of the replicas.

Here is a summary of which types of backups are allowed on which type of replica.

Backup Type     Primary Replica     Secondary Replica (Synchronous and Asynchronous)
Full            Yes                 Yes (but only with the COPY_ONLY option)
Log             Yes                 Yes (a log backup cannot be taken WITH COPY_ONLY on a secondary replica)
Differential    Yes                 No


Backup events are only recorded on the replica where the backup is performed, and that history does not get automatically synced across the replicas.

That makes it a bit more complicated to check when the last successful backup was taken.

Now, if you happen to always run your backups on a particular replica, then you are fine relying on the backup history tables most of the time. Even then, eventually you will be faced with some very unpleasant surprises (are there any other kinds?).

So to cover all the bases, I needed to find another way. Luckily, Microsoft introduced a new system function, sys.dm_db_log_stats, in SQL Server 2016 that has a couple of fields that get updated regardless of whether it runs on the primary replica or a secondary. Among them is the log_backup_time field, which indicates the time of the last successful log backup. Based on that, here is the query I am using to monitor transaction log backups.




SELECT
    D.NAME AS DATABASE_NAME,
    AG.NAME AS AG_NAME,
    DBL.LOG_BACKUP_TIME,
    DATEDIFF(MINUTE, DBL.LOG_BACKUP_TIME, GETDATE()) AS MinutesSinceLastLogBackup,
    HDRS.IS_PRIMARY_REPLICA,
    DBL.RECOVERY_MODEL,
    DBL.LOG_SINCE_LAST_LOG_BACKUP_MB,
    D.STATE_DESC,
    D.IS_READ_ONLY,
    D.LOG_REUSE_WAIT_DESC,
    DATABASEPROPERTYEX(D.NAME, 'Updateability') AS DATABASE_MODE
FROM SYS.DATABASES D
    LEFT JOIN SYS.DM_HADR_DATABASE_REPLICA_STATES HDRS
        ON HDRS.GROUP_DATABASE_ID = D.GROUP_DATABASE_ID
        AND HDRS.REPLICA_ID = D.REPLICA_ID
    LEFT JOIN SYS.AVAILABILITY_GROUPS AG
        ON AG.GROUP_ID = HDRS.GROUP_ID
    OUTER APPLY SYS.DM_DB_LOG_STATS(D.DATABASE_ID) DBL
WHERE 1 = 1
    AND DBL.RECOVERY_MODEL != 'SIMPLE'
    -- AND DATEDIFF(MINUTE, DBL.LOG_BACKUP_TIME, GETDATE()) > 60*24
ORDER BY DATABASE_NAME



Monday, November 18, 2019

How to get the Active Directory user password policy values

Recently I have had to troubleshoot quite a few SQL login issues, and often the issue was with the user's Active Directory account.

I was aware that my organization has security policies that include, among other things, an Active Directory password policy. Of course there is also the SQL Server user security policy, which I know by heart.

To better support my users, I thought it would be a good idea to familiarize myself with the Active Directory password policy that is in effect. Now, there is a document that outlines the policy, but I want to see what is actually implemented.

Here, I am using PowerShell to get the password policy values.

You will need to have the ActiveDirectory PowerShell module installed for the following cmdlets to work.


Get-ADUserResultantPasswordPolicy -Identity aduser1

That returned nothing in my case, which most likely implies that no password policy is assigned on a per-user basis (i.e., no fine-grained password policy applies to this account).
So now I am going to check the default password policy for the domain.

Get-ADDefaultDomainPasswordPolicy

Edited Results:

ComplexityEnabled : True
DistinguishedName : DC=internal,DC=external,DC=org
LockoutDuration : 00:30:00
LockoutObservationWindow : 00:30:00
LockoutThreshold : 6
MaxPasswordAge : 90.00:00:00
MinPasswordAge : 1.00:00:00
MinPasswordLength : 8
objectClass : {domainDNS}
objectGuid : xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
PasswordHistoryCount : 8
ReversibleEncryptionEnabled : False



If you are curious what all these values mean, please see the following MS document:

Set-ADDefaultDomainPasswordPolicy



How do you use PowerShell to check if an Active Directory user is locked out, disabled, etc.?

If your organization uses a password policy (there are very good odds these days that it does), and especially stricter password requirements for administrative users, you might have experienced instances where your or your users' Active Directory accounts get locked out.

How do you check if that is the case? Well, for one thing, Windows will tell you so when you try to log in, and/or the failed login attempts are logged in the SQL Server error log, event logs, etc. But what if the user does not log out, or has more than one account, one for regular use and one for administrative tasks? There may be other scenarios where you need to check the status of a user account in Active Directory.

I don't have admin privileges, and presumably you don't either. However, I do have read permission on the AD, so I could have used the Active Directory Users and Computers snap-in.

But here I am going to show the PowerShell way.

You will need to have the ActiveDirectory PowerShell module installed for the following cmdlets to work.


# Is account disabled?

get-aduser aduser1 -Properties enabled | ft Enabled

Enabled
-------
True


# Is account locked out?

get-aduser aduser1 -Properties LockedOut | ft LockedOut

LockedOut
---------
False


# When does the password expire?

Get-ADUser aduser1 -properties msDS-UserPasswordExpiryTimeComputed | select @{N="PasswordExpiryDate";E={[DateTime]::FromFileTime($_."msDS-UserPasswordExpiryTimeComputed")}}


PasswordExpiryDate
------------------
2/13/2020 2:58:26 PM


# Finally, view all properties for a user account

get-aduser aduser1 -Properties *


Wednesday, November 13, 2019

Event Notifications Example in SQL Server


Event notifications are kind of like triggers in the sense that they respond to specific events, specifically DDL statements and SQL Trace events.
The major difference between triggers and event notifications is that triggers fire synchronously and execute code within the same session and transaction. Event notifications, on the other hand, do not execute any code; they only send information asynchronously, which can then be logged and acted upon later.


Here is a snippet from the Microsoft Documentation that explains the differences:

Event Notifications vs. Triggers

The following table compares and contrasts triggers and event notifications.
Triggers: DML triggers respond to data manipulation language (DML) events. DDL triggers respond to data definition language (DDL) events.
Event notifications: Respond to DDL events and a subset of SQL trace events.

Triggers: Can run Transact-SQL or common language runtime (CLR) managed code.
Event notifications: Do not run code. Instead, they send xml messages to a Service Broker service.

Triggers: Are processed synchronously, within the scope of the transactions that cause them to fire.
Event notifications: May be processed asynchronously and do not run in the scope of the transactions that cause them to fire.

Triggers: The consumer of a trigger is tightly coupled with the event that causes it to fire.
Event notifications: The consumer of an event notification is decoupled from the event that causes it to fire.

Triggers: Must be processed on the local server.
Event notifications: Can be processed on a remote server.

Triggers: Can be rolled back.
Event notifications: Cannot be rolled back.

Triggers: DML trigger names are schema-scoped. DDL trigger names are database-scoped or server-scoped.
Event notifications: Names are scoped by the server or database. Event notifications on a QUEUE_ACTIVATION event are scoped to a specific queue.

Triggers: DML triggers are owned by the same owner as the tables on which they are applied.
Event notifications: The owner of an event notification on a queue may have a different owner than the object on which it is applied.

Triggers: Support the EXECUTE AS clause.
Event notifications: Do not support the EXECUTE AS clause.

Triggers: DDL trigger event information can be captured using the EVENTDATA function, which returns an xml data type.
Event notifications: Send xml event information to a Service Broker service. The information is formatted to the same schema as that of the EVENTDATA function.

Triggers: Metadata about triggers is found in the sys.triggers and sys.server_triggers catalog views.
Event notifications: Metadata is found in the sys.event_notifications and sys.server_event_notifications catalog views.



In the following sample SQL script, I am creating an event notification to capture ALTER TABLE events in a database.



-- Create a brand new database for testing purposes
use master
go
if db_id('temp_event_notification_test_929368242990-321836') is not null
drop database [temp_event_notification_test_929368242990-321836]
go
CREATE DATABASE [temp_event_notification_test_929368242990-321836]
GO


-- Enable the service broker if it's not already enabled
if not exists (select * from sys.databases where name = 'temp_event_notification_test_929368242990-321836' and is_broker_enabled = 1)
ALTER DATABASE [temp_event_notification_test_929368242990-321836] SET ENABLE_BROKER;
go

-- set the TRUSTWORTHY property ON
if not exists (select * from sys.databases where name = 'temp_event_notification_test_929368242990-321836' and is_trustworthy_on = 1)
ALTER DATABASE [temp_event_notification_test_929368242990-321836] SET TRUSTWORTHY ON;
go

-- check if there is already a service broker end point running
if not exists (select * from sys.service_broker_endpoints where type_desc = 'SERVICE_BROKER' and state_desc = 'STARTED' )
BEGIN
-- check if there is a SB endpoint with same name
if not exists (select * from sys.service_broker_endpoints where NAME = 'en_service_broker_929368242990-321836')
BEGIN
-- check to make sure the tcp port is not already in use
if not exists (SELECT * FROM SYS.tcp_endpoints where port = 5122)
CREATE ENDPOINT [en_service_broker_929368242990-321836]
STATE = STARTED
AS TCP ( LISTENER_PORT = 5122)
FOR SERVICE_BROKER (AUTHENTICATION = WINDOWS );
else
raiserror('Error: An end point cannot be created. Please check if there is already one with same port.', 16,1)
END
else
raiserror('Error: An end point cannot be created. Please check if there is already one with same name.', 16,1)
END
GO
USE [temp_event_notification_test_929368242990-321836]

go
CREATE QUEUE [ent_929368242990-321836] ;
GO
CREATE SERVICE [ens_929368242990-321836]
ON QUEUE [ent_929368242990-321836]
(
[http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]
);
GO

CREATE ROUTE [enr_929368242990-321836]  
WITH SERVICE_NAME = 'ens_929368242990-321836',
ADDRESS = 'LOCAL';
GO

CREATE EVENT NOTIFICATION [enen_929368242990-321836]
ON DATABASE
FOR ALTER_TABLE
TO SERVICE 'ens_929368242990-321836', 'current database'
GO

-- Test
-- Generate the events
if object_id('entt_929368242990-321836') is not null
drop table [entt_929368242990-321836]
go
create table [entt_929368242990-321836] (i int)
go
alter table [entt_929368242990-321836] add b int
go

-- verify/display that the event notification was captured
SELECT TOP (1000) *, casted_message_body = 
CASE message_type_name WHEN 'X' 
THEN CAST(message_body AS NVARCHAR(MAX))
ELSE message_body
END
FROM [temp_event_notification_test_929368242990-321836].[dbo].[ent_929368242990-321836] WITH(NOLOCK)

if @@ROWCOUNT = 0
RAISERROR('Error: Something is not right. Event notification was not captured.', 16,1)
else
SELECT 'Success!' Msg
go


-- clear the records from the queue
-- RECEIVE displays the event as well as removes it from the queue
RECEIVE * FROM [ent_929368242990-321836]
go
-- verify that the queue is now empty
SELECT TOP (1000) *, casted_message_body =
    CASE message_type_name WHEN 'X'
        THEN CAST(message_body AS NVARCHAR(MAX))
        ELSE message_body
    END
FROM [temp_event_notification_test_929368242990-321836].[dbo].[ent_929368242990-321836] WITH(NOLOCK)
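For what it's worth, the message_body you get back is EVENTDATA-style XML, so it can be shredded with XQuery. Here is a sketch using a hand-made sample message (the values are made up for illustration) rather than one pulled from the queue:

```sql
-- Sample EVENT_INSTANCE XML, shaped like what an ALTER_TABLE event
-- notification delivers (element values here are illustrative only).
DECLARE @msg XML = N'<EVENT_INSTANCE>
  <EventType>ALTER_TABLE</EventType>
  <DatabaseName>temp_event_notification_test</DatabaseName>
  <ObjectName>entt_1</ObjectName>
  <TSQLCommand><CommandText>alter table entt_1 add b int</CommandText></TSQLCommand>
</EVENT_INSTANCE>';

-- Pull individual fields out of the event XML with XQuery
SELECT @msg.value('(/EVENT_INSTANCE/EventType)[1]',               'nvarchar(128)') AS event_type,
       @msg.value('(/EVENT_INSTANCE/DatabaseName)[1]',            'nvarchar(128)') AS database_name,
       @msg.value('(/EVENT_INSTANCE/ObjectName)[1]',              'nvarchar(128)') AS object_name,
       @msg.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'nvarchar(max)') AS command_text;
```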
-- Since I am only testing, I am using the following code to clean up afterwards
/* CLEAN UP
USE [temp_event_notification_test_929368242990-321836]
go
if exists (SELECT * FROM sys.event_notifications where name = 'enen_929368242990-321836' and parent_class_desc = 'DATABASE')
DROP EVENT NOTIFICATION [enen_929368242990-321836] ON DATABASE;
go
if exists (select * from sys.routes where name = 'enr_929368242990-321836' and address = 'LOCAL')
DROP ROUTE [enr_929368242990-321836]
if exists (SELECT * FROM sys.services where name = 'ens_929368242990-321836')
DROP SERVICE [ens_929368242990-321836]
GO
if exists (SELECT * FROM sys.service_queues where name = 'ent_929368242990-321836' and schema_id = 1)
DROP QUEUE [dbo].[ent_929368242990-321836]
GO
use master
go
if db_id('temp_event_notification_test_929368242990-321836') is not null
drop database [temp_event_notification_test_929368242990-321836]
go
*/


Thursday, November 7, 2019

What SQL Server Agent Alerts Do I have setup?

I am in a situation where I have to incorporate SQL Server Agent alerts in my monitoring and alerting strategy.

I needed a query (against the msdb system tables) to get details on what alerts are set up on each server. The result is the following query, which I run as a multi-server query.



SELECT a.[id] [alert_id],
a.[name] [alert_name],
a.[enabled] [is_alert_enabled],
o.[enabled] [is_operator_enabled],
o.[email_address] [email_address],
o.[pager_address] [pager_address],
o.[netsend_address] [netsend_address],
j.[name] [job_name],
a.[event_source] [alert_event_source],
a.[event_category_id] [alert_event_category_id],
sc.[name] [alert_category_name],
CASE sc.[category_class]
WHEN 1 THEN 'JOB'
WHEN 2 THEN 'ALERT'
WHEN 3 THEN 'OPERATOR'
ELSE '0'
END [alert_class_name],
sm.[description] [alert_message_description],
a.[event_id] [alert_event_id],
a.[message_id] [alert_message_id],
a.[severity] [alert_severity],
a.[enabled] [alert_enabled],
a.[delay_between_responses] [alert_delay_between_responses],
a.[last_occurrence_date] [alert_last_occurrence_date],
a.[last_occurrence_time] [alert_last_occurrence_time],
a.[last_response_date] [alert_last_response_date],
a.[last_response_time] [alert_last_response_time],
a.[notification_message] [alert_notification_message],
a.[include_event_description] [alert_include_event_description],
a.[database_name] [alert_database_name],
a.[event_description_keyword] [alert_event_description_keyword],
a.[occurrence_count] [alert_occurrence_count],
a.[count_reset_date] [alert_count_reset_date],
a.[count_reset_time] [alert_count_reset_time],
a.[job_id] [alert_job_id],
a.[has_notification] [alert_has_notification],
a.[flags] [alert_flags],
a.[performance_condition] [alert_performance_condition],
a.[category_id] [alert_category_id]
FROM msdb.dbo.sysalerts a
LEFT OUTER JOIN msdb.dbo.syscategories sc ON a.category_id = sc.category_id
LEFT OUTER JOIN msdb.dbo.sysnotifications sn ON ( a.id = sn.alert_id )
LEFT OUTER JOIN msdb.dbo.sysoperators o ON ( o.id = sn.operator_id )
LEFT OUTER JOIN msdb.dbo.sysjobs j ON j.job_id = a.job_id
LEFT OUTER JOIN msdb.dbo.sysmessages sm ON sm.error = a.message_id
and sm.msglangid = SERVERPROPERTY('LCID')
ORDER BY 1

And here is the sample result:



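For context, the alerts this query inventories are created with msdb's documented stored procedures. A minimal, hypothetical sketch (the alert and operator names here are made up):

```sql
-- Create a severity-based alert and tie it to an existing operator.
EXEC msdb.dbo.sp_add_alert
    @name = N'Severity 017 - Insufficient Resources',
    @severity = 17,
    @delay_between_responses = 300,
    @include_event_description_in = 1;   -- 1 = include event text in email

EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Severity 017 - Insufficient Resources',
    @operator_name = N'DBA Team',        -- assumes this operator already exists
    @notification_method = 1;            -- 1 = email
```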

Finding out the most cached database in the buffer cache - The DMV Way


While there are many ways and criteria to find out which database is the most used, under-optimized, the troublemaker, etc., here is one more: finding the database(s) using the most buffer cache.


SELECT 
CASE database_id
WHEN 32767 THEN 'ResourceDb'
ELSE db_name(database_id)
END AS database_name,
COUNT(*)AS cached_pages_count,
COUNT(*) / 128 / 1024 AS cache_size_gb

FROM sys.dm_os_buffer_descriptors
GROUP BY DB_NAME(database_id) ,database_id
ORDER BY cached_pages_count DESC;

Here are the results I got on one of the production SQL Servers.



Lucky for me, it turned out to be an extreme case. Now I knew where I should focus optimization efforts, and even whether that database belongs with the rest of the databases on that server at all.

I have tested this query for SQL 2008 (SP3) and up.
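One note on the arithmetic: each buffer page is 8 KB, so COUNT(*)/128 gives MB and the extra /1024 gives GB, but the integer division truncates small databases down to 0 GB. If you want finer granularity, a variant along these lines (my tweak, not from the original query) reports decimal MB:

```sql
-- Same aggregation, but with decimal MB so small databases don't round to 0.
SELECT CASE database_id
           WHEN 32767 THEN 'ResourceDb'
           ELSE DB_NAME(database_id)
       END AS database_name,
       COUNT(*) AS cached_pages_count,
       CAST(COUNT(*) / 128.0 AS DECIMAL(12, 2)) AS cache_size_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_pages_count DESC;
```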

Hope you find this useful.


