
  • Sep 19 / 2017
dbDigger, High Availability, Replication, SQL Server tools, SQL Server Training

SQL Server Replication refresher

While preparing training material, I put together a brief overview of the replication architecture. In the following lines I am sharing it in this post as a good starting point for replication learners.

Replication

Replication is a way of distributing data automatically from a source database to one or more recipient databases. It is built to synchronize selected objects of a database, not the whole DB: it encompasses a set of objects in a database whose data is required to be kept in sync at subscriber databases. Replication architecture is not part of the SQL Server database engine, but a separate feature. It works through agents, triggers and scheduled jobs created specifically for this purpose. Replication can be used in high availability systems or in distributed systems, and can have different topologies.

There are 3 main types of replication:

  • Transactional Replication (one or two way)
  • Merge Replication
  • Snapshot Replication

Transactional Replication

This type of SQL Server replication is used for near real-time or slightly delayed synchronization from publisher to subscriber. Initially a snapshot process synchronizes the subscribers with the publisher. All committed transactions on the publisher are then propagated to the subscribers in sequence. It may be one way or two way transactional replication.

In one way replication, modifications from the publisher are synced to the subscriber but no modification from the subscriber is propagated back to the publisher. In peer to peer or two way replication, changes are propagated both ways and data remains in sync on both sides for the selected articles.

There are two options for propagating changes back from the subscriber to the publisher:

  • Immediate updating
  • Queued updating

A subscription can also be configured to use immediate updating with queued updating as a failover, to maximize availability and efficiency.

Immediate updating

In case of immediate updating, changes made at the subscriber are synced to the publisher using a two-phase commit process managed by Microsoft Distributed Transaction Coordinator (MS DTC). Changes made at the subscriber must be applied to both systems or are rolled back. This method is suitable for a reliable, high-speed connection between the Publisher and Subscriber.

Queued updating

This method is suitable for less reliable connections. Updates made at the subscriber are written to a queue; the publisher then reads the queue and applies the changes. Because of the latency involved, this flow might result in conflicts.
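
As a rough, hedged sketch of how these options are enabled on a transactional publication (the database and publication names are hypothetical, and the publication database is assumed to be already enabled for publishing):

-- Hedged sketch: a transactional publication that allows updating subscriptions.
-- 'SalesDB' and 'SalesPub' are hypothetical names.
USE SalesDB;
GO
EXEC sp_addpublication
    @publication       = N'SalesPub',
    @repl_freq         = N'continuous',
    @allow_sync_tran   = N'true',   -- allow immediate updating subscriptions
    @allow_queued_tran = N'true',   -- allow queued updating subscriptions
    @conflict_policy   = N'pub wins';
GO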

 

Peer to Peer replication

Peer to peer replication was introduced in SQL Server 2005 and is only supported in the Enterprise edition. It resembles two way transactional replication in some features. The main difference is that peer to peer replication involves more than just a single publisher and its associated subscriber nodes: it can span multiple nodes and each will be kept in sync. Peer to peer replication also handles propagation of article DDL and the locking mechanism better.

Merge Replication

Merge replication is a relatively complex type of replication. It does not use transactions but a set of defined rules to resolve conflicts when data updated at multiple sites is synchronized. It tracks changes through triggers and metadata tables, and provides the means to implement part of a high-availability system. It also serves well its original purpose of supporting mobile and disconnected users, being designed for cases where the publishers are not in constant communication with the subscribers. After the initial snapshot synchronization, subsequent changes are tracked locally and the databases are merged when in contact, using a series of rules to resolve conflicts.

 

Snapshot Replication

Data changes are not tracked for snapshot replication. Each time a snapshot is applied, it completely overwrites the existing data. The snapshot replication process also provides the initial synchronization for transactional and merge publications, and in several cases this initial synchronization is all that is necessary. Data is synced at defined intervals, as opposed to the constant transactional sync of the other types.

 

Distribution Setup

Distribution is an important part of the replication architecture. It has the following components and terms.

Distribution server

Distribution server is the role where the distribution setup (agent, jobs, DB) resides. The distribution server may be the same as the publisher or a dedicated server. For big and busy replication setups a separate distributor server is recommended. We can mark a server as distributor before setting up the replication publisher and subscriber.

Distribution database

Distribution is a system database for the replication setup and is created automatically on the server with the distributor role. We can, however, customize the DB name, file names, location and file sizes. For transactional replication, the distribution database tracks changes made to publications. For merge replication it stores only synchronization history.
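
A minimal sketch of configuring the distributor role and the distribution database; the server name, password and folder paths below are hypothetical placeholders, not from an actual setup:

-- Hedged sketch: mark the local server as distributor and create the distribution DB.
EXEC sp_adddistributor
    @distributor = N'MYDISTSERVER',
    @password    = N'StrongP@ssw0rd';
GO
EXEC sp_adddistributiondb
    @database    = N'distribution',    -- customizable DB name
    @data_folder = N'D:\ReplData',     -- customizable data file location
    @log_folder  = N'D:\ReplLog';      -- customizable log file location
GO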

Distribution agent

The distribution agent is set up as a separate module and works as a scheduled job under the replication jobs category. It is the process that moves transactions and snapshots held in the distribution database to the subscribers.

 

Replication components

There are some other terms and components that are always involved in replication setup.

Publisher

Publisher is a server that makes data available for other servers, i.e. it is the server that makes data available for replication.

Publication

A publication is a selection of tables, indexed views and stored procedures. Each publication can be replicated as a snapshot publication, a transactional publication or a merge publication. It is important to choose the publication articles on a valid basis, such as table size and the relationships between tables. There may be multiple publications on a single publisher.

Article

An object included in a publication is called an article. Articles are combined on some basis to form a publication. An article can be an entire table, selected rows from a table, specified columns from a table, an indexed view or a stored procedure. An article is a unit of a publication.
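
A minimal sketch of adding a table as an article to a transactional publication; the publication and table names are hypothetical, carried over from the earlier sketch:

-- Hedged sketch: add the dbo.Orders table as an article to the 'SalesPub' publication.
EXEC sp_addarticle
    @publication   = N'SalesPub',
    @article       = N'Orders',
    @source_owner  = N'dbo',
    @source_object = N'Orders';
GO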

Subscriber

The subscriber is the server that hosts the DB to which data from the publisher is synced. There may be multiple subscribers against a single publisher. A subscriber may be on the same server as the publisher, and there may be multiple subscriber databases on a single subscriber server.

Subscription

A subscription relates to a publication and goes to a subscriber database. There may be multiple subscriptions against a single publisher and each will go to a separate subscriber database. There are two types of subscriptions.

Pull Subscription

With a pull subscription the subscriber asks the publisher for periodic updates, and each subscriber can specify the best time for the update. Pull subscription is generally considered the better form of replication when a large number of subscribers are involved, or where the load on the LAN needs to be distributed.

Push Subscription

With a push subscription the publisher pushes any changes made to the data out to the subscribers. Typically push subscription is used where changes need to be replicated to the subscribers shortly after the change occurs, i.e. where replication of changes is required in near real time.
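
For illustration only, a push subscription could be added to the hypothetical publication from the earlier sketches as shown below (server and database names are placeholders; the distribution agent job setup is omitted from this minimal sketch):

-- Hedged sketch: add a push subscription on the publisher.
USE SalesDB;   -- the publication database
GO
EXEC sp_addsubscription
    @publication       = N'SalesPub',
    @subscriber        = N'SUBSCRIBER01',
    @destination_db    = N'SalesDB_Replica',
    @subscription_type = N'push',
    @sync_type         = N'automatic';
GO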

Replication Agents

We have already mentioned some agents while discussing the replication components in the sections above. It seems suitable to list the agents here so that they give a clear picture of the setup; the SQL Agent jobs behind them can also be listed from msdb, as shown after this list.

  • Snapshot agent
  • Log Reader Agent
  • Distribution Agent
  • Merge Agent
  • Queue Reader Agent
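
A rough way to see the jobs created for these agents on a given server is to query msdb for the replication job categories; this is a sketch rather than an exhaustive list, and category names can vary slightly by setup:

-- Hedged sketch: list SQL Agent jobs created for replication agents.
SELECT c.name AS JobCategory, j.name AS JobName, j.enabled
FROM msdb.dbo.sysjobs j
INNER JOIN msdb.dbo.syscategories c ON j.category_id = c.category_id
WHERE c.name LIKE 'REPL-%'   -- e.g. REPL-Snapshot, REPL-LogReader, REPL-Distribution, REPL-Merge, REPL-QueueReader
ORDER BY c.name, j.name;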

Replication effect on log file

It is important to know the relation and effects of replication on the database log file. If replication falls behind (latency), log records begin to accumulate in the log file until they are sent to the distributor. This may ultimately fill the log file and slow down the system in some cases. The situation is the same regardless of the recovery model, FULL or even SIMPLE.
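
A quick way to check whether replication is what is holding the log of a published database is shown below; the database name is a hypothetical placeholder:

-- REPLICATION in log_reuse_wait_desc means the log cannot be truncated because
-- pending transactions have not yet been delivered to the distribution database.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'SalesDB';   -- hypothetical published database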

Replication topologies

Topology describes how the replication setup is designed and how its components are arranged. Common SQL Server replication topologies include:

  • Central Distributor
  • Central Publisher with Remote Distributor
  • Central Publisher
  • Central Subscriber
  • Publishing Subscriber

  • Aug 31 / 2017
T-SQL Scripts, Table Partitioning

Verify next month partitions in all databases

Some of my databases have partitioned tables. I occasionally had to verify that the next month's partition exists in each of these databases, which required checking each database one by one. To improve the process I created a script that goes through the databases with partitioned tables and checks whether the partition for next month's data is there or not.

Specifications

The script is specific to my scenario and has the following specifications (a sketch of a matching partition function is shown after the list):

  1. All partitioned tables in a DB are created on a single partition scheme, hence there is only one partition scheme and partition function per DB.
  2. Partitions are created with a one month span, i.e. each partition holds one month of data.
  3. Partitions are created with a right boundary (less than).
  4. The script checks whether the partition for next month's data has been added or not.
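For context, a minimal sketch of the kind of monthly, right-boundary partition function and scheme these specifications assume; the names and the bigint YYYYMMDDhhmm key format are assumptions inferred from the check in the script below, not taken from the original setup:

-- Hedged sketch: monthly RANGE RIGHT partition function on a bigint key of the form YYYYMMDDhhmm.
-- Names pf_Monthly / ps_Monthly are hypothetical.
CREATE PARTITION FUNCTION pf_Monthly (BIGINT)
AS RANGE RIGHT FOR VALUES (201709010000, 201710010000, 201711010000);
GO
CREATE PARTITION SCHEME ps_Monthly
AS PARTITION pf_Monthly ALL TO ([PRIMARY]);
GO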

Script



SET NOCOUNT ON
GO

PRINT 'Report for partitioning status on ' + @@servername
-- Create temp table to hold DB names with partitioned tables
IF OBJECT_ID(N'tempdb.dbo.#DBs') IS NOT NULL
	DROP TABLE #DBs

CREATE TABLE #DBs (DBName VARCHAR(50))

-- Insert DB names with partitioned tables
INSERT INTO #DBs (DBName)
EXEC sp_msforeachdb 
'use ?; if exists (select top 1 * from ?.[sys].[partitions] where partition_number >1) select db_name()'
GO

-- Declare variables
DECLARE @DB_Name VARCHAR(50)
DECLARE @Command NVARCHAR(1000)
-- Declare cursor for checking DBs one by one
DECLARE database_cursor CURSOR
FOR

SELECT DBname
FROM #DBs

OPEN database_cursor

FETCH NEXT
FROM database_cursor
INTO @DB_Name

WHILE @@FETCH_STATUS = 0
BEGIN
	-- Prepare sql for last partition check
	SELECT @Command = 'use ' + @DB_Name + ';
	 if (SELECT convert(bigint, max(r.value) ) 
	 FROM [sys].[partition_range_values] r inner join [sys].[partition_functions] f
  on r.function_id = f.function_id) 
  < (select CONVERT([bigint],replace
  (replace(CONVERT([varchar](13),DATEADD(DAY, 1,(DATEADD(MONTH,1 , EOMONTH(getdate())))),(121)),''-'',''''),'' '','''')+''00''))
  Print ''Partitioning issue in DB ' + @DB_Name + ''';' + 'else print ''Partitioning Ok in DB ' + @DB_Name + ''' ;'

	--print @Command;
	EXEC sp_executesql @Command;

	FETCH NEXT
	FROM database_cursor
	INTO @DB_Name
END

CLOSE database_cursor

DEALLOCATE database_cursor

GO

Output

The output is one line per database, in the format 'Partitioning Ok in DB <DBName>' or 'Partitioning issue in DB <DBName>'.

 

  • Aug 29 / 2017
Data Modeling and Database Design, Table Partitioning

ALTER TABLE SWITCH statement failed because column ‘%’ does not have the same ANSI trimming semantics in tables ‘%’ and ‘%’.

While trying to switch a partition from one table to another, I got the following error message:

ALTER TABLE SWITCH statement failed because column ‘%’ does not have the same ANSI trimming semantics in tables ‘%’ and ‘%’.

It is related to a difference in the ANSI Padding setting between corresponding columns of the two tables involved in the partition switch. Surprisingly, I did not find this setting listed as a partition switch prerequisite in BOL.

ANSI_Padding

ANSI Padding relates to the storage of char, varchar, binary, and varbinary data types. It is ON by default in SQL Server unless explicitly set OFF for a connection. Best practice is to always set it ON. The nchar, nvarchar, ntext, text, image, varbinary(max), varchar(max), and nvarchar(max) data types always have ANSI_Padding ON and it cannot be changed. So we will be dealing with char, varchar, binary, and varbinary data types to manage this setting.

Problem while switching partition

Coming back to our partition switch issue, we are required to eliminate the ANSI_Padding difference between both tables to perform the partition switch. Let us create two tables, one with ANSI_Padding ON and the other with OFF.

-- Select database
USE DBDigs
GO

-- Set ANSI_Padding for first table (tbl_APOff)
SET ANSI_PADDING OFF
GO

-- Create first table (tbl_APOff)
IF EXISTS (
SELECT NAME
FROM sys.objects
WHERE type = 'U'
AND NAME = 'tbl_APOff'
)
DROP TABLE tbl_APOff
GO

CREATE TABLE tbl_APOff
(id INT identity(1, 1), firstName VARCHAR(50), lastname VARCHAR(50))
GO


-- Set ANSI_Padding for second table (tbl_APOn)
SET ANSI_PADDING ON
GO

IF EXISTS (
SELECT NAME
FROM sys.objects
WHERE type = 'U'
AND NAME = 'tbl_APOn'
)
DROP TABLE tbl_APOn
GO

-- Create second table (tbl_APOn)
CREATE TABLE tbl_APOn
(id INT identity(1, 1), firstName VARCHAR(50), lastname VARCHAR(50))
GO

The above script creates tbl_APOff with ANSI_Padding OFF and tbl_APOn with ANSI_Padding ON. Both tables are exact copies of each other except for the difference in the ANSI_Padding setting, which can be verified with the following script.

-- Select database
USE DBDigs
GO

-- Check ANSI_Padding for columns in both tables
SELECT s1.name AS ColumnName,
       s1.is_ansi_padded AS APOnSetting,
       s2.is_ansi_padded AS APOffSetting
FROM sys.columns s1
INNER JOIN (
    SELECT name, is_ansi_padded
    FROM sys.columns
    WHERE object_name(object_id) = 'tbl_APOff'
) AS s2 ON s1.name = s2.name
WHERE object_name(s1.object_id) = 'tbl_APOn'
ORDER BY APOnSetting
GO

Columns do not have the same ANSI trimming semantics

At this stage we have an idea of the issue and a script to detect and analyze it.

Fix

To fix the ANSI_Padding difference issue we have two primary options.

  • ALTER the ANSI_Padding setting of the columns in one table to match the other
  • Recreate one of the tables with the ANSI_Padding setting matching the other one

Whichever approach we apply, we should prefer to set ANSI_Padding ON, so that the best practice is implemented.

ALTER the ANSI_Padding setting

Let us change the ANSI_Padding setting for both columns in tbl_APOff. This can be done with the following ALTER COLUMN commands.

 
USE DBDigs
GO

SET ANSI_PADDING ON 
GO 

ALTER TABLE tbl_APOFF 
ALTER COLUMN firstName VARCHAR (50) NULL; 
GO 

ALTER TABLE tbl_APOFF 
ALTER COLUMN  lastname VARCHAR(50); 
GO 

Now we can verify that the ANSI_Padding setting matches in both tables by re-executing the previous script, and the result confirms the fix.

Match the ANSI_Padding setting of columns in two tables

The issue is fixed and we can switch the partitions.
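
As a quick, hedged confirmation on the demo tables above (which are not partitioned, so the whole table is switched; in a real scenario this would be a SWITCH PARTITION between the actual partitioned tables):

-- Hedged sketch: with ANSI_PADDING now matching, a switch between the two empty demo tables succeeds.
-- Both tables must have identical structure and reside on the same filegroup, and the target must be empty.
ALTER TABLE tbl_APOff SWITCH TO tbl_APOn;
GO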

Recreate table with modified ANSI_Padding

As a second option, you can drop and re-create the table, and do not forget to include

 
SET ANSI_PADDING ON 
GO 

at the start of the script.

  • Aug 25 / 2017
Columnstore Indexes, T-SQL Scripts, T-SQL Tips and Tricks

Row count for all tables with clustered columnstore index in a database

Consider a scenario where you need to get the number of rows for all tables in a database that have a clustered columnstore index (CCI). It requires joining three catalog views to get the required info. The following script gets the row count for all tables with a CCI.

-- Get row count for all tables with CCI in a DB
SELECT schema_name(o.schema_id) + '.' + o.name AS CCITables,
       SUM(p.rows) AS NoOfRows
FROM sys.objects o
INNER JOIN sys.indexes i ON o.object_id = i.object_id
       AND i.type = 5                 -- clustered columnstore index
INNER JOIN sys.partitions p ON p.object_id = o.object_id
       AND p.index_id = i.index_id    -- count rows of the CCI only, once per partition
GROUP BY schema_name(o.schema_id) + '.' + o.name
ORDER BY NoOfRows DESC
GO

  • Aug 21 / 2017
Columnstore Indexes, dbDigger

Introduction to SQL Server Columnstore Indexes

Columnstore indexes were introduced in SQL Server 2012. They are used for large tables where primarily READ operations are performed; a typical environment is DWH fact tables. Storing data in this structure may give up to 10x read performance and 10x compression as compared to traditional row-structured storage.

With this technology, data is physically stored, retrieved and managed in a columnar format. There are two types of columnstore indexes:

  • Clustered columnstore indexes (CCI)
  • Non clustered columnstore indexes

Clustered columnstore indexes (CCI)

Clustered columnstore indexes are used for large DWH tables where READ operations are intensive and a considerable size reduction may be achieved through compression.

Non clustered columnstore indexes

Non clustered columnstore indexes are used for environments where an OLTP load is involved alongside analysis queries.

Creating columnstore index on existing table

When creating a clustered columnstore index (CCI) on an existing table there may be issues due to data types or column types. For example, we may not create a CCI on a table that contains an nvarchar(max) column, or a computed column (persisted or non-persisted).
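
A minimal sketch of creating a CCI on an existing table; the table and index names are hypothetical:

-- Hedged sketch: convert an existing rowstore table to clustered columnstore storage.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;
GO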

Combining columnstore and rowstore indexes

Starting with SQL Server 2016, we can have non clustered columnstore indexes on a rowstore table, and we can also have non clustered rowstore indexes on a columnstore table.
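
For example, a hedged sketch of a non clustered columnstore index added to a rowstore OLTP table; the table and column names are hypothetical:

-- Hedged sketch (SQL Server 2016+): nonclustered columnstore index on a rowstore table
-- to support analysis queries alongside the OLTP workload.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
ON dbo.Orders (OrderDate, CustomerID, ProductID, Quantity, Amount);
GO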

Get columnstore indexes in a DB

We can get general information about the columnstore indexes in a DB with the following script.


SELECT schema_name(o.schema_id) + '.' + o.name AS TableName,
       i.name AS IndexName,
       i.type_desc AS IndexType
FROM sys.indexes i
INNER JOIN sys.objects o ON i.object_id = o.object_id
WHERE i.type IN (5, 6)   -- 5 = clustered columnstore, 6 = nonclustered columnstore
ORDER BY TableName

Columnstore indexes on partitioned tables

Columnstore indexes can be implemented on partitioned tables by keeping them partition aligned.
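
A hedged sketch of a partition-aligned CCI: creating the index on the table's partition scheme keeps it aligned. The table, partition scheme and column names are hypothetical.

-- Hedged sketch: clustered columnstore index aligned with the table's partition scheme.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSalesPartitioned
ON dbo.FactSalesPartitioned
ON ps_Monthly (SaleDateKey);
GO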

  • Aug 17 / 2017
SQL Azure, SQL Server tools, SQL Server Utilities, SSMS tips and tricks

Authentication methods in SSMS 17.2

In the previous post about 17.2 we briefly mentioned its download flavors. There is much more to explore in it. The login box of the new SSMS shows additional options along with the conventional Windows Authentication and SQL Server Authentication.

Authentication types in SSMS 17.2

These additional options are

  • Active Directory – Universal with MFA support (Introduced in SSMS 17.2)
  • Active Directory – Password (Introduced in SSMS 17.0)
  • Active Directory – Integrated (Introduced in SSMS 17.0)

All three options have one thing in common: they relate to Azure Active Directory. Azure SQL Database and Azure SQL Data Warehouse support connections from SSMS using Active Directory Universal Authentication.

Active Directory – Universal with MFA support is multi-factor authentication. It involves two-step verification through a phone call, SMS, biometric device, smart card with PIN, or mobile app notification, and can result in a pop-up box for validation.

The remaining two options, Password and Integrated, are not interactive and do not generate any dialog box. They can be used from different applications like ADO.NET, JDBC, ODBC etc.
