During phase of training material preparation, a brief overview of replication architecture is with me. In coming lines i am going to put it in this post as a good starting point for replication learners.
Replication is a way of distributing data automatically from a source database to one or more recipient databases. It is built to sync the selected objects of a database not whole DB. It encompasses set of objects in a database whose data is required to keep sync at subscriber databases. Replication architecture is not part of the SQL Server database engine, but a separate feature or application. It works through agents, triggers and scheduled jobs specific for this purpose. Replication can have uses in high availability systems or in a distributed system and can have different toplogies.
There are 3 main types of replication
- Transactional Replication (one or two way)
- Merge Replication
- Snapshot Replication
This type of SQL Server replication is used for near real time or short delayed synchronization from publisher and subscriber. Initially a snapshot process synchronizes the subscribers with the publisher. All committed transactions on the publisher are then propagated to the subscribers in sequence. It may be one way transactional replication or two way.
In one way replication, modifications from publisher are synced on subscriber but no modification from subscriber is propagated back to publisher. While in case of peer to peer or two way replication, changes are propagated both ways and data remains sync on both sides for selected articles.
There are two options while propagating changes back from subscriber to publisher.
- Immediate updating
- Queued updating
Both methods can be used at the same time to maximize the availability and efficiency.
In case of immediate updating changes made at the subscriber are synced to the publisher using a two-phase commit process managed by Microsoft Distributed Transaction Coordinator (MS DTC). Changes made at the subscriber must be made to both systems or are rolled back. This method is suitable for reliable, high-speed connection between the Publisher and Subscriber.
This method is suitable for less reliable connections. Writes updates at the subscriber to a queue. Publisher reads the queue and applies changes. This flow might result in conflicts due to latency.
Peer to Peer replication
Peer to peer replication was introduced in SQL Server 2005 and only supported in Enterprise edition of SQL Server version. It resembles in some features features with two way transactional replication. Main difference is that peer to peer replication involves more than just single publisher and associated subscriber nodes. It can span on multiple nodes and each will be kept in Sync. Also peer to peer replication is better while propagating the DDL of articles and locking mechanism.
Merge replication is relatively complex type of replication. It does not use transactions but a set of defined rules used to resolve conflicts when data updated at multiple sites is synchronized. It tracks changes through triggers and metadata tables provides the means to implement part of a high-availability system. This also serves well the original purpose of serving mobile and disconnected users. It is designed for cases where the publishers are not in constant communication with the subscribers. After the initial snapshot synchronization, subsequent changes are tracked locally and the databases are merged when in contact using series of rules to resolve conflicts.
Data changes are not tracked for snapshot replication. Each time a snapshot is applied, it completely overwrites the existing data. Snapshot replication process provides the initial synchronization for transactional and merge publications. In several cases, this initial synchronization is all that is necessary. Data is synced at defined intervals as opposed to constant transactional sync of other types.
Distribution is important setup in replication architecture. It has following components and terms.
Distribution server is a role where distribution setup (agent, jobs, DB) resides. Distribution server may be same as publisher or a dedicated server. For big and busy replication setups separate distributor server is recommended. We can mark a server as distributor before going to setup replication publisher and subscriber.
Distribution is system database for replication setup and is created automatically on server with distributor role. We can customize the DB name, files name, location and file size however. For transactional replication, the distribution database tracks changes made to publications. For merge replication it stores only synchronization history.
Distribution agent is setup as a separate module and works in form of scheduled job. Distribution agent is a process that moves transactions and snapshots held in the distribution database to subscribers. Works in form of scheduled job under replication jobs category.
There are some other terms and components that are always involved in replication setup.
Publisher is a server that makes data available for other servers, i.e. it is the server that makes data available for replication.
Publication can be a selection of tables, indexed view and stored procedures. Each publication can be replicated as a snapshot publication, a transactional publication or a merge publication. It is important to choose the publication articles on a valid base like size, relation of tables. There may be multiple publications in a single publisher.
An object included in a publication is called an article. Articles are combined on some basis to form a publication. Article can be an entire table, select rows from a table, speciﬁed columns from a table indexed view or a stored procedure. An article is a unit in publication.
Subscriber is the server that hosts the DB where data from publisher is being synced. There may be multiple subscribers against a single publisher. Subscriber may be on same server as that of publisher. Also there may be multiple subscriber databases on a single subscriber server.
Subscription relates to publication and goes to subscriber database. There may be multiple subscriptions against a single publisher and each will go to separate subscriber database. There are two types of subscriptions.
With pull subscription the subscriber asks the publisher for periodic updates. With pull subscription each subscriber can specify the best time for the update. Pull subscription is generally considered the better form of replication when a large number of subscribers are involved, or where the load on the LAN needs to be distributed.
With push subscription the publisher pushes any changes made to the data out to the subscribers. Typically push subscription is used where changes need to be replicated to the subscribers shortly after the change occurs, i.e. where the replication of changes is required in near real time.
As we have discussed several components of replication in above sections some agents were discussed as well. It looks suitable to list the agents so that they may give a clear picture of setup
- Snapshot agent
- Log Reader Agent
- Distribution Agent
- Merge Agent
- Que Reader Agent
Replication effect on log file
It is important to know the relation and effects of replication on database log file. If replication falls in latency then log begins to accumulate in log file unless it is send to distributor. This may ultimately fill the log file and slow down the system in some cases. Situation is same when database is with any recovery model FULL or even SIMPLE.
Topology is how the replication setup is designed and components are arranged. Following are different disgrams to give view of different toplogies.