SYBASE REPLICATION SERVER PERFORMANCE AND TUNINGUnderstanding and Achieving Optimal Performance with Sybase Replication Server

Table of ContentsTable of Contents. 2Author’s Note. 4Introduction . 5Overview and Review. 6Replication System Components . 6Replication Server Internal Processing. 7Analyzing Replication System Performance . 10Primary DataServer/Database. 12DataServer Configuration Parameters . 12Primary Database Transaction Log. 13Application/Database Design . 13Replication Agent Processing . 20Secondary Truncation Point Management. 20Rep Agent LTL Generation . 21Replication Agent Communications. 25Replication Agent Troubleshooting. 25Replication Server General Tuning . 27Replication Server/RSSD Hosting. 27RS Generic Tuning . 29RSSD Generic Tuning . 33RSM/SMS Monitoring . 33Inbound Processing . 36SQM Processing . 36SQT Processing . 40Distributor Processing . 47Minimal Column Replication . 53Outbound Queue Processing . 58DSI SQT Processing. 58DSI Transaction Grouping. 60DSI Function String Generation . 64DSI Command Batching. 67RSI Queue Processing . 68Replicate DataServer/Database . 70Performance Monitoring. 70Replicate DataServer Resources. 72Trigger/Procedure Execution Time . 72Concurrency Issues. 77Parallel DSI Performance . 79Need for Parallel DSI . 79Parallel DSI Internals. 80Serialization Methods . 85Large Transaction Processing. 90Maximizing Performance with Parallel DSI’s. 96Text/Image Replication . 100Text/Image Datatype Support. 100RS Implementation & Internals . 102Performance Implications. 1082

Asynchronous Request Functions . 110Purpose . 110Implementation & Internals. 112Performance Implications. 115Multiple DSI’s . 116Concepts & Terminology . 116Performance Benefits . 116Implementation. 117Business Cases. 1323

Author’s NoteThinking is hard work – “Silver Bullets” are much easier.Several years ago, when Replication Server 11.0 was fairly new, Replication Server Engineering (RSE) collaborated ona paper that was a help to us all. Since that time, Replication Server has gone through several releases and ReplicationServer Engineering has been too busy keeping up with the advances in Adaptive Server Enterprise and the future ofReplication Server to maintain the document. However, the requests for a paper such as this have been a frequentoccurrence, both internally as well as from customers. Hopefully, this paper will satisfy those requests. But as theabove comment suggests, reading this paper will require extensive thinking (and considerable time). Anyone hopingfor a “silver bullet” does not belong in the IT industry.This paper was written for and addresses the functionality in Replication Server 12.0. As the ReplicationServer product continues to be developed and improved, it is likely that later improvements to the server, especiallyanticipated improvements scheduled for the 12.1 release, may supercede the recommendations contained in this paper.It is assumed that the reader is familiar with Replication Server terminology, internal processing and in general thecontents of the Replication Server System Administration Guide. In addition, basic Adaptive Server Enterpriseperformance and tuning knowledge is considered critical to the success of any Replication System’s performanceanalysis.4

Introduction“Just How Fast Is It?”This question gets asked constantly. Unfortunately, there are no standard benchmarks such as TPC-C for replicationtechnologies and RSE does not have the bandwidth nor resources to do benchmarking. Consequently, the stock RSEreply used to be 5MB/min (or 300MB/hr) based on their limited testing on development machines (small ones at that).However, Replication Server has been clocked at 2.4GB/hr sustained in a 1.2TB database and more than 40GB hasbeen replicated in a single day into the same 1.2TB database (RS 12.0 and ASE 11.9.3 on Compaq Alpha GS140’s forthe curious). Additionally, some customers have claimed that by using multiple DSI’s, they have achieved 10,000,000transactions an hour!! Although this sounds unrealistic, a monitored benchmark in 1995 using Replication Server 10.5achieved 4,000,000 transactions (each with 10 write operations) a day from the source replicating to three destinations(each with only 5 DSI’s) for a total delivery of 12,000,000 transactions per day (containing 120,000,000 writeoperations). As usual, your results may vary. Significantly. It all depends. And every other trite caveat muttered by atuning guru/educator/consultant). The goal of this paper is to educate so that the reader understands why they may beseeing the performance they are and suggest possible avenues to explore with the goal of improved performancewithout resorting to the old tried-and-true trial-and-error stumble-fumble.Before we begin, however, it is best to lay some ground rules about what to expect or not to expect from this paper.Focusing on the latter: This paper will not discuss database server performance and tuning (although it frequently is the cause ofpoor replication performance) except as required for replication processing. This paper will not discuss non-ASE RepAgent performance (perhaps it will in a future version) exceptwhere such statements can be made generically about RepAgents. This paper will not discuss Replication Server Manager. This paper will not discuss how to “benchmark” a replicated system. This paper will not discuss Replication Server system administration.Now that we know what we are going to skip, what we will cover: This paper will discuss all of the components in a replication system and how each impacts performance. This paper will discuss the internal processing of the Replication Server, ASE’s Replication Agent and thecorresponding tuning parameters that are specific for performance.It is expected that the reader is already familiar with Replication Server internal processing and basic replicationterminology as described in the product manuals. This paper focuses heavily on Replication Server in an AdaptiveServer Enterprise environment.In the future, it is expected that this paper will be expanded to cover several topics only lightly addressed in this versionor not addressed at all. These topics include: Heterogeneous replication performance Replication Routes (Direct and Indirect) Replication Server 12.1 Monitors and Counters5

Overview and ReviewWhere Do We Start?Unfortunately, this is the same question that is asked by someone faced with the task of finding and resolvingthroughput performance problems in a distributed system. The last words of that sentence hold the key it’s adistributed system. That means that there are lots of pieces and parts that contribute to Replication Server performance– most of which are outside of the Replication Server. After the system has been in operation, there are several RScommands that will help isolate where to begin. However, if just designing the system and you wish to takeperformance in to consideration during the design phase (always a must for scalable systems), then the easiest place tobegin is the beginning. Accordingly, this paper will attempt to trace a data bit being replicated through the system.Along the way, the various threads, processes, etc. will be described to help the reader understand what is happening(or should happen?) at each stage of data movement. After getting the data to the replicate site, a number of topics willbe discussed in greater detail. These topics include text/image replication, parallel DSI’s, etc. A quick review of thecomponents in a replication system and the internal processing within Replication Server are illustrated in the nextsectionsReplication System ComponentsThe components in a basic replication system are illustrated below. For clarity, the same abbreviations used in productmanuals as well as educational materials are used. The only addition to this over pictures in the product manuals is theinclusion of SMS – in particular, Replication Server Manager (RSM) and the inclusion of the host for the RS/RSSD.HostLOGRSSDRSMRSSD DSPDB LOGRAPDSRSRDS LOG RDBFigure 1 – Components of a Simple Replication SystemOf course, the above is extremely simple – the basic single direction primary to replicate distributed system, oneexample of which is the typical Warm-Standby configuration.Whether for performance reasons or due to architectural requirements, often the system design involves more than oneRS. A quick illustration is included below:6

LOGPRSRSSDLOGRRSRSSDRSMRRSRSSD DSPRSRSSD DSPDBLOGRAPDSRRSPRSRDSLOGRDBIRSIRSRSSD DSLOGIRSRSSDFigure 2 – Components of a Replication System Involving More Than One RSThe above is still fairly basic. Today, some customers have progressed to multi-level tree-like structures or virtualnetworks exploiting high-speed bandwidth backbones to form information buses.Replication Server Internal ProcessingWhen hearing the terms “internal processing”, most Replication Server administrators immediately picture the internalthreads. While understanding the internal threads is an important fundamental concept, it is strictly the starting point tobeginning to understand how Sybase Replication Server processes transactions. Unfortunately, many ReplicationServer administrators stop there, and as a result never really understand how Replication Server is processing theirworkload. Consequently, this leaves the administrator ill equipped to resolve issues and in particular to analyzeperformance bottlenecks within the distributed system. Details about what is happening within each thread as data isreplicated will be discussed in later chapters.Replication Server ThreadsThere are several different diagrams that depict the Replication Server internal processing threads. Most of these areextremely similar and only differ in the relationships between the SQM, SQT and dAIO threads. For the sake of thispaper, we will be using the following diagram, which is slightly more accurate than those documented in theReplication Server Administration Guide:7

cate DBDSISQT10Stable DevicePrimary DBSRETD7MD9SQMOutbound (0)Inbound (1)8Distributor6dAIOSQT51RepAgentRep AgentUser2Outbound (0)Inbound (1)4SQM3Figure 3 – Replication Server Internal Processing Flow DiagramReplicated transactions flow through the system as follows:1.Replication Agent forwards logged changes scanned from the transaction log to the ReplicationServer.2.Replication Agent User thread functions as a connection manager for the Replication Agent andpasses the changes to the SQM.3.The Stable Queue Manager (SQM) writes the logged changes to disk via the Asynch I/O daemon.Once written to disk, the Replication Agent can safely move the secondary truncation point forward(based on scan batch size setting).4.Transactions from source systems are stored in the inbound queue until a copy has been distributedto all subscribers (outbound queue).5.The Stable Queue Transaction (SQT) thread requests the next disk block from the SQM and sortsthe transactions into commit order.6.Once the commit record for a transaction has been seen, the SQT alerts the Distributor thread that atransaction is available. The Distributor reads the transaction and determines who is subscribing toit, whether subscription migration is necessary, etc.7.Once all of the subscribers have been identified, the Distributor thread forwards the transaction tothe SQM for the outbound queue for the destination connection.8.The SQM writes the transaction out to the outbound queue via the Asynch I/O daemon.9.Transactions are stored in the outbound queue until delivered to the destination.10. The DSI Scheduler uses the SQT library functions to retrieve transactions from the outbound queueit transaction form and determines delivery strategy (batching, grouping, parallelism, etc.)11. Once the delivery strategy is determined, the DSI Scheduler then passes the transaction to a DSIExecutor.12. The DSI Executor translates the replicated transaction functions into the destination commandlanguage (i.e. Transact SQL) and applies the transaction to the replicated database.Again, the only difference here vs. those in the product manuals is the inclusion of the System Table Services (STS),Asynchronous I/O daemon (dAIO), SQT/SQM and queue data flow and the lack of a SQT thread reading from theoutbound queue (instead, the DSI-S is illustrated making SQT library calls). While the difference is slight, it isillustrated here for future discussion. Keeping these differences in mind, the reader is referred to the Replication ServerSystem Administration Guide for details of internal processing for replication systems involving routing or WarmStandby.8

Inter-Thread MessagingAdditionally, inter-thread communications is not done through a strict synchronous API call. Instead, each threadsimply writes a message into the target thread’s OpenServer message queue (standard OpenServer in memory messagestructures for communicating between OpenServer threads). Once the target thread has processed each message, it canuse standard callback routines or put a response message back into a message queue for the sending thread. Thisresembles the following:OpenClientCallbackRep AgentUserSQMOpenServerMessage QueuesFigure 4 – Replication Server Inter-Thread CommunicationsThose familiar with multi-threaded programming or OpenServer programming will recognize this as a commontechnique for communication between threads – especially when multiple threads are trying to communicate with thesame destination thread. Accordingly, callbacks are used primarily between threads in which one thread spawned theother and the child thread needs to communicate to the parent thread. An example of this in Replication Server is theDIST and SQT threads. The SQT thread for any primary database is started by the DIST thread. Consequently, inaddition to using message queues, the SQT and DIST threads can communicate using Callback routines.Note that the message queues are not really tied to a specific thread - but rather to a specific message. As a result, asingle thread may be putting/retrieving messages from multiple message queues. Consequently, it is possible to havemore message queues than threads, although the current design for Replication Server doesn’t require such. By now,those familiar with many of the Replication Server configuration parameters will have realized the relationship betweenseveral fairly crucial configuration parameters: num threads, num msgqueues and num msgs (especially why thisnumber could be a large multiple of num msgqueues). Since this section was strictly intended to give you abackground in Replication Server internals, the specifics of this relationship will be discussed later in the sectiondiscussion Replication Server tuning.OQID ProcessingOne of the more central concepts behind replication server recovery is the OQID – Origin Queue Identifier. The OQIDis used for duplicate and loss detection as well as determining where to restart applying transactions during recovery.The OQID composition depends largely on the source system. For Sybase ASE, the OQID is a 36 byte binary valuecomposed of the following elements:ByteContents1-2Database generation id (from dbcc gettrunc())3-8Log page timestamp9-14Log page rowid (rid)15-20Log page rid for the oldest transaction21-28Datetime for oldest transaction29-30Used by RepAgent to delete orphaned transactions31-32Unused33-34Appended by TD for uniqueness9

ByteContents35-36Appended by MD for uniquenessThrough the use of the database generation id, log page timestamp and log record row id (rid), ASE guarantees that theOQID is always increasing sequentially. As a result, any time the RS detects a OQID lower than the last one, it cansomewhat safely assume that it is a duplicate. Similarly at the replicate, when the DSI compares the ODID in thers lastcommit table with the one current in the active segment, it can detect if the transaction has already been applied.Why would there be duplicates? Simply because the Replication Server isn’t updating the RSSD or the rs lastcommittable with every replicated row. Instead, it is updating every so often after a batch of transactions has been applied.Should the system be halted mid-batch and then restarted, it is possible that the first several have already been applied.At the replicate, a similar situation occurs in that the Replication Server begins by looking at the oldest active segmentin the queue – which may contain transactions already applied.For heterogeneous systems, the database generation (bytes 1-2) and the RS managed bytes (33-36) are the same,however the other components depend on what may be available to the replication agent to construct the OQID. Thismay include system transaction id’s or other system generated information that uniquely identifies each transaction tothe Replication Agent.An important aspect of the OQID is the fact that each replicated row from a source system is associated with only oneOQID and vice versa. This is key to not only identifying duplicates for recovery after a failure (i.e. network outage),but also in replication routing. From this aspect, the OQID ensures that only a single copy of a message is delivered inthe event that the routing topology changes. Those familiar with creating intermediate replication routes and concept oflogical network topology provided by the intermediate routing capability will recognize the benefit of this behavior.The danger is that some people have attempted to use the OQID or origin commit time in the rs lastcommit table fortiming. This is extremely inaccurate. First, the origin commit time comes from the timestamp in the commit record (aspecific record in the transaction log) on the primary. This time is derived from the dataserver’s clock, which issynched with the system clock about once per minute. There can be drift obviously, but not more than a minute as it isre-synched each minute. The dest commit time in the rs lastcommit table, on the other hand, comes from the getdate()function call in rs update lastcommit. The getdate() function is a direct poll of the system clock on the replicate. Theresulting difference between the two could be quite large in one sense or even negative if the replicate’s clock wasslow.The best mechanism to determining latency is to simply run a batch of 1,000 atomic inserts into the primary andmonitor the end time at the primary and replicate. For large sets of transactions, obviously a stop watch is not evennecessary. If the Replication Server is keeping the system up to the point a stop watch would be necessary, then youdon’t have a latency problem. If, however, it finishes at the primary in 1 minute and at the replicate in 5 minutes – thenyou have a problem.Analyzing Replication System PerformanceHaving set the stage, the rest of this document will be divided into sections detailing how these components work inrelation to possible performance issues. The major sections will be: Primary DataServer/Database Replication Agent Processing Replication Server and RSSD General Tuning Inbound Processing Outbound Queue Processing Replicate DataServer/Database10

After these sections have been covered in some detail, this document will then cover several special topics related toDSI processing in more detail. This includes: Parallel DSI Performance Text/Image Replication Asynchronous Request Functions Multiple DSI’s11

Primary DataServer/DatabaseIt is Not Possible to Tune a Bad DesignThe above comment is the ninth principal of the “Principals of OLTP Processing” as stated by Nancy Mullen ofAndersen Consulting in her paper OLTP Program Design in OLTP Processing Handbook (McGraw-Hill). A truerstatement has never been written. Not only can you not fix it by replication, but in most cases, a bad design will alsocause replication performance to suffer. In many cases when replication performance is bad, we tend to focus quicklyat the replicate. While it is true that many replication performance problems can be resolved there, the primarydatabase often also plays a significant role. In this section, we will begin with basic configuration issues and then moveinto some of the more problematic design issues that affect replication performance.DataServer Configuration ParametersWhile Sybase has striven (with some success) to make replication transparent to the application, it is not transparent tothe database server. In addition to the Replication Agent Thread (even though significantly better than the older LTM’sas far as impact on the dataserver), replication can impact system administration in many ways. One of those ways isproper tuning of the database engine’s system configuration settings. Several settings that would not normally beassociated with replication, nonetheless, have a direct impact on the performance of the Replication Agent or inprocessing transactions within the Replication Server.Procedure Cache SizingA common misconception is that procedure cache is strictly used for caching procedure query plans. However, inrecent years, this has changed. The reason is than in most large production systems, the procedure cache was grosslyoversized, consequently under utilized and contributed to the lack of resources for data cache. For example, in asystem with 2GB of memory dedicated to the database engine, the default of 20% often meant that 400MB of memorywas being reserved for procedure cache. Often, real procedure cache used by stored procedure plans is less than 10MB.ASE engineers began tapping in to this resource by caching subquery results, etc. in procedure cache. When theReplication Agent thread was internalized within the ASE engine (ASE 11.5), it was no different. It also usedprocedure cache.The Replication Agent uses procedure cache for several critical functions: Schema Cache - Caching for database object structures, such as table, column names, text/image replicationstates, used in the construction of LTL. Transaction Cache - Caching LTL statements pending transfer to the Replication ServerAs a result, system administrators who have tuned the procedure cache to the minimal levels prior to implementingreplication may need to increase it to accommodate Replication Agent usage. You can see how much memory aReplication Agent is using via the 9204 trace flag (additional information on enabling/disabling Replication Agenttrace flags is located in the Replication Agent section).sp config rep agent db name , “trace log file”, “ filepathname ”sp config rep agent db name , “traceon”, “9204”-- monitor for a few minutessp config rep agent db name , “traceoff”, “9204”Generally speaking, the Replication Agent’s memory requirements will be less than normal server’s metadata cacherequirements for system objects (sysobjects, syscolumns, etc.). A rule of thumb if sizing a new system for replicationmight be to use the metadata cache requirements as a starting point.Metadata CacheThe metadata cache itself is important to replication performance. As will be discussed later, as the Replication Agentreads a row from the transaction log, it needs access to the object’s metadata structures. If forced to read this fromdisk, the Replication Agent processing will be slowed while waiting for the disk I/O to complete. Careful monitoringof the metadata cache via sp sysmon during periods of peak performance will allow system administrators to size themetadata cache configurations appropriately.12

User Log Cache (ULC)User (or Private) Log Cache was implemented in Sybase SQL Server 11.0 as a means of reducing transaction logsemaphore contention and the number of times that the same log page was written to disk. In theory, a properly sizedULC would mean that only when a transaction was committed, would the records be written to the physical transactionlog. One aspect of this that could have had a large impact on the performance of replication server was that this wouldmean that a single transaction’s log records would be contiguous on disk vs. interspersed with other user’s transactions.T

When hearing the terms "internal processing", most Replication Server administrators immediately picture the internal threads. While understanding the internal threads is an important fundamental concept, it is strictly the starting point to beginning to understand how Sybase Replication Server processes transactions.