LSN


SQL Server Replication Error - The specified LSN for repldone log scan occurs before the current start of replication in the log

MSSQLTips author Percy ReyesBy:    |   Read Comments   |   Related Tips: More > Replication
Problem

When working with SQL Server Replication there are times when you may get critical errors related to the Log Reader Agent. One of these errors is "The specified LSN [xxx] for repldone log scan occurs before the current start of replication in the log [xxx]". In this tip I will show how to deal with this type of error.

Solution

I'll give recommendations on how to fix the replication error and here is an actual error message as an example to help us analyze the issue:

The specified LSN {00000023:000000f8:0003} for repldone log scan occurs before the current 
start of replication in the log {00000023:000000f9:0005}. 

Below we can see what this error looks like using the Replication Monitor within SQL Server Management Studio.

Sessions in the SQL Server Replication Monitor

Note: these errors can be found in the Log Reader Agent history and also using SQL Server's Replication Monitor.

The Log Reader Agent is an executable that continuously reads the Transaction Log of the database configured for transactional replication and copies the transactions marked for replication from the Transaction Log into the msrepl_transactions table in the Distribution database.  It is important to know that when the Log Reader Agent starts it first verifies that the last transaction in the msrepl_transactions table in the Distribution database matches the Last Replicated Transaction in the Transaction Log for the published database (that is, the current start of replication). When this does not occur, an error will appear like shown above. Note that the msrepl_transactions table is a critical table and it must not be modified manually, because the Log Reader agent works based on the data in this table. If some entries are deleted or modified incorrectly then we will get many replication errors and one of them is what we are talking about in this tip.

Analyzing the SQL Server Replication Issue

As per our error message for this example, we can execute DBCC OPENTRAN on the published database to find more details about the opened and active replication transaction:

Replicated Transaction Information:
        Oldest distributed LSN     : (35:249:5)
        Oldest non-distributed LSN : (35:251:1)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Note that DBCC OPENTRAN gives us the LSN value in Decimal format and we need to convert the LSN from Decimal to Hexadecimal for further analysis.  Below I have converted the values from decimal to hexadecimal.

Oldest distributed LSN : (35:249:5) --> 00000023:000000f9:0005 (must match last row in msrepl_transactions table) 
Oldest non-distributed LSN : (35:251:1) --> 00000023:000000fb:0001 (oldest LSN in the transaction log)

If we analyze the output again,

The specified LSN {00000023:000000f8:0003} for repldone log scan occurs before the current 
start of replication in the log {00000023:000000f9:0005}. 

we can confirm that the LogReader agent is expecting to find LSN 000000f8 of VLF 00000023 which is currently the last entry in the msrepl_transactions table, but this is not the correct Last Replicated Transaction LSN. For SQL Server the correct Last Replicated Transaction LSN is 000000f9.

Note also that the oldest LSN in the Transaction Log is 000000fb of the same VLF 00000023. This means that the Log Reader is trying to start reading on LSN 000000f8 which is before the current start of replication in the log with LSN 000000f9.

You might be thinking about just moving the Log Reader to start at 000000f9 to fix this error?  The problem is that we do not know exactly how many transactions there are from 000000f8 to 000000f9, so if we move the Log Reader Agent pointer forward then we could miss some transactions and create data inconsistency issues.

Fixing the SQL Server Replication Issue

If you decide to move the Log Reader you can do it by using sp_repldone to mark 00000023:000000f9:0005 as the last replicated transaction, so you can skip and move forward. This can be done as shown below.  Here we are using the values from the error message: {00000023:000000f9:0005} or 00000023000000f90005.  We then flush and restart.

exec sp_repldone @xactid = x00000023000000f90005 , @xact_segno = x00000023000000f90005 
GO
-- releasing article cache...       
exec sp_replflush
GO
--Resetting LogReader to retrieve next transaction... 
--LogReader Agent will start up and begin replicating transactions....
exec sp_replrestart

With the above code, the error will disappear and the LogReader agent should continue running without error. If you want to know more about the LogReader Agent, you can review the links at the end of this tip.

Note: You should use sp_repldone only to troubleshoot specific internal replication errors and not use this as a regular way to fix issues.  In the case where you have more continuous errors like above it is better to recreate the replication publication again by dropping and then recreating.

Use sp_repldone with extreme caution because if you use the incorrect LSN you may skip valid transactions from the replication queue and you can create data inconsistency issues at the subscribers.

Fixing Error After a SQL Server Database Restore

Sometimes this type of error can appear after you restore a database that has publications. In this case an error similar to this will appear:

The specified LSN {00000000:00000000:0000} for repldone log scan occurs before the current 
start of replication in the log {00000023:000000f9:0005}.

You can see that Log Reader wants to start reading at LSN {00000000:00000000:0000} and it is because the msrepl_transactions table is empty. In this specific case you also can run sp_repldone to skip all false pending transactions in the msrepl_transactions table and move forward, so the LogReader ignores the incorrect starting point and can continue working.

exec sp_repldone @xactid=NULL, @xact_segno=NULL, @numtrans=0, @time=0, @reset=1

Note that we use @reset=1 to mark all replicated transactions in the Transaction Log as distributed.

Next Steps




Comments