Yesterday, i was facing a replication issue in MySQL 8. The replication is suddenly stopped because of an error:
Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at source log mysql-bin-changelog.397390, end_log_pos 7392217. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
That error appeared when i run: SHOW REPLICA STATUS;
on MySQL replication server.
If we read the above error, it’s because workers cannot read source of binary log at mysql-bin-changelog.397390, at position 7392217. To see log error detail, open table performance_schema
at column replication_applier_status_by_worker
.
So i run this query to get error detail:
SELECT * FROM performance_schema.replication_applier_status_by_worker;
As a result, i can finally see what’s the exact cause. The cause is about duplicate primary key.
Worker 1 failed executing transaction 'ANONYMOUS' at source log mysql-bin-changelog.397390, end_log_pos 7392217; Error 'Duplicate entry '1587884218' for key 'track.PRIMARY'' on query. Default database: 'marketplace'. Query: 'INSERT INTO `track` (`track_process`, `order_no`, `create_date`) VALUES (7, 'ORDER-930012421', '2024-10-04 01:03:14')'
Cause
Why we get duplicate primary key? Based on my experience in maintaining mysql replication, there are some causes, are:
- Out Of Memory (OOM) -> This is causing MySQL replication server hang, and impacting both mysql thread, I/O thread and SQL thread.
- I/O thread = This thread performs pulling binary log from master and stored in replication server. If server get hang, I/O thread cannot pull binary log or causing the latest pulled binary log corrupt.
- SQL thread = This thread performs to execute statements (INSERT, UPDATE, DELETE, etc..) that exists in binary log. For you know, After I/O thread has successfully pulled binary log from master, SQL thread then read binary log (which is containing any MySQL statements).
IF OOM occured, SQL thread sometime runs any statements in binary log twice. It will cause duplicate primary key, For example SQL thread has inserted data with ID 1587884218, but because of OOM, SQL thread do inserting ID 1587884218 again.
- Out Of Storage, Usually this is also impacting binary log. The binary log will be get corrupted if out of storage occured.
Resolution
The simply solution is to delete some existed data that’s causing replication error. At early, we have known ID 1587884218 is causing replication error. So we just need to delete its ID to the last.
DELETE FROM tb_track WHERE id >= 1587884218;
Then, run statement stop and start replica:
STOP REPLICA;
START REPLICA;
Check replication again, is it working back or still error?
SHOW REPLICA STATUS;
If the output is nothing error Coordinator stopped..
, it means replication is working back. For example, see column Replica_SQL_Running_State (Waiting for dependent transaction to commit).
That above image means replication is working back. It says, waiting for new data and data will be inserted into database.
But, if there is same error as above (Coordinator stopped …), do checking table performance_schema.replication_applier_status_by_worker
again and repeat the above steps until the error solved.
Note: sometimes, i need some repeat to delete data in other tables (because duplicate primary key exists in some other tables, not only in table tb_track, but it can be in tb_others).
Summary
There are some causes why you get error – coordinator stopped and duplicate primary key on your MySQL replication server:
- Out of Memory
- Out of Storage
The solution is deleting existed data, stop, start replica, show replica. Do it some repeat until replication get working back.