
10 Essential Checklist Items Before Data Migration

March 10, 2025

Teams migrate data for many reasons: replacing a legacy system, moving to the cloud, or upgrading a database. But a single mistake can lose or corrupt thousands of records, which makes thorough preparation essential.

The stakes are real. In 2012, Knight Capital Group lost over $460 million in just 45 minutes due to a botched deployment of new trading software, ultimately leading to the company's acquisition. In 2017, British Airways suffered a massive IT systems failure triggered by a mishandled power supply at a data center, disrupting travel for 75,000 passengers and costing approximately 100 million pounds. These incidents are stark reminders that data migration is not just a technical task; it can be an existential event for a business.

This article covers 10 must-check items before starting data migration. It's a practical guide you can apply directly in your work.

1. Verify Source Data Backup

The first step of migration is a complete backup of source data. Store backups in at least two locations and complete a restoration test. Thinking "we have backups, so it's fine" is dangerous. It's not uncommon for backup files to be found corrupted.

### The 3-2-1 Backup Rule

Follow the industry-proven 3-2-1 backup rule: maintain at least 3 copies of your data, store them on 2 different types of media, and keep 1 copy offsite. For example, you might keep one copy on local disk, another on a NAS device, and a third in cloud storage (AWS S3, Azure Blob Storage, or Google Cloud Storage).

### Full Backup vs Incremental Backup

Always perform a full backup immediately before migration. While incremental backups are sufficient for daily operations, a migration event requires a complete point-in-time snapshot. After creating the backup, compute checksums (MD5 or SHA-256) and record them so you can verify backup integrity later.
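The checksum step is easy to script. Here is a minimal sketch in Python using the standard library's `hashlib`; the backup filename in the comment is hypothetical:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, streaming in 1 MB chunks
    so even multi-gigabyte backups fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Record the digest alongside the backup so integrity can be re-verified
# later; 'backup_2025-03-10.sql.gz' is a hypothetical filename.
# checksum = file_sha256("backup_2025-03-10.sql.gz")
```

Store the resulting hex digest with the backup itself; recomputing and comparing it after transfer catches silent corruption.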

### Restoration Testing Is Non-Negotiable

Creating a backup is not enough. You must perform a restoration test in a separate environment. Measure the time it takes to restore — this measurement will be invaluable when planning your rollback strategy. Use the following SQL to verify basic database backup integrity:

```sql
-- Record row counts before migration
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'your_database';

-- Checksum key tables
CHECKSUM TABLE orders, customers, products;
```

2. Create Data Mapping Documentation

Create a document that maps tables, field names, and data types between source and target systems on a 1:1 basis. Record all cases where field names differ or data types are converted. This document becomes the standard for post-migration verification.

### Commonly Overlooked Mapping Issues

There are many pitfalls in data type conversion. For example, converting MySQL's DATETIME to PostgreSQL's TIMESTAMP WITH TIME ZONE can silently drop timezone information. VARCHAR length differences between systems can truncate data. NULL handling, default values, AUTO_INCREMENT vs SERIAL behavior, and character encoding differences must all be documented.
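Two of these pitfalls can be demonstrated in a few lines of Python; the column width and timestamp below are illustrative, not taken from any real schema:

```python
from datetime import datetime, timezone

# Timezone loss: a naive DATETIME carries no offset. Attaching a timezone
# during conversion is an assumption; if the source was actually KST (UTC+9)
# rather than UTC, every migrated timestamp is silently 9 hours off.
naive = datetime(2025, 3, 10, 9, 0)          # MySQL DATETIME, no tz info
as_utc = naive.replace(tzinfo=timezone.utc)  # assumption: source stored UTC

# Truncation: loading into a narrower VARCHAR(10) can silently cut values
# (some databases warn, others truncate depending on strict-mode settings).
value = "customer_name_longer_than_ten"
truncated = value[:10]  # 'customer_n'
```

Both failures are silent at load time and only surface later, which is exactly why the mapping document must record every width and type change.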

### Leveraging ETL Tools

When manual mapping becomes complex, leverage ETL (Extract, Transform, Load) tools. Apache NiFi offers visual data flow design and is excellent for complex routing. Talend Open Studio excels at sophisticated transformation logic. AWS Glue provides serverless ETL in the cloud. For simpler mappings, a custom Python script using pandas and SQLAlchemy can be highly effective:

```python
# Simple mapping script example
import pandas as pd
from sqlalchemy import create_engine

source_engine = create_engine('mysql://user:pass@source_host/db')
target_engine = create_engine('postgresql://user:pass@target_host/db')

df = pd.read_sql('SELECT * FROM legacy_customers', source_engine)

# Field name mapping and transformation
df = df.rename(columns={
    'cust_nm': 'customer_name',
    'tel_no': 'phone_number',
    'reg_dt': 'created_at'
})

df.to_sql('customers', target_engine, if_exists='append', index=False)
```

3. Define Data Integrity Criteria

Clearly define what constitutes "success." Establish specific verification criteria in advance, such as matching record counts, no missing required fields, and matching aggregate sums. Proceeding without criteria, on the strength of "it looks about right," will compound problems later.

### Data Integrity Verification Queries

Run systematic verification queries before and after migration:

```sql
-- Compare record counts per table
SELECT 'source' AS system, COUNT(*) AS cnt FROM source_db.orders
UNION ALL
SELECT 'target' AS system, COUNT(*) AS cnt FROM target_db.orders;

-- Compare numeric field aggregates
SELECT 'source' AS system, SUM(amount) AS total, AVG(amount) AS avg_amount
FROM source_db.orders
UNION ALL
SELECT 'target' AS system, SUM(amount) AS total, AVG(amount) AS avg_amount
FROM target_db.orders;

-- Compare NULL value distribution
SELECT 'source' AS system,
       SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails,
       SUM(CASE WHEN phone IS NULL THEN 1 ELSE 0 END) AS null_phones
FROM source_db.customers
UNION ALL
SELECT 'target' AS system,
       SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails,
       SUM(CASE WHEN phone IS NULL THEN 1 ELSE 0 END) AS null_phones
FROM target_db.customers;

-- Compare unique value distributions
SELECT 'source' AS system, status, COUNT(*) AS cnt
FROM source_db.orders
GROUP BY status
UNION ALL
SELECT 'target' AS system, status, COUNT(*) AS cnt
FROM target_db.orders
GROUP BY status;
```

### Automated Validation Pipelines

Running queries manually one by one is inefficient and error-prone. Build an automated validation pipeline using Python or shell scripts that compares record counts, checksums, NULL distributions, and foreign key integrity across all tables. Data quality frameworks like Great Expectations allow you to define expectations as code and run them automatically against both source and target datasets.
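The core of such a pipeline is just a loop over tables. A minimal sketch in plain Python, demonstrated here with throwaway in-memory SQLite databases standing in for real source and target systems (the table names and data are illustrative):

```python
import sqlite3

def compare_table_counts(conn: sqlite3.Connection, tables: list) -> dict:
    """Compare per-table row counts between an attached 'source' and
    'target' database; returns {table: (source_count, target_count, match)}."""
    report = {}
    for table in tables:
        src = conn.execute(f"SELECT COUNT(*) FROM source.{table}").fetchone()[0]
        tgt = conn.execute(f"SELECT COUNT(*) FROM target.{table}").fetchone()[0]
        report[table] = (src, tgt, src == tgt)
    return report

# Demo: two in-memory databases playing source and target.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS source")
conn.execute("ATTACH DATABASE ':memory:' AS target")
conn.execute("CREATE TABLE source.orders (id INTEGER)")
conn.execute("CREATE TABLE target.orders (id INTEGER)")
conn.executemany("INSERT INTO source.orders VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO target.orders VALUES (?)", [(1,), (2,)])

report = compare_table_counts(conn, ["orders"])
# One row went missing in transit, so the check flags a mismatch.
```

The same loop extends naturally to checksums and NULL distributions; the point is that the full table list runs unattended and produces a single report.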

4. Set Up Test Environment

Don't migrate production data directly. Always run in a test environment first, verify results, then apply to production. The test environment should be configured as identically to production as possible.

### Testing Methodology

Effective migration testing should be conducted in multiple phases. First, unit testing validates individual table migration logic. Second, integration testing confirms that inter-table relationships (foreign keys, referential integrity) are maintained. Third, load testing measures performance at actual data volumes. Finally, User Acceptance Testing (UAT) verifies that end users can perform normal operations with the migrated data.

### Performance Testing Considerations

Performance is a critical factor during migration. Measure the total time required for data transfer, network bandwidth consumption, and CPU/memory utilization on both source and target systems. For large-scale migrations, use tools like JMeter or Locust to assess the impact of migration activities on existing services. Identify potential bottlenecks before they cause problems in production — a migration that takes 4 hours in test but 20 hours in production due to higher data volume can blow past your maintenance window.

5. Check Encoding and Locale

Encoding issues frequently occur when multilingual data including Korean, Chinese, or Japanese is involved. While UTF-8 is the standard, legacy systems often use EUC-KR or Shift_JIS. Accurately identify the source encoding before migration.

Ignoring encoding problems leads to mojibake — garbled text that appears as "?????" or "ã??ã??" instead of the original characters. Once this corruption enters your target system, recovering the original data becomes extremely difficult. DiffMate can help you compare text files before and after migration to quickly detect character corruption introduced during encoding conversion.

Date and number formats also vary by locale. Korea uses "2025-03-10" as standard, but the US uses "03/10/2025" and parts of Europe use "10.03.2025". Decimal separators differ too — Korea and the US use a period (3.14), while Germany and France use a comma (3,14). Ignoring these differences can cause monetary values and dates to be misinterpreted with potentially severe business consequences.
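Both pitfalls are easy to reproduce in a few lines of Python; the sample string and date formats below are illustrative:

```python
from datetime import datetime

# Encoding: the same bytes mean different things under different codecs.
raw = "고객".encode("euc-kr")      # bytes as a legacy EUC-KR export would store them
text = raw.decode("euc-kr")        # correct: '고객'
# raw.decode("utf-8") would raise UnicodeDecodeError here, which is
# actually preferable to a silent mojibake conversion.
utf8_bytes = text.encode("utf-8")  # re-encode for the UTF-8 target system

# Locale: one calendar date, three representations.
d = datetime(2025, 3, 10)
iso = d.strftime("%Y-%m-%d")   # '2025-03-10' (Korea, ISO 8601)
us = d.strftime("%m/%d/%Y")    # '03/10/2025' (US)
eu = d.strftime("%d.%m.%Y")    # '10.03.2025' (parts of Europe)

# Parsing with the wrong format silently swaps day and month:
wrong = datetime.strptime("03/10/2025", "%d/%m/%Y")  # October 3, not March 10
```

Note the last line raises no error at all; the date is simply wrong, which is exactly how locale bugs slip through a migration unnoticed.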

6. Large-Scale Data Processing Strategy

Moving millions of records at once can cause system load and timeout issues. Develop a strategy to process in batches and log success/failure for each batch.

### Batch Processing Design

Process data in chunks of 10,000 to 50,000 records, respecting referential relationships between tables. For example, migrate the customers table before the orders table to avoid foreign key constraint violations. Set commit points for each batch so that failures only require re-running the failed batch, not the entire migration.
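The commit-per-batch idea can be sketched as a small driver loop; the `fetch_batch`/`write_batch` callables and the batch size are illustrative, and in a real migration `write_batch` would commit its own database transaction:

```python
def migrate_in_batches(fetch_batch, write_batch, log, batch_size=10_000):
    """Copy rows batch by batch, logging success or failure per batch so
    that a failure only requires re-running that one batch."""
    offset = 0
    while True:
        rows = fetch_batch(offset, batch_size)
        if not rows:
            break
        try:
            write_batch(rows)              # one commit per batch
            log(offset, len(rows), "ok")
        except Exception as exc:           # record and continue
            log(offset, len(rows), f"failed: {exc}")
        offset += batch_size

# Demo with in-memory lists standing in for source and target tables.
source = list(range(25))
target, journal = [], []

migrate_in_batches(
    fetch_batch=lambda off, size: source[off:off + size],
    write_batch=target.extend,
    log=lambda off, n, status: journal.append((off, n, status)),
    batch_size=10,
)
# journal now holds one entry per batch: (0, 10, 'ok'), (10, 10, 'ok'), (20, 5, 'ok')
```

The journal is the important part: after a partial failure, it tells you exactly which offsets to replay instead of restarting the whole run.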

### Cloud-Specific Migration Challenges

Cloud migration introduces additional challenges. On AWS, the Database Migration Service (DMS) supports continuous replication through Change Data Capture (CDC). Azure offers the Azure Database Migration Service, and GCP provides its own Database Migration Service. For very large datasets, calculate your network transfer times carefully — transferring 1TB over a 100Mbps connection takes approximately 22 hours. Physical transfer services like AWS Snowball or Azure Data Box may be more practical for multi-terabyte migrations. Also consider data transfer costs: egress charges from cloud providers can be substantial and are easy to overlook in planning.
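The transfer-time arithmetic behind the 22-hour figure is worth scripting into your planning; this sketch uses decimal units (1 TB = 10^12 bytes) and ignores protocol overhead, so real transfers will run somewhat longer:

```python
def transfer_hours(size_bytes: float, bandwidth_mbps: float) -> float:
    """Estimate wire-transfer time in hours for a given payload size and
    link speed in megabits per second (decimal units, no overhead)."""
    bits = size_bytes * 8
    seconds = bits / (bandwidth_mbps * 1_000_000)
    return seconds / 3600

one_tb = 1_000_000_000_000        # 1 TB in decimal bytes
hours = transfer_hours(one_tb, 100)  # ~22.2 hours over a 100 Mbps link
```

Run it against your actual dataset size and measured (not nominal) bandwidth; if the answer exceeds your maintenance window, that is the signal to look at physical transfer services.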

7. Establish Rollback Plan

Always prepare a plan to revert to the original state when migration fails or problems are discovered. Predetermine the time required for rollback, necessary resources, and responsible personnel.

### Parallel Run Strategy

One of the safest approaches is a parallel run: operating the old and new systems simultaneously for a defined period while comparing results. Once the new system's stability is confirmed, decommission the old system in stages. During the parallel run period, periodically export data from both systems to CSV/Excel and compare them using DiffMate to instantly spot discrepancies.

### Rollback Scenario Planning

Prepare response plans for multiple rollback scenarios. Scenario A: an error occurs during migration before service cutover — simply abort the migration and maintain the source system. Scenario B: data inconsistencies are discovered after migration completion — restore from backup and analyze the root cause of discrepancies. Scenario C: performance issues emerge after service cutover — reroute traffic back to the old system via DNS or load balancer changes. Document the estimated time and responsible person for each scenario.

8. Stakeholder Communication

Pre-announce migration schedule, expected downtime, and scope of impact to related teams. Unexpected service interruptions lead to customer complaints and business losses.

### Staged Communication Templates

Build a staged notification system for effective communication. Two weeks before migration, announce the overall schedule, scope of impact, and preparation requirements. One week before, provide the detailed timetable and emergency contact information. On the day of migration, provide real-time status updates at start, during progress, and upon completion. After completion, share a results summary and any remaining issues.

### Compliance Considerations

When migrating data that includes personally identifiable information, verify compliance with relevant regulations. Under GDPR (General Data Protection Regulation), transferring personal data requires appropriate safeguards, and additional legal basis is needed for transfers outside the EU. For HIPAA (Health Insurance Portability and Accountability Act), medical data migration must maintain encryption, access controls, and audit logs throughout the process. Many countries have their own data protection laws with specific requirements for cross-border data transfers. Consult your legal and compliance teams during the migration planning phase — discovering a regulatory issue mid-migration can force an emergency halt with far-reaching consequences.

9. Post-Migration Comparison Verification

After migration completes, you must compare and verify the source and result data. Export both to CSV or Excel and use a comparison tool like DiffMate to check differences row by row and cell by cell; this quickly uncovers omissions and corruption.

### Data Reconciliation Techniques

Post-migration data reconciliation operates at three levels. First, count reconciliation confirms that total record counts match per table. Second, value reconciliation compares aggregates for numeric fields — sums, averages, min/max values. Third, row-level reconciliation compares every row in critical tables on a 1:1 basis.

Row-level reconciliation is the most thorough but also the most time-consuming. This is where DiffMate shines. Export data from both source and target systems as CSV files, upload them to DiffMate, and visually inspect every difference. Changed cells are highlighted, and added or deleted rows are clearly distinguished. This visual approach catches subtle issues that aggregate-level checks miss — like a phone number with a transposed digit or a name with extra whitespace.

### Automated Validation Scripts

```sql
-- Find records missing after migration
SELECT s.id, s.customer_name
FROM source_db.customers s
LEFT JOIN target_db.customers t ON s.id = t.id
WHERE t.id IS NULL;

-- Find records with changed values
SELECT s.id, s.amount AS source_amount, t.amount AS target_amount
FROM source_db.orders s
JOIN target_db.orders t ON s.id = t.id
WHERE s.amount <> t.amount;
```

10. Set Post-Migration Monitoring Period

Even when everything appears normal immediately after migration, problems may surface over time. Set an intensive monitoring period of at least 1-2 weeks and establish a system for immediate response when anomalies occur.

### Key Monitoring Metrics

During the post-migration monitoring period, watch these indicators closely. First, application error rates — if error rates climb compared to pre-migration baselines, there may be data structure or value issues. Second, query response times — missing indexes or stale statistics can degrade performance. Third, user reports — coordinate closely with your customer support team to catch any "something seems wrong with the data" reports immediately. Fourth, batch job success rates — if nightly batch jobs start failing after migration, there may be data format or access permission issues.

### Monitoring Automation

Pre-configure migration-specific dashboards in monitoring tools like Grafana, Datadog, or CloudWatch to quickly detect anomalies. Set threshold-based alerts so that spikes in error rates or response time degradation immediately notify the responsible team. Running a daily automated report that compares key metrics between source and target systems is also an excellent practice for catching slow-building issues before they become critical.

Conclusion

The success of data migration depends on preparation. Checking each item on this list minimizes unexpected problems. As the Knight Capital and British Airways cases demonstrate, the cost of migration mistakes can be far greater than anyone anticipates.

Post-migration data comparison verification is a critical step that must always be performed. Manually comparing every row of every table is realistically impossible, but tools like DiffMate let you visually and quickly identify differences in data exported as CSV or Excel. Systematic preparation, thorough verification, and the right tools are the three pillars of a successful migration.

Compare Data with DiffMate