Introduction

Importing large dump files for InnoDB can be quite time consuming, unless some precautions are taken prior to starting the process.

Faster dumping methods, such as innobackup/xtrabackup are faster if the entire server must be backed up.

When attempting to restore a single database, or moving to innodb_file_per_table, the approach described below is acceptable.

The following variables should be adjusted:

innodb_flush_log_at_trx_commit = 2

This will allow flushing to the log file less often, reducing the load on the I/O subsystem. See the official documentation.

If there are no production databases on this server right now, innodb_flush_log_at_trx_commit=0 is even faster than innodb_flush_log_at_trx_commit=2.

innodb_log_file_size = between 100M and 1G

Increasing the log file size will allow larger transactions to fit into one file, and reduce the number of checkpoints needed. See how to change the size safely.

set global sync_binlog = 0

This will not fsync binary log writes (see libeatmydata for the truly dedicated folks), so use with caution. See the official documentation.

innodb_buffer_pool_size = <value as large as possible>

You should use as much as 70 to 80 percent of your available memory during the import process. This variable can also be adjusted down after the import process, to allow the server to run other services. See the official documentation.

Ordering your data by the Primary Key, or the other indexes on your tables

Importing randomly ordered data can have disastrous consequences on the performance. Consider a scenario where your buffer pool is significantly smaller than your data. As a result, every record requires a disk seek, so an average seek time of 10ms can extend the import time to weeks.

When inserting in sorted order, once pages are filled they won't be revisited, so the working set is only the data that has been recently inserted. The older (not recently touched) pages will age out of the buffer pool (by LRU) allowing just the recent data to remain.

If your data fits in your buffer pool, you will still gain speed by ordering it, since poorly filled pages will result in more in more writes for the same amount of data.

Sample fill rates should be in the range of ~87.5% (~14% overhead) for sorted data, and ~50% (~100% overhead) for non-sorted data.

Ordering strategies should favour the primary key first, and then other indexes. When multiples indexes are present, the non-PK indexes should be rebuilt after the fact to maintain good performance.

Use a sane scheduler, preferably deadline

Choosing your scheduler wisely will go a long way to ensure efficient use of your hardware. Avoid the cfq scheduler.

echo deadline > /sys/block/xvda/queue/scheduler

(use your block device name, have this command execute on boot via rc.local or update grub or udev rules)

Adjusting the binlog options before invoking the mysql command

This sample series of commands will disable logging for your import session. The pv (pipe viewer) command will show the import progress.

For a non-compressed dump

Without pv

cat dump.sql | { echo "set sql_log_bin=0;"; cat; } | mysql

With pv

pv < dump.sql | { echo "set sql_log_bin=0;"; cat; } | mysql

For a compressed dump

Without pv

cat dump.sql.gz | { echo "set sql_log_bin=0;"; zcat; } | mysql

With pv

pv < dump.sql.gz | { echo "set sql_log_bin=0;"; zcat; } | mysql

InnoDB Import Speed

Contents

Introduction

innodb_flush_log_at_trx_commit = 2

innodb_log_file_size = between 100M and 1G

set global sync_binlog = 0

innodb_buffer_pool_size = <value as large as possible>

Ordering your data by the Primary Key, or the other indexes on your tables

Use a sane scheduler, preferably deadline

Adjusting the binlog options before invoking the mysql command

For a non-compressed dump

Without pv

With pv

For a compressed dump

Without pv

With pv

Views

Personal tools

Channel Links

Frequently Asked Questions

Tutorials

Development Guides

Documentation

Installing MySQL

Support

Other Information

Search

Toolbox