InnoDB Import Speed

Introduction

Importing large dump files into InnoDB can be quite time-consuming unless some precautions are taken before starting the process.

Physical backup methods, such as innobackup/xtrabackup, are faster if the entire server must be backed up.

When attempting to restore a single database, or moving to innodb_file_per_table, the approach described below is acceptable.

The following variables should be adjusted:

innodb_flush_log_at_trx_commit = 2

With a value of 2, the log is written at each commit but only flushed to disk about once per second, reducing the load on the I/O subsystem. See the official documentation.

If there are no production databases on this server right now, innodb_flush_log_at_trx_commit=0 is even faster than innodb_flush_log_at_trx_commit=2, since the log is then written and flushed only about once per second.
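
This variable is dynamic, so it can be changed without a restart. A minimal sketch (assumes a user with the SUPER privilege):

set global innodb_flush_log_at_trx_commit = 2;
show global variables like 'innodb_flush_log_at_trx_commit';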

innodb_log_file_size = <between 100M and 1G>

Increasing the log file size gives the redo log more headroom, allowing larger transactions to pass through with fewer checkpoints. See how to change the size safely.
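
This variable is not dynamic, and on MySQL versions before 5.6.8 the old log files must be moved aside after a clean shutdown before the size can change. A rough sketch (assumes the default datadir of /var/lib/mysql; paths may differ on your system):

mysqladmin shutdown                    # clean shutdown; innodb_fast_shutdown must not be 2
# edit my.cnf: innodb_log_file_size = 512M
mv /var/lib/mysql/ib_logfile* /tmp/    # keep the old files until the restart succeeds
mysqld_safe &                          # InnoDB creates new log files at the new size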

set global sync_binlog = 0

This disables fsync() on binary log writes (see libeatmydata for the truly dedicated folks), so use with caution. See the official documentation.

innodb_buffer_pool_size = <value as large as possible>

You can use as much as 70 to 80 percent of your available memory during the import process. The value can be adjusted down afterwards to let the server run other services. See the official documentation.
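
Putting the variables together, a minimal my.cnf sketch for the duration of the import (the 12G value assumes a dedicated server with 16G of RAM; scale to your hardware):

[mysqld]
innodb_buffer_pool_size        = 12G
innodb_log_file_size           = 512M
innodb_flush_log_at_trx_commit = 2
sync_binlog                    = 0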

Ordering your data by the primary key, then the other indexes on your tables

Importing randomly ordered data can have disastrous consequences for performance. Consider a scenario where your buffer pool is significantly smaller than your data: nearly every record then requires a disk seek, and at an average seek time of 10ms the import can stretch into weeks (100 million random inserts at 10ms each already amounts to roughly 11.5 days of seek time alone).

When inserting in sorted order, pages are never revisited once filled, so the working set is only the recently inserted data. Older (not recently touched) pages age out of the buffer pool by LRU, leaving just the recent data resident.

Even if your data fits in your buffer pool, you still gain speed by ordering it, since poorly filled pages result in more writes for the same amount of data.

Typical page fill rates are ~87.5% (~14% space overhead) for sorted data, and ~50% (~100% overhead) for unsorted data: random inserts trigger page splits, which leave pages roughly half full.

Ordering strategies should favour the primary key first, then the other indexes. When multiple indexes are present, the non-PK indexes should be rebuilt after the import to maintain good performance; a sketch follows.
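
mysqldump can produce primary-key-ordered output itself via its --order-by-primary option (note that this makes the dump step slower). Database, table, and column names below are placeholders:

mysqldump --order-by-primary mydb mytable > dump.sql
# after the import, rebuild the secondary indexes, e.g.:
# mysql mydb -e "alter table mytable add index idx_col (some_column);"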

Use a sane scheduler, preferably deadline

Choosing your scheduler wisely goes a long way toward ensuring efficient use of your hardware. Avoid the cfq scheduler.

echo deadline > /sys/block/xvda/queue/scheduler

(use your own block device name; to persist across reboots, run the command from rc.local, add elevator=deadline to the kernel line in your GRUB config, or use a udev rule)
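
A sketch of the udev approach, plus a way to verify the active scheduler (the rule file name is arbitrary; xvda matches the example above):

# /etc/udev/rules.d/60-io-scheduler.rules
ACTION=="add|change", KERNEL=="xvda", ATTR{queue/scheduler}="deadline"

cat /sys/block/xvda/queue/scheduler   # the active scheduler is shown in brackets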

Adjusting the binlog options before invoking the mysql command

This sample series of commands disables binary logging for your import session (setting sql_log_bin requires the SUPER privilege). The pv (pipe viewer) command shows the import progress.

For a non-compressed dump

Without pv

cat dump.sql | { echo "set sql_log_bin=0;"; cat; } | mysql

With pv

pv < dump.sql | { echo "set sql_log_bin=0;"; cat; } | mysql

For a compressed dump

Without pv

cat dump.sql.gz | { echo "set sql_log_bin=0;"; zcat; } | mysql

With pv

pv < dump.sql.gz | { echo "set sql_log_bin=0;"; zcat; } | mysql
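
When the import finishes, remember to restore the durability settings changed earlier (both are dynamic):

set global innodb_flush_log_at_trx_commit = 1;
set global sync_binlog = 1;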
