Scaling the WP Importer

Small-scale imports are handy, but when sites get large you can run into problems. Normally people fall back to using SQL dumps, but doing this sidesteps hooks and filters that could be important. Here are some tips on how to make WordPress imports and exports using WXR files easier.

The First Major Problem

When loading a page, PHP has a limited amount of time to do its work. If it can’t finish within the time allotted, the PHP instance is killed, and the page load is incomplete. This is known as an execution timeout, and it’s intended to prevent badly written websites from using up all the resources on your server.

Once your WordPress site scales up to a certain point, the importer and exporter run into this execution limit. It’s inevitable, and making the importer or exporter faster would only raise the number of posts processed before the limit is hit. You can set a higher limit with max_execution_time in your php.ini, but that’s beyond most users’ skills, and can have unwanted consequences.

Execution timeouts sometimes show up as half-loaded admin pages, white screens of death, and so on. Look in your PHP error log and you’re likely to find a fatal error saying the maximum execution time was exceeded.
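One quick way to confirm a timeout is to search the log for PHP’s timeout fatal, which reads "Maximum execution time of N seconds exceeded". The helper below is only a sketch: the name count_timeouts is made up for illustration, and the location of the log depends on the error_log setting on your server.

```shell
# count_timeouts LOGFILE
# Prints how many lines in LOGFILE record a PHP execution timeout.
# Hypothetical helper for illustration; not part of WordPress or WP-CLI.
count_timeouts() {
    grep -c 'Maximum execution time of' "$1"
}
```

For example, count_timeouts /var/log/php_errors.log would print the number of timeouts recorded in that file, assuming that is where your server writes PHP errors.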

Fixing Execution Timeouts

The best and most foolproof solution is to not use the browser. Instead, run your imports and exports on the command line using WP-CLI:

wp export --dir=/tmp/

This also gives you many more options for filtering posts and post types, so that you only export what you want: for example, only a single user’s posts, all posts after 2014, or all posts in a particular category.
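As a rough sketch, those three filters might map onto the --author, --start_date and --category options of wp export. The function names are hypothetical wrappers, and the argument values in the usage note are placeholders rather than anything from a real site.

```shell
# Hypothetical wrappers around `wp export`; names are for illustration only.

# export_author_posts DIR AUTHOR
# Export only posts written by AUTHOR (a user login or user ID).
export_author_posts() {
    wp export --dir="$1" --author="$2"
}

# export_posts_since DIR DATE
# Export only posts published after DATE (format YYYY-MM-DD).
export_posts_since() {
    wp export --dir="$1" --post_type=post --start_date="$2"
}

# export_category_posts DIR CATEGORY
# Export only posts in CATEGORY.
export_category_posts() {
    wp export --dir="$1" --post_type=post --category="$2"
}
```

Calling export_posts_since /tmp/ 2015-01-01, for instance, is one way to read "all posts after 2014"; adjust the date to match what you actually mean.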

What if I don’t have SSH/Terminal access?

You can do it on your local machine.

Clone your site onto your computer by copying the files and database into a local server setup such as MAMP or Vagrant. Now you can run the export locally, with no need for an internet connection. Working locally also lets you run search and replace commands and other operations.

The same is true of the import: run it on your local machine, then upload the resulting database and files to the remote server.
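A local import round trip could look something like the sketch below. It assumes the WordPress Importer plugin is active on the local copy, that missing authors should be created (--authors=create), and that the local and live URLs differ and need rewriting before the dump is uploaded. The function name and all argument values are hypothetical.

```shell
# import_locally_and_dump WXR_PATH LOCAL_URL LIVE_URL DUMP_FILE
# Hypothetical helper: import WXR files into the local site, rewrite the
# local URL to the live one, then write a SQL dump ready for upload.
# Each step runs only if the previous one succeeded.
import_locally_and_dump() {
    wp import "$1" --authors=create &&
    wp search-replace "$2" "$3" --skip-columns=guid &&
    wp db export "$4"
}
```

A call might be import_locally_and_dump /tmp/wxr http://example.test https://example.com /tmp/site.sql. The dump and the uploads directory are then what you copy to the remote server.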

The Second Major Problem

Great, now the length of time needed for the export or import is no longer a problem, but your import is still failing on large sites? You’ve run into the second problem: memory. The importer and exporter need to load the content before they can separate it into posts and save it to the database, and large sites can generate WXR files that are hundreds of megabytes, or even gigabytes, in size.

Break the Export into Chunks

WP-CLI can split an export into multiple WXR files, each capped at a maximum file size (the --max_file_size option of wp export, given in megabytes). At WordPress.com VIP we ask that WXR files are no larger than 5MB. This way, no more than one WXR file’s worth of data is loaded at any one time, and once a file has been imported, that memory is freed up ready for the next chunk. It also means that if the import fails halfway through, we’re never more than 5MB of data behind.
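The chunked workflow can be sketched as two small helpers: one that exports with a size cap, and one that imports the resulting files one at a time and stops at the first failure so you know which chunk to resume from. Both function names are hypothetical, --authors=create is an assumption about how you want authors handled, and the loop simply relies on the shell sorting file names.

```shell
# export_in_chunks DIR SIZE_MB
# Export the site into DIR as WXR files of at most SIZE_MB megabytes each.
export_in_chunks() {
    wp export --dir="$1" --max_file_size="$2"
}

# import_chunks DIR
# Import every .xml file in DIR, one file per `wp import` run.
# Stops at the first failure and reports the file that failed.
import_chunks() {
    for wxr in "$1"/*.xml; do
        [ -e "$wxr" ] || continue   # the glob matched nothing
        if wp import "$wxr" --authors=create; then
            echo "imported: $wxr"
        else
            echo "failed: $wxr" >&2
            return 1
        fi
    done
}
```

For example, export_in_chunks /tmp/ 5 followed by import_chunks /tmp on the target site. Running one file per command means each chunk gets a fresh PHP process.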

Sometimes, smaller chunks might be necessary depending on the codebase used.

Disable Image Resizing

When you upload an image, WordPress generates a resized copy for each registered image size. During an import you want to disable this, then do the resizing at the end as a batch operation.
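With WP-CLI this might be done as in the sketch below, assuming your version of the import command supports --skip=image_resize and that wp media regenerate is available. The function name is hypothetical, and --authors=create is again only an assumption.

```shell
# import_then_resize WXR_PATH
# Hypothetical helper: import without generating thumbnails, then build the
# missing image sizes in a single batch once the import has succeeded.
import_then_resize() {
    wp import "$1" --authors=create --skip=image_resize &&
    wp media regenerate --only-missing --yes
}
```

The --only-missing flag limits the batch to sizes that do not exist yet, so re-running the regeneration after an interruption should not redo finished images.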

Don’t do Unnecessary Work

While importing, your plugins and theme are loaded, and poorly built code can do work even when it’s not needed. Don’t make remote requests when saving posts, and check whether an import is in progress (for example, by testing whether the WP_IMPORTING constant is defined) before doing expensive work.

Even after doing all these things, very large imports can still take days to finish, but now you know they will finish.
