Convert WP to static HTML – part 2

This is a follow-up to my previous post.

So I’ve been converting some more blogs to static HTML files, and this time around things were different enough that I wrote a new how-to. Here are the steps I’ve been using to convert blogs that use the default Kubrick theme.

  1. Update the site’s permalink structure so that it uses the year/month/day/postname format, either in the admin (Settings->Permalinks) or via SQL.
    UPDATE `database`.`prefix_options` SET `option_value` = '/%year%/%monthnum%/%day%/%postname%/' WHERE `prefix_options`.`option_name` = 'permalink_structure' LIMIT 1 ;
  2. Make sure the blog does not block search engines. If it does, wget downloads only the index.html file, because wget obeys robots rules by default during recursive downloads. This took me a while to figure out, so if a recursive wget grabs nothing but index.html, check robots.txt and the blog’s privacy setting. You can change the setting in the admin (Settings->Privacy) or via SQL.
    UPDATE `database`.`prefix_options` SET `option_value` = '1' WHERE `prefix_options`.`option_name` = 'blog_public' LIMIT 1 ;
  3. Add the .htaccess file below if it is not already there. The path /path/to/wordpress/blog/ is relative to the URL root, not the absolute file path, so for http://sitename.com/path/to/wordpress/blog/ the .htaccess file goes in the ‘blog’ directory.

    RewriteEngine On
    RewriteBase /path/to/wordpress/blog/
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /path/to/wordpress/blog/index.php [L]
  4. Get rid of the meta links through the sidebar widgets in the admin, or delete the appropriate lines from the theme files (for the default Kubrick theme, edit comments.php, sidebar.php, single.php, and footer.php), or see the last step. Delete the code that outputs the search form, comments, trackbacks, RSS links, and anything in the footer you don’t want.
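  5. When everything looks right, run wget from the directory that holds the blog directory to grab the files. Set --cut-dirs to the number of directory components in the blog’s URL path (1 for /blog/) so the files land directly in blog-static.
    wget --mirror -P blog-static -nH -np -p -k -E --cut-dirs=1 http://sitename.com/blog/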
  6. Rename the blog directory.
    mv blog blog-old
  7. Rename the static directory to make it live.
    mv blog-static blog
  8. Copy the images directory from the old theme to the appropriate static directory.
    cp -r blog-old/wordpress/wp-content/themes/default/images/ blog/wordpress/wp-content/themes/default/
  9. Alternatively, to get rid of unwanted links and the like, use the find command to locate all HTML files, then use perl to delete the matching lines. Don’t forget to escape forward slashes in the search pattern. Unfortunately, this method needs a separate command for every line of code you want to delete, so it’s much better to delete the lines from the theme files.
    find . -name '*.html' -print0 | xargs -0 perl -ni -e 'print unless /<h3>Leave a Reply<\/h3>/'

    Also, if you want to search and replace instead of removing lines, this handy find and perl one-liner will find and replace text in all HTML files.

    find . -name '*.html' -print0 | xargs -0 perl -pi -e 's/search text here/replace text there/g'

    The above searches all HTML files for every “search text here” phrase and replaces it with “replace text there”. You can substitute whatever you want in those two places. If either one contains a ‘/’ (forward slash), escape it with a ‘\’ (backslash). Perl uses standard regular expression syntax, so look that up if you need help writing a search and replace pattern.
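For step 2, one quick way to see whether the blog is still hiding from crawlers is to fetch the front page and look for a robots meta tag. This is a minimal sketch: blocks_robots is a hypothetical helper name, and it assumes the tag sits on one line with the name attribute before the content attribute, the way WordPress prints it.

```shell
# Hypothetical helper: succeed if the given HTML file has a robots meta tag
# with nofollow or noindex. Assumes the whole tag is on one line and that
# name="robots" comes before the content attribute.
blocks_robots() {
  # -i: ignore case, -q: exit status only, -E: extended regex
  grep -iqE "<meta[^>]*name=['\"]robots['\"][^>]*(nofollow|noindex)" "$1"
}
```

Usage: wget -q -O front.html http://sitename.com/blog/ && blocks_robots front.html && echo "robots tag found, check blog_public"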
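Getting --cut-dirs right in step 5 is easy to fumble: with -nH, it should equal the number of directories in the blog’s URL path, or the posts end up nested too deep or flattened on top of each other. Here is a small sketch that counts them; count_dirs is a hypothetical helper, not a wget feature, and it assumes a plain URL without a query string.

```shell
# Hypothetical helper: print the number of directory components in a URL's
# path, i.e. the --cut-dirs value to pair with -nH. Assumes no query string.
count_dirs() {
  rest=${1#*://}                 # drop the scheme: sitename.com/blog/
  case $rest in
    */*) rest=${rest#*/} ;;      # drop the host: blog/
    *)   rest= ;;                # no path at all
  esac
  rest=${rest%/*}                # drop a trailing slash or file name: blog
  # One component per line; count the non-empty lines (prints 0 if none)
  printf '%s\n' "$rest" | tr '/' '\n' | grep -c . || true
}
```

Usage: wget --mirror -P blog-static -nH -np -p -k -E --cut-dirs="$(count_dirs http://sitename.com/blog/)" http://sitename.com/blog/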
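Steps 6 through 8 can be rolled into one function that refuses to run if something is out of place, so a typo doesn’t leave the live blog half-moved. This is only a sketch: go_static is a hypothetical name, it must be run from the directory holding blog and blog-static, and the theme path argument (for example wordpress/wp-content/themes/default) is assumed to be relative to the blog root.

```shell
# Hypothetical helper for steps 6-8. Run it from the directory that holds
# both "blog" and "blog-static". $1 is the theme path relative to the blog
# root, e.g. wordpress/wp-content/themes/default (an assumption).
go_static() {
  theme=$1
  # Bail out early instead of moving directories into a broken state
  [ -d blog ] && [ -d blog-static ] || { echo "missing blog or blog-static" >&2; return 1; }
  [ ! -e blog-old ] || { echo "blog-old already exists" >&2; return 1; }
  mv blog blog-old &&                            # step 6
  mv blog-static blog &&                         # step 7
  mkdir -p "blog/$theme" &&                      # make sure the target exists
  cp -r "blog-old/$theme/images" "blog/$theme/"  # step 8
}
```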