Okay, here’s my stab at what I did and why. This is a conversion from a Movable Type 4.01 blog (set up with dynamic publishing for everything except the index.html) to WordPress 2.7.
Of course, you should check out the documentation on installation, starting here, most specifically the standard installation doc (for the most part I won’t repeat its info here, just point out where I varied from it or where some of its points applied specifically to my work). Print it out, scribble notes, etc. There’s also a specific and very useful WP Codex document on importing from MT (there’s also an interesting, if bigger-scale, project described here, and you can find a lot of info on the subject via Google, though of varying value, currency, and technical expertise).
I have multiple MT blogs. WordPress seems to offer three choices to handle that:
- Use a multi-blog setup like WordPressMU.
- Create multiple blogs within the same WP database in MySQL (with each blog’s tables having their own prefix).
- Create multiple WP databases in MySQL (with the tables inside all being standard).
For mostly aesthetic reasons, I decided on #2. This document just addresses the movement of my main blog from MT to WP.
As prep work, I saved copies (printouts and source files) of my front page, my CSS file, etc., for easy future reference. I’m not actually deleting the blog out of MT (yet), but the copies will be convenient.
1. I did a test conversion on a small blog of mine that didn’t matter much. That informed some of the choices below and gave me the confidence that I wasn’t going to royally mess things up.
2. I had over 13,500 entries in my MT blog, so I knew that the WP import routine (limited to 2 MB files) would choke. I created an MT index template that emulated the Export output but let me export a subset of the entries at a time. You can find examples of this around on the Net. I essentially had an <MTEntries lastn="500" offset="0"> container for the template, and had it associated with an export file, export1.txt. I’d run it, and when it finished, I’d go in, increase the offset by 500 (offset="500", etc.), change the file to export2.txt, and repeat.
This took 28 iterations (i.e., up to export28.txt), and then I went through and validated the start/end dates of the posts in each one to make sure I hadn’t goofed up and missed anything. And, in fact, I had, so that double-check kept me from losing 500 entries.
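The batching arithmetic is simple enough to sketch. This is a minimal Python illustration, assuming a hypothetical total of 13,600 entries and the 500-entry lastn batches described above; the missing_offsets check is only a stand-in for the manual start/end-date review, not something MT or WP provides.

```python
import math

BATCH_SIZE = 500  # the lastn value used in the export template


def export_batches(total_entries, batch_size=BATCH_SIZE):
    """Return (offset, filename) pairs, one per run of the export template."""
    runs = math.ceil(total_entries / batch_size)
    return [(i * batch_size, f"export{i + 1}.txt") for i in range(runs)]


def missing_offsets(batches, total_entries, batch_size=BATCH_SIZE):
    """List offsets that should have been exported but weren't."""
    expected = set(range(0, total_entries, batch_size))
    done = {offset for offset, _ in batches}
    return sorted(expected - done)


batches = export_batches(13_600)
print(len(batches), batches[-1])  # 28 (13500, 'export28.txt')

# Simulate the kind of goof the double-check caught: one run skipped.
skipped = [b for b in batches if b[0] != 5000]
print(missing_offsets(skipped, 13_600))  # [5000]
```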
An alternative would have been to run the full export out of MT as normal, then manually break up the file in a text editor. I originally had a reason not to do this (I wanted to save the post ID in the export file), but it didn’t matter in the end, so that’s a fully workable alternative.
3. One of the biggest challenges I faced was not wanting to lose the copious internal links within my blog, both trackbacks and simple links. The complication is that there were three different types of permalinks to the files. From 8/01 to 12/05, archives used the post number for the file name, e.g., /blog/mtarchive/0045678.html. After that, I started using a 15-byte dirified title, e.g., /blog/2007/04/18/happy_anniversa.html. In 9/06, I changed that to a dirified title of up to 25 bytes, e.g., /blog/2009/01/05/happy_birthday_to_me_tod.html.
The Codex document “Importing from Movable Type to WordPress,” though outdated in some general respects, summarizes well how to do this conversion and offers a variety of strategies. I decided early on that I didn’t want a massive mod_rewrite section in my .htaccess file, nor did I want to generate 14K post files that did redirections. I decided (and determined through experimentation with that same test blog) that I could get by on the dirified titles, though there were a few complications:
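For the text-editor route, the file can also be split mechanically. A rough Python sketch, assuming the usual MT export layout in which each entry ends with a line of eight hyphens (--------) and treating the 2 MB limit as 2 * 1024 * 1024 bytes; both are assumptions to check against your own export file.

```python
MAX_BYTES = 2 * 1024 * 1024  # assumed WP import limit, in bytes
ENTRY_END = "--------"       # assumed MT export entry delimiter line


def split_entries(text):
    """Break export text into entries, each ending with its delimiter line."""
    entries, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip("\r\n") == ENTRY_END:
            entries.append("".join(current))
            current = []
    tail = "".join(current)
    if tail.strip():  # keep any trailing text that lacks a delimiter
        entries.append(tail)
    return entries


def split_export(text, max_bytes=MAX_BYTES):
    """Group whole entries into chunks no larger than max_bytes.

    An entry that is itself larger than max_bytes ends up alone in an
    oversized chunk rather than being cut in half.
    """
    chunks, current, size = [], [], 0
    for entry in split_entries(text):
        entry_size = len(entry.encode("utf-8"))
        if current and size + entry_size > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(entry)
        size += entry_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Each returned chunk can then be written out as its own exportN.txt file and imported separately.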
- WP could create dirified titles for the posts, but at its own length and using hyphens, not underscores, for blanks. This could be mostly solved through manipulation of the filenames in the database (insert ominous chord here).
- I tend to reuse post titles frequently. That means I might have multiple dirified titles with a sequence number appended by MT (get_lost1, get_lost2, get_lost3) to keep them unique within the system. It wasn’t clear to me that WP would add sequence numbers that way, but if it did, it would do so in the order the posts were loaded, which might not match MT’s order.
- MT generates the basename of the file (when using directories) at the time the title is first saved. If I renamed a post, or spotted a typo and corrected it, the basename and the title would be out of sync. WP would generate the postname for the file based on the current title, not on what was in there as the basename.
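The first and third bullets can be checked ahead of time by computing MT-style basenames from the titles and looking for collisions. The helper below is a hypothetical approximation of MT's dirify (lowercase, drop punctuation, blanks to underscores, truncate, drop a trailing underscore), not MT's actual code; in particular, its handling of accented characters and repeated blanks is a guess.

```python
import re
from collections import defaultdict


def mt_style_basename(title, limit):
    """Approximate an MT dirified basename (hypothetical helper)."""
    name = title.lower()
    name = re.sub(r"[^a-z0-9 _]", "", name)  # drop punctuation (and, crudely, accented letters)
    name = name.replace(" ", "_")            # blanks become underscores
    name = name[:limit]                      # 15 or 25 bytes, depending on era
    return name.rstrip("_")                  # MT drops a trailing underscore


def find_collisions(titles, limit):
    """Group titles whose basenames collide, i.e. the ones MT would suffix."""
    groups = defaultdict(list)
    for title in titles:
        groups[mt_style_basename(title, limit)].append(title)
    return {name: ts for name, ts in groups.items() if len(ts) > 1}


print(mt_style_basename("Happy Anniversary", 15))  # happy_anniversa
print(find_collisions(["Get Lost", "Get lost!", "Hello"], 15))
# {'get_lost': ['Get Lost', 'Get lost!']}
```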
I figured I could work around, deal with, or simply shrug off the items in those bullets. The numeric posts, though, were a major problem: while MT was still running, it would route any such request (through mtview.php) to the proper MT post, but once MT was turned off for that blog, WP would have no clue which post was being referenced.
So I spent the better part of a couple of days going through each of the export files, double-clicking any numeric post links I found in them, which pulled up the MT post. There I’d go to the permalink at the bottom of the post, which used the currently generated dirified basename. I’d copy that over into the export file to replace the numeric filename, then go on to the next.
Yes, that did take a lot of time. There were probably 400-500 of them I needed to do.
But when I was done, all of the internal links within the blog used the dirified name styles that I could make use of in WordPress.
3.5. While I was at it, I also converted each file to UTF-8 in my text editor.
4. I went through the next couple of standard steps in the install: uploaded the files; set the wp-content folder to 777; updated the wp-config.php file with the user ID, password, and database table prefix; etc. I installed WP into a subdirectory under the blog, i.e., /blog/wp, and had the blog running from within that WP directory for the moment (which meant that both blogs would be operating at the same time).
5. I imported the 14 files into WP, roughly oldest to newest (though, of course, the first post within each file was the newest, so the sequence was all messed up anyway).
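The lookups themselves had to be done by hand, but once an ID-to-permalink mapping exists, the search-and-replace part can be scripted. A minimal Python sketch; id_to_path and the sample paths are hypothetical, and the regex assumes links of the /blog/mtarchive/NNNNNNN.html form shown above.

```python
import re

NUMERIC_LINK = re.compile(r"/blog/mtarchive/(\d+)\.html")


def replace_numeric_links(text, id_to_path):
    """Swap numeric archive links for dirified paths.

    Returns the new text plus a sorted list of IDs with no mapping yet,
    which still need a manual lookup.
    """
    unmapped = set()

    def swap(match):
        post_id = match.group(1)  # kept as a string to preserve leading zeros
        if post_id not in id_to_path:
            unmapped.add(post_id)
            return match.group(0)  # leave the original link untouched
        return id_to_path[post_id]

    return NUMERIC_LINK.sub(swap, text), sorted(unmapped)


text = 'See <a href="/blog/mtarchive/0045678.html">this</a> and /blog/mtarchive/0000001.html'
mapping = {"0045678": "/blog/2005/03/14/some_title.html"}  # hypothetical
new_text, todo = replace_numeric_links(text, mapping)
print(todo)  # ['0000001']
```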
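If a text editor isn't handy, the re-encoding is a couple of lines of Python. This sketch assumes the export files were Windows-1252 (cp1252); that is a guess, so confirm the source encoding first, because decoding a file that is already UTF-8 this way would garble its non-ASCII characters.

```python
from pathlib import Path


def to_utf8(raw, source_encoding="cp1252"):
    """Re-encode bytes from source_encoding (assumed cp1252) to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src, dst, source_encoding="cp1252"):
    """Write a UTF-8 copy of src to dst; the original file is left alone."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))


# cp1252 curly quotes (0x93/0x94) become three-byte UTF-8 sequences.
print(to_utf8(b"\x93caf\xe9\x94"))
```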
While this was going on, I did some messing around with the general settings in WP, turned on Akismet, played with the Theme, etc. I also added in the CSS bits from the old blog that I was going to need in the new one (including my drop shadow image codes).
6. I set the permalink structure to /%year%/%monthnum%/%day%/%postname%.html, which produced URLs very close to the legacy filenames from MT.
7. Now came the (ominous chord) manipulation of the database to hammer the WP-assigned filenames into the old MT dirified format. I basically used the update queries from the MT conversion doc, in phpMyAdmin:
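To see why that structure lines up with the MT URLs, here is the tag substitution written out in Python. It illustrates how the tags expand (zero-padded month and day); it is not WordPress's own code, and it assumes the blog lives at /blog.

```python
from datetime import datetime


def wp_permalink(post_date, post_name, base="/blog"):
    """Expand /%year%/%monthnum%/%day%/%postname%.html for one post."""
    return f"{base}/{post_date:%Y}/{post_date:%m}/{post_date:%d}/{post_name}.html"


print(wp_permalink(datetime(2009, 1, 5), "happy_birthday_to_me_tod"))
# /blog/2009/01/05/happy_birthday_to_me_tod.html
```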
- UPDATE wp_posts SET post_name=SUBSTRING(post_name,1,15) WHERE post_date < '2006-09-08': For posts older than 8 Sept 2006, truncate the filename to 15 bytes. (Note that the number of records affected will not equal the number of posts in that time frame, because some names are already 15 bytes or shorter.)
- UPDATE wp_posts SET post_name=SUBSTRING(post_name,1,25) WHERE post_date >= '2006-09-08': For posts on or after that date, truncate the filename to 25 bytes. (Ditto on the count of affected records.)
- UPDATE wp_posts SET post_name=REPLACE(post_name, '-', '_'): Change the hyphens (WP style) to underscores.
- UPDATE wp_posts SET post_name=SUBSTRING(post_name,1,(LENGTH(post_name)-1)) WHERE RIGHT(post_name,1) = '_': MT drops a trailing underscore (from a blank), so that happy_birthday_ actually becomes happy_birthday. This query does the same.
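Since these UPDATEs rewrite data in place, it can help to preview their combined effect first, e.g., on a few post_name/post_date pairs pulled with a SELECT. This hypothetical dry-run helper mirrors the four queries in order; it is not part of the Codex doc, and it treats post dates as plain dates.

```python
from datetime import date

CUTOFF = date(2006, 9, 8)  # same cutoff as the WHERE clauses above


def legacy_post_name(post_name, post_date):
    """Apply the four UPDATE queries, in order, to one post_name."""
    limit = 15 if post_date < CUTOFF else 25
    name = post_name[:limit]        # SUBSTRING(post_name, 1, 15 or 25)
    name = name.replace("-", "_")   # REPLACE(post_name, '-', '_')
    if name.endswith("_"):          # strip one trailing underscore
        name = name[:-1]
    return name


for post_name, post_date in [
    ("happy-anniversary", date(2006, 4, 18)),
    ("happy-birthday-to-me", date(2006, 1, 1)),
    ("happy-birthday-to-me-today", date(2009, 1, 5)),
]:
    print(post_date, legacy_post_name(post_name, post_date))
```

Note that, as written, the queries have no post_type condition, so they apply to every row in wp_posts (pages and attachments included), not only to posts.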
Note that because these queries were run right after the import, they only affected the legacy posts; new posts going forward will still use the newer (hyphenated) WP format, and that’s okay.
Once this was done, I figured a good 90% of the internal links were correct. The exceptions would be ones with numeric suffixes and cases where titles had been changed. That isn’t as high as would be ideal, but it will hopefully do. Where gaps are uncovered, I can update them manually in the future (esp. since I’ll have the post title in the filename).
8. At this point, I was ready to make the WP version The Blog. I couldn’t run the old one at the same time, due to all of the dynamic bits and how the two were going to bump heads in .htaccess if they were both running in /blog/. So I …
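Finding those gaps doesn't have to wait for 404s. A rough Python sketch that scans post bodies for dated internal links and reports any that don't match a known permalink; where the bodies and the list of known paths come from (e.g., a SELECT on wp_posts) is left as an assumption, and the regex only covers the /blog/YYYY/MM/DD/name.html form.

```python
import re

INTERNAL_LINK = re.compile(r"/blog/\d{4}/\d{2}/\d{2}/[a-z0-9_-]+\.html")


def broken_internal_links(post_bodies, known_paths):
    """Return dated internal links that point at no known permalink."""
    found = set()
    for body in post_bodies:
        found.update(INTERNAL_LINK.findall(body))
    return sorted(found - set(known_paths))


bodies = [
    '<a href="/blog/2006/04/18/get_lost1.html">again</a>',
    "see /blog/2009/01/05/happy_birthday_to_me_tod.html",
]
known = ["/blog/2009/01/05/happy_birthday_to_me_tod.html"]
print(broken_internal_links(bodies, known))
# ['/blog/2006/04/18/get_lost1.html']
```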
- In the WP admin screen, I changed the blog publishing point to be /blog/, rather than /blog/wp.
- I copied the .htaccess suggestions into the .htaccess file in /blog/. I also copied the index.php from /blog/wp to /blog/ (both per the admin screen). And I removed the MT-generated mtview.php mod_rewrite rules from .htaccess (copying them into a text file for safekeeping).
- I changed the .htaccess file in the /blog/mtarchive directory to route requests to /blog/index.php (i.e., into WP) instead of /blog/mtview.php. This came into play in the next step.
9. In my poking around, I’d found a nifty plug-in called Redirection. It does a spiffy job of recording 404 (file not found) errors within WP, as well as letting you set up redirections, e.g., “if you get a request for X, redirect it to Y, and send back a 301 status code so search engines know about the change.” It will even notice when you change a title (thus changing the filename) and create a redirection based on that.
So I installed that plug-in, then went to Google Analytics to get a list of the 100 most-requested URLs on my blog. For the numeric ones (/blog/mtarchive/00123456.html), I clicked on each to get routed to the correct WP page (at this point the mtview.php redirections in the mtarchive folder were routing through to WP, which knew about the posts because I’d changed the filenames). Then I put the old mtarchive filename and the correct WP filename into a redirection record in Redirection. I did that with about 20 of the most popular old posts, figuring that would handle the majority of the traffic.
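The behavior described there (log misses, answer known old URLs with a 301, add a redirect when a post is renamed) can be modeled in a few lines. This is a toy Python model to make the idea concrete; it is not the Redirection plugin's code or API, and the paths are hypothetical.

```python
from collections import Counter


class RedirectTable:
    """Toy model of a redirect table with a 404 log (not the plugin's code)."""

    def __init__(self, redirects, live_paths):
        self.redirects = dict(redirects)   # old path -> new path
        self.live_paths = set(live_paths)  # paths WP can serve directly
        self.misses = Counter()            # 404 log: path -> hit count

    def resolve(self, path):
        """Return (status, location) for a request path."""
        if path in self.live_paths:
            return 200, path
        if path in self.redirects:
            return 301, self.redirects[path]  # permanent, so search engines update
        self.misses[path] += 1
        return 404, None

    def rename(self, old_path, new_path):
        """A title change moved a post: serve the new path, redirect the old."""
        self.live_paths.discard(old_path)
        self.live_paths.add(new_path)
        self.redirects[old_path] = new_path


table = RedirectTable(
    {"/blog/mtarchive/0045678.html": "/blog/2005/03/14/some_title.html"},
    ["/blog/2005/03/14/some_title.html"],
)
print(table.resolve("/blog/mtarchive/0045678.html"))
# (301, '/blog/2005/03/14/some_title.html')
```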
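Picking those candidates out of an analytics report is easy to script. A minimal Python sketch, assuming the report has been exported as (URL, pageviews) rows; the row format and sample URLs are hypothetical.

```python
import re

NUMERIC_ARCHIVE = re.compile(r"^/blog/mtarchive/\d+\.html$")


def top_numeric_urls(rows, n=20):
    """Return the n most-viewed numeric archive URLs from (url, views) rows."""
    numeric = [(url, views) for url, views in rows if NUMERIC_ARCHIVE.match(url)]
    numeric.sort(key=lambda row: row[1], reverse=True)
    return [url for url, _ in numeric[:n]]


rows = [  # hypothetical report rows
    ("/blog/mtarchive/0000001.html", 50),
    ("/blog/2009/01/05/happy_birthday_to_me_tod.html", 900),
    ("/blog/mtarchive/0000002.html", 70),
    ("/blog/mtarchive/0000003.html", 10),
]
print(top_numeric_urls(rows, n=2))
# ['/blog/mtarchive/0000002.html', '/blog/mtarchive/0000001.html']
```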
10. And that just left some housecleaning. My own checklist (some of which I’ve already done as of this writing) includes:
- Turn off the old MT cron jobs (for delayed posting — though it occurs to me I’ve used those for my remaining MT blogs, so they may get turned back on).
- Share something with a note up in Google Reader to let folks know that my feeds have likely changed (I sent out a “last call” post before the conversion). Remember that a lot of blog audience members read via RSS — and if your feed addies have changed, they may not recognize the change for days or weeks.
- Update Feedburner with the current feed.
- Review categories. Re-hierarchialize the cats and sub-cats.
- Add in Google Analytics info (use Analyticator?)
- Update my ecto/Linear configuration to point to the new WP blog. (Works like a champ.)
- Configure — both in Linear and in WordPress — what sites get pinged when I post something.
- Get posting-by-email working (it supposedly works out of the box with WP, with some configuration).
- Configure Flickr to be able to blog to the new site (as it was able to the old site). Again, this worked pretty easily.
- Update the About page, and create a Contact page.
- Figure out about using Twitter (blogging tweets, turning blog posts into tweets, etc.). Ponder this; meanwhile, Twitter Tools looks good.
- Review the anti-spam (and CAPTCHA?) capabilities of WP and the blog as configured. WP comes out of the box with Akismet, which in a few days has performed pretty well. I’m going to hold off a bit with any other protections until it seems necessary.
- Review what I had on my old front page (from the printouts) and consider what to implement, and how.
- Plug-ins! Plug-ins! Plug-ins!
Eventually I’ll delete the blog from my MT database, and clean off or archive the old static files. And, of course, as I have multiple blogs, I’ll eventually be converting them over as later projects, after which I can, in theory, delete the MT installation altogether.
And that’s the story thus far. Any questions?
FYI, the old link still goes to the old version of the blog.
Index.html suddenly reappeared. Hmmm. I think that was a comment that got in yesterday forcing a rewrite. I’ve turned that off now.
The redirection was still kind of wacky, but I appear to have fixed that.
Great How-TO!
I realized after reading this I posted my plugin suggestions in the wrong place… D’Oh!
I can recommend some CAPTCHAs, though, because I get a ton of bogus user accounts; there’s really nothing stopping a bot from creating them. For this, try “Register Plus”. I haven’t tinkered with it much, but it appears to do the job.
Also, I like the plugins “Bad Behavior” and “Cryptograph”, but they need a lot of tweaking and work. I haven’t tried them in about a year, so maybe they are better now, but in my experience they just block too much. If I try them out again, I will let you know my thoughts…
Also, I’d like to hear your thoughts on posting via email. I was a little scared to try it out, as it’s basically a backdoor into posting on your website. The advice I read is to pick a really esoteric email address to minimize the chance of bogus postings.
Thanks. I consider it half a thank you to the community for the info I was able to find myself, and half a chance to exercise my amateur tech writing skills (which is what got me into IT in the first place).
I’ve noticed some dubious registrations to date. I may well do something like “Register Plus.”
I have not gotten the post-by-email working reliably yet (I can’t get the cron job to see what it needs to see, and I haven’t fallen back to the “alternative” of putting a call to the mail program at the bottom of each page so it’s invoked by anyone who visits). But, yes, the advice is to come up with an unguessable email addy. And since it’s configurable at the drop of a hat, you can always turn it off if the Bad Hats figure it out.
I think, for CAPTCHA stuff, I want to use reCAPTCHA. I’ve installed it, but will need some feedback to confirm it’s working (since, for some odd reason, WP will not let me logout).
This is a test, with the reCAPTCHA widgety thing.
I use reCAPTCHA too. I don’t like it as much as other CAPTCHAs I’ve used, but the other stuff is just too dang strong.
It’s not the best CAPTCHA out there, in terms of human-readability … but it’s conceptually so cool and useful, that overcomes a bit of inconvenience. (That sounds very Liberal of me.)
(Test comment. Nothing to see here. Move along.)