
A Word Macro for Processing Plain Text

If you’re like me, sometimes you like to write in a plain text editor or on an iPad app, but the end result needs to be styled appropriately in Microsoft Word.[1] Here’s what I do to clean up that text once it’s in Word:

  • Search & replace all double spaces with single spaces.
  • Replace all double-dashes (--) with em-dashes (—). For Onyx Path stuff, the em-dashes also get spaces around them.
  • Replace every single quote mark with itself. If you’re not familiar with the trick, that sounds weird, but it makes Word apply smart quotes to every one of those characters.
  • Same as single quote marks, but for double quotes.
  • Replace three dots (...) with an ellipsis character (…).
  • Replace the multiple hard returns I tend to type with a single one. (I’ll often do two hard returns in a text editor so I can easily see paragraphs as separate.)
  • Remove all of the hanging and trailing spaces on each paragraph.

How to do all that manually

The first few are simple, since they’re just global search & replace. With the dashes and the ellipsis, I never remember the hot keys, so I copy an instance of the mark I want and paste it into the Replace field. Pretty simple.

It’s the wildcards that can get tricky. Here’s what it should look like:

[Screenshot: Word’s Find and Replace dialog, set up to collapse multiple paragraph marks]

The funky bit in “Find what” (“^13{2,}”) is Word code for “find any place where there are two or more carriage returns in a row.” That’s how Word does “regular expressions,” and if that all looks fucking weird and arcane to you, don’t worry! It’s programmer-speak, and if you’re inclined, you can read up on Word’s wildcard syntax to learn more.

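What’s in “Replace with” is the code for “paragraph break.” Why am I using “^13” in Find and “^p” in Replace? That’s a thorny question, one whose answer is either “trust me” or “read up about crazy find/replace mojo.” (The short version: with “Use wildcards” checked, Word won’t accept “^p” in the Find box, so “^13” stands in for it.)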

The “Use wildcards” box needs to be checked, or none of this works.

Here’s what I do for the above searches:

  • First, I look for all spaces trailing at the end of a paragraph. (Yes, you won’t normally see them, but they do fuck with other searches and are sloppy in general.) I search for “ {1,}^13” (don’t include the quotes, and it does start with a space). I replace with “^p” (again, without the quotes).
  • Then I look for all the spaces hanging at the beginning of a paragraph, which is both sloppy and visible. I search for “^13 {1,}” (don’t include the quotes, and there’s a space between the ‘3’ and the ‘{‘). I replace with “^p” (again, without the quotes).
  • Finally, I kill all the multiple paragraph marks. I search for “^13{2,}” (no quotes, and this time there are no spaces) and replace with “^p” (again, no quotes).

Really important: put your cursor at the top of the document before you start. It doesn’t quite work if your cursor happens to be in the middle of something one of these searches would match.

Oh, and once you don’t need wildcards anymore, make sure that box is unchecked.

When to do this

This is all stuff I do the moment I bring text into Word, before I process it further. I’m assuming when I do this that I’m going to do much more text processing, which means I’ll end up finding any weirdness that happens from global search/replaces.

How to not do that manually

That’s a lot of stuff to do manually every time, and typing all of it in each time is asking for errors. So why not make a macro? (If you’re lost here, check out the first post where I put up a macro.)

Sub SimpleSearchReplaceAll(sFind As String, sReplace As String)
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = sFind
        .Replacement.Text = sReplace
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Sub SimpleSearchReplaceAllWild(sFind As String, sReplace As String)
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = sFind
        .Replacement.Text = sReplace
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = True
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

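Sub ProcessPlaintextMacro()
    'move to start of document
    Selection.HomeKey Unit:=wdStory
    'runs of two or more spaces become one space; a plain "  " -> " "
    'replace leaves longer runs only partly collapsed, so use a wildcard
    SimpleSearchReplaceAllWild " {2,}", " "
    'smart double quotes (Word only converts these if its AutoFormat
    'option "straight quotes with smart quotes" is turned on)
    SimpleSearchReplaceAll """", """"
    'smart single quotes (same AutoFormat caveat)
    SimpleSearchReplaceAll "'", "'"
    'em-dash; ChrW$(8212) is U+2014, so no special character sits in the code
    SimpleSearchReplaceAll "--", ChrW$(8212)
    'ellipsis; ChrW$(8230) is U+2026
    SimpleSearchReplaceAll "...", ChrW$(8230)
    'hanging space before paragraph
    SimpleSearchReplaceAllWild "^13 {1,}", "^p"
    'trailing space at end of paragraph
    SimpleSearchReplaceAllWild " {1,}^13", "^p"
    'double paragraph; do this after fixing hanging/trailing spaces
    SimpleSearchReplaceAllWild "^13{2,}", "^p"
    'leave wildcards switched off so later manual searches behave normally
    Selection.Find.MatchWildcards = False
End Sub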

Once that’s installed, there’s a new macro to choose from:

ProcessPlaintextMacro

Select that and click Run, and it’ll process the entire document. (Oh, and if you’re not programmer-savvy, the lines starting with an apostrophe, shown in green in the VBA editor, are comments and aren’t executed.)
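One caveat carried over from the manual version: the macro starts with Selection.HomeKey so the searches begin at the top, which also leaves your cursor there afterward. If you’d rather the cursor stay put, one option is to run each search on a Range covering the whole document instead of on the Selection. The sketch below is a hypothetical helper (the name RangeSearchReplaceAll and its bWildcards parameter are not part of the macro above), and it hasn’t been checked against every Word version.

```vba
' Hypothetical helper: same Find settings as SimpleSearchReplaceAll, but it
' searches a Range spanning the document body instead of the Selection,
' so the cursor position neither matters nor changes.
Sub RangeSearchReplaceAll(sFind As String, sReplace As String, _
                          bWildcards As Boolean)
    Dim rng As Range
    Set rng = ActiveDocument.Content   'main text of the document
    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = sFind
        .Replacement.Text = sReplace
        .Forward = True
        .Wrap = wdFindStop             'the range already covers the whole body
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = bWildcards
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Using it would mean changing a call like SimpleSearchReplaceAllWild "^13{2,}", "^p" to RangeSearchReplaceAll "^13{2,}", "^p", True and dropping the Selection.HomeKey line. Note that ActiveDocument.Content covers only the main body text, not headers, footers, or footnotes.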

Two notes: first, if you know how to overload a sub in VBA so that I don’t need two different versions just to turn on wildcards, I’m all ears. (Ideally something that works in versions as old as Word 2008.)

Second, this is the building block for other macros that convert text to different house styles — all this stuff is basic, and my macros for, say, making my life easier on documents for various publishers would start here.
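On the overloading question: VBA doesn’t support overloading, but it does support Optional parameters with default values, which gets much the same effect here. Below is a minimal sketch of a merged helper. The name SearchReplaceAll is a hypothetical choice (it can’t reuse the SimpleSearchReplaceAll name while that Sub still exists in the same module, or VBA reports an ambiguous name), and it hasn’t been tested against every Word version.

```vba
' Hypothetical merged helper: one Sub instead of two. bWildcards is
' Optional and defaults to False, so callers only pass it when they
' want wildcard matching.
Sub SearchReplaceAll(sFind As String, sReplace As String, _
                     Optional bWildcards As Boolean = False)
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = sFind
        .Replacement.Text = sReplace
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = bWildcards
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

' Example calls: two arguments for a plain search, three for wildcards.
Sub OptionalParameterDemo()
    SearchReplaceAll "...", ChrW$(8230)        'wildcards off (the default)
    SearchReplaceAll "^13{2,}", "^p", True     'wildcards on
End Sub
```

With this in place, every SimpleSearchReplaceAll call in ProcessPlaintextMacro works unchanged apart from the name, and each SimpleSearchReplaceAllWild call just gains a trailing True.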

– Ryan

[1] Or you have freelancers who miss something when submitting turnover to you. Same methods apply.


4 Responses to A Word Macro for Processing Plain Text

  1. Eden says:

    Have you considered passing in the wildcard setting to the sub?

    For example, your new definition would be:

    Sub SimpleSearchReplaceAll(sFind As String, sReplace As String, bWildcard As Boolean)
    ' all other code for the Sub remains the same except for the following line
    .MatchWildcards = bWildcard

    And two calls to illustrate its use:
    SimpleSearchReplaceAll "...", "…", False
    SimpleSearchReplaceAll "^13 {1,}", "^p", True

    • Ryan Macklin says:

      Yeah, I could do that. I was getting hung up on trying to overload it, because I didn’t want to constantly have to use three parameters. But, yeah, that’s a totally legit approach, and if I can’t get rid of that bug up my ass about overloading, I’ll probably do that to make it a single function and make the macro smaller.

      Thanks!

      – Ryan

  2. Wayne Zombie says:

    I did a lot of work like this cleaning up database tables sucked out of an AS/400 before loading them into SQL Server.

    I actually could use something that’s pretty much the opposite of your macro. I’ve been occasionally working on converting a solo adventure to Twee and found that their parser can’t handle Unicode characters, so smart quotes etc. blow it right out of the water.

  3. Eden says:

    I haven’t tested this, so it may be slightly buggy, but this should let you have a single function without an extra parameter. I’d just drop it at the top and then set the .MatchWildcards statement equal to bWildcards in your setup below, but you might be able to do it all in one line.

    This assumes you’re always searching for ^13 when you want wildcards. If you want more than one wildcard string to turn on the wildcard setting, you may want to consider a Select Case statement or multiple If statements to sort them out.

    ' This should set the bWildcards variable to True if you’ve got ^13 in your search string.
    Dim bWildcards As Boolean
    bWildcards = IIf(InStr(sFind, "^13") > 0, True, False)