Splitting Sentences to Cells

by Allen Wyatt
(last updated March 31, 2018)

2

Pieter has a lot of cells in column A that contain text. Specifically, the cells contain several sentences of text each. He would like to split the sentences to individual cells. He knows he can use the Text to Columns tool, but that isn't entirely useful, as sentences can end with different punctuation and some punctuation can be used in the middle of a sentence. (Such as a period after a title like Mr. or Ms.) Pieter wonders if there is a better way to split the sentences to different cells.

To accomplish this task manually, there are a couple of ways you can go. First, you could use a helper column to work with your data. For instance, you might put this formula into cell B1:

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"Mr.", "Mr#"), "Mrs.", "Mrs#"), "Ms.", "Ms#")

Then, copy the formula down to however many cells are necessary. What you end up with is your common titles (Mr., Mrs., and Ms.) being replaced with a unique sequence of characters (Mr#, Mrs#, and Ms#). Copy the results of column B back into column B as values (so the formula is removed), and then use Text to Columns on column B. Finally, use Find and Replace to change all instances of the # character to a period.

The drawback to this is that the formula only accounts for three common uses of the period, where you may need to actually handle quite a few more. For instance, your sentences might have titles such as Dr. or credential indicators such as Ph.D. or Esq. The list of such period-laden abbreviations could get quite long. In those instances, you could manually make the changes in this way:

  1. Use Find and Replace to look for all the periods in your text, replacing non-sentence-ending periods with the marker character (# in the above technique).
  2. Use the Text to Columns tool to split the sentences apart.
  3. Use Find and Replace to change all instances of the marker character back to a period.

Again, if you have a lot of non-sentence-ending periods, this process could take quite a while to go through.

Of course, these approaches deal with what is actually a complex topic. The real question is how does one define a sentence? In English, there are only three punctuation marks that terminate a sentence—a period, an exclamation mark, and a question mark. There are variations and exceptions to this, however. For instance, a sentence could end with a quote mark, but that quote mark will always have one of the three terminating punctuation marks in front of it. In addition, a period could be used to mark an abbreviation, as already noted.

If you start using Find and Replace to deal with all of these punctuation marks and exceptions, then you quickly can run into a convoluted series of steps. It is much better to try to do the splitting using a macro. Here's one that will handle most sentences and abbreviations properly:

Sub SplitSentences()
    Dim c As Range
    Dim sException(8) As String
    Dim sReplacement(8) As String
    Dim sTerm(6) As String
    Dim sTemp As String
    Dim J As Integer
    Dim sExp As Variant

    ' These are the valid ways for a sentence to end
    sTerm(1) = ". "
    sTerm(2) = "! "
    sTerm(3) = "? "
    sTerm(4) = "." & Chr(34)
    sTerm(5) = "!" & Chr(34)
    sTerm(6) = "?" & Chr(34)

    ' These are the exceptions to the rule
    ' of a period ending a sentence
    sException(1) = "Mr."
    sException(2) = "Mrs."
    sException(3) = "Ms."
    sException(4) = "Dr."
    sException(5) = "Esq."
    sException(6) = "Ph.D."
    sException(7) = "a.m."
    sException(8) = "p.m."

    ' Set up the replacements for the exceptions
    For J = 1 To 8
        sReplacement(J) = Replace(sException(J), ".", "[{}]")
    Next J

    For Each c In Selection
        sTemp = c.Value

        ' Convert all the exceptions
        For J = 1 To 8
            sTemp = Replace(sTemp, sException(J), sReplacement(J))
        Next J

        ' Demarcate sentences with a tab
        For J = 1 To 6
            sTemp = Replace(sTemp, sTerm(J), Trim(sTerm(J)) & Chr(9))
        Next J

        ' Split sentences into an array
        sExp = Split(sTemp, Chr(9))
        For J = 0 To UBound(sExp)
            ' Replace the code for valid periods
            sExp(J) = Replace(sExp(J), "[{}]", ".")
            ' Place sentences into adjacent cells on row
            c.Offset(0, J).Value = Trim(sExp(J))
        Next J
    Next c
End Sub

Note that the acceptable sentence terminations are noted in the sTerm array and the acceptable abbreviations are in the sException array. If your text might have other abbreviations, then you'll want to expand the sException array to include those.

The macro steps through whatever cells you have selected and replaces all the acceptable exceptions. It then replaces all the acceptable sentence terminations with that termination followed by a tab character. It then pulls apart the sentences based on the location of the tab character. Finally, it restores all the valid periods that were in the abbreviations and places the sentences on adjacent cells in the same row.

Note that the macro replaces whatever was in the selected cells and in however many cells are necessary to the right of the selection in order to store the sentences. Because of this, you might want to make sure you save your original worksheet before selecting a range of cells and running the macro.

Finally, you may want to note that the macro isn't perfect. It is perfectly acceptable from a grammarian's standpoint for an abbreviation to end a sentence. When this occurs, proper punctuation dictates that the final period in the abbreviation also serves as the terminating period for the sentence, as in these two short sentences:

Sheila earned her Ph.D. She was very happy.

Now, consider the following single sentence:

Sheila earned her Ph.D. from an Ivy League school.

When you compare the two examples (the two sentences vs. the single sentence), there is no way to discern, programmatically, between if Ph.D. ends a sentence or if it occurs in the middle of the sentence without checking to see if the following word begins with a capital letter or a quote mark followed by a capital letter. This can get quite complex very quickly. Plus, this applies to all abbreviations, not just Ph.D. Rather than try to anticipate and deal with all such occurrences, the macro noted above doesn't even try to discern whether an abbreviation ends a sentence or not—it simply treats all abbreviations as if they occur in the middle of a sentence.

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (12549) applies to Microsoft Excel 2007, 2010, 2013, and 2016.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

Calculating the Interval between Occurrences

With a long list of items in a worksheet, you may want to determine the last time a particular item appeared in the list. ...

Discover More

Saving an Envelope for Future Use

It can take a while to get an envelope to appear just the way you need. Why throw your work away when you are done with ...

Discover More

Viewing Your Entire Document Width

The Zoom tool is very useful to help you see all of your document information. Here's how to make sure you can see all ...

Discover More

Solve Real Business Problems Master business modeling and analysis techniques with Excel and transform data into bottom-line results. This hands-on, scenario-focused guide shows you how to use the latest Excel tools to integrate data from multiple tables. Check out Microsoft Excel 2013 Data Analysis and Business Modeling today!

More ExcelTips (ribbon)

Merging Cells to a Single Sum

One way to make your worksheets less complex is to get rid of detail and keep only the summary of that detail. Here's how ...

Discover More

Getting Rid of Non-Printing Characters Intelligently

Is your worksheet, imported from an external source, plagued by non-printing characters that show up like small boxes ...

Discover More

Mouse Scroll Wheel Doesn't Work when Editing Formulas

Using your mouse to select cells for inclusion in a formula can be an exercise in futility on some systems. Here's why ...

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 6Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is three minus 3?

2018-04-01 10:23:03

MIchael Armstrong

O, for the good old days when you properly ended a sentence with a period followed by 2 spaces.


2018-03-31 10:32:10

Brian L.

Elegant!


This Site

Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.