Splitting Sentences to Cells

by Allen Wyatt
(last updated March 31, 2018)

2

Pieter has a lot of cells in column A that contain text. Specifically, the cells contain several sentences of text each. He would like to split the sentences to individual cells. He knows he can use the Text to Columns tool, but that isn't entirely useful, as sentences can end with different punctuation and some punctuation can be used in the middle of a sentence. (Such as a period after a title like Mr. or Ms.) Pieter wonders if there is a better way to split the sentences to different cells.

To accomplish this task manually, there are a couple of ways you can go. First, you could use a helper column to work with your data. For instance, you might put this formula into cell B1:

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"Mr.", "Mr#"), "Mrs.", "Mrs#"), "Ms.", "Ms#")

Then, copy the formula down to however many cells are necessary. What you end up with is your common titles (Mr., Mrs., and Ms.) being replaced with a unique sequence of characters (Mr#, Mrs#, and Ms#). Copy the results of column B back into column B as values (so the formula is removed), and then use Text to Columns on column B. Finally, use Find and Replace to change all instances of the # character to a period.

The drawback to this is that the formula only accounts for three common uses of the period, where you may need to actually handle quite a few more. For instance, your sentences might have titles such as Dr. or credential indicators such as Ph.D. or Esq. The list of such period-laden abbreviations could get quite long. In those instances, you could manually make the changes in this way:

  1. Use Find and Replace to look for all the periods in your text, replacing non-sentence-ending periods with the marker character (# in the above technique).
  2. Use the Text to Columns tool to split the sentences apart.
  3. Use Find and Replace to change all instances of the marker character back to a period.

Again, if you have a lot of non-sentence-ending periods, this process could take quite a while to go through.

Of course, these approaches deal with what is actually a complex topic. The real question is how does one define a sentence? In English, there are only three punctuation marks that terminate a sentence—a period, an exclamation mark, and a question mark. There are variations and exceptions to this, however. For instance, a sentence could end with a quote mark, but that quote mark will always have one of the three terminating punctuation marks in front of it. In addition, a period could be used to mark an abbreviation, as already noted.

If you start using Find and Replace to deal with all of these punctuation marks and exceptions, then you quickly can run into a convoluted series of steps. It is much better to try to do the splitting using a macro. Here's one that will handle most sentences and abbreviations properly:

Sub SplitSentences()
    Dim c As Range
    Dim sException(8) As String
    Dim sReplacement(8) As String
    Dim sTerm(6) As String
    Dim sTemp As String
    Dim J As Integer
    Dim sExp As Variant

    ' These are the valid ways for a sentence to end
    sTerm(1) = ". "
    sTerm(2) = "! "
    sTerm(3) = "? "
    sTerm(4) = "." & Chr(34)
    sTerm(5) = "!" & Chr(34)
    sTerm(6) = "?" & Chr(34)

    ' These are the exceptions to the rule
    ' of a period ending a sentence
    sException(1) = "Mr."
    sException(2) = "Mrs."
    sException(3) = "Ms."
    sException(4) = "Dr."
    sException(5) = "Esq."
    sException(6) = "Ph.D."
    sException(7) = "a.m."
    sException(8) = "p.m."

    ' Set up the replacements for the exceptions
    For J = 1 To 8
        sReplacement(J) = Replace(sException(J), ".", "[{}]")
    Next J

    For Each c In Selection
        sTemp = c.Value

        ' Convert all the exceptions
        For J = 1 To 8
            sTemp = Replace(sTemp, sException(J), sReplacement(J))
        Next J

        ' Demarcate sentences with a tab
        For J = 1 To 6
            sTemp = Replace(sTemp, sTerm(J), Trim(sTerm(J)) & Chr(9))
        Next J

        ' Split sentences into an array
        sExp = Split(sTemp, Chr(9))
        For J = 0 To UBound(sExp)
            ' Replace the code for valid periods
            sExp(J) = Replace(sExp(J), "[{}]", ".")
            ' Place sentences into adjacent cells on row
            c.Offset(0, J).Value = Trim(sExp(J))
        Next J
    Next c
End Sub

Note that the acceptable sentence terminations are noted in the sTerm array and the acceptable abbreviations are in the sException array. If your text might have other abbreviations, then you'll want to expand the sException array to include those.

The macro steps through whatever cells you have selected and replaces all the acceptable exceptions. It then replaces all the acceptable sentence terminations with that termination followed by a tab character. It then pulls apart the sentences based on the location of the tab character. Finally, it restores all the valid periods that were in the abbreviations and places the sentences on adjacent cells in the same row.

Note that the macro replaces whatever was in the selected cells and in however many cells are necessary to the right of the selection in order to store the sentences. Because of this, you might want to make sure you save your original worksheet before selecting a range of cells and running the macro.

Finally, you may want to note that the macro isn't perfect. It is perfectly acceptable from a grammarian's standpoint for an abbreviation to end a sentence. When this occurs, proper punctuation dictates that the final period in the abbreviation also serves as the terminating period for the sentence, as in these two short sentences:

Sheila earned her Ph.D. She was very happy.

Now, consider the following single sentence:

Sheila earned her Ph.D. from an Ivy League school.

When you compare the two examples (the two sentences vs. the single sentence), there is no way to discern, programmatically, between if Ph.D. ends a sentence or if it occurs in the middle of the sentence without checking to see if the following word begins with a capital letter or a quote mark followed by a capital letter. This can get quite complex very quickly. Plus, this applies to all abbreviations, not just Ph.D. Rather than try to anticipate and deal with all such occurrences, the macro noted above doesn't even try to discern whether an abbreviation ends a sentence or not—it simply treats all abbreviations as if they occur in the middle of a sentence.

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (12549) applies to Microsoft Excel 2007, 2010, 2013, 2016, 2019, and Excel in Office 365.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

Selecting an Entire Section

Documents can be subdivided into sections, with each of them formatted differently. If you want to select all the text in ...

Discover More

Calculating Form Fields

Form fields allow you to create (you guessed it) forms. Here is a description of how you can force Word to calculate the ...

Discover More

Enlarging the Formula Bar

The Formula bar is used to display the formula that appears in a cell. You may want to modify how the Formula bar is ...

Discover More

Program Successfully in Excel! John Walkenbach's name is synonymous with excellence in deciphering complex technical topics. With this comprehensive guide, "Mr. Spreadsheet" shows how to maximize your Excel experience using professional spreadsheet application development tips from his own personal bookshelf. Check out Excel 2013 Power Programming with VBA today!

More ExcelTips (ribbon)

Pasting Multiple Paragraphs Into a Single Cell

Copying information from one program (such as Word) to another (such as Excel) is a common occurrence. If you want to ...

Discover More

Clearing Everything Except Formulas

Need to get rid of everything in a worksheet except for your formulas? You can do it rather easily by applying the ideas ...

Discover More

Separating Text by Color

Excel allows you to use multiple colors to format the text in a cell. If you want to later separate that text to ...

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 6Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is nine more than 0?

2018-04-01 10:23:03

MIchael Armstrong

O, for the good old days when you properly ended a sentence with a period followed by 2 spaces.


2018-03-31 10:32:10

Brian L.

Elegant!


This Site

Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.