Splitting Sentences to Cells

by Allen Wyatt
(last updated March 31, 2018)

2

Pieter has a lot of cells in column A that contain text. Specifically, the cells contain several sentences of text each. He would like to split the sentences to individual cells. He knows he can use the Text to Columns tool, but that isn't entirely useful, as sentences can end with different punctuation and some punctuation can be used in the middle of a sentence. (Such as a period after a title like Mr. or Ms.) Pieter wonders if there is a better way to split the sentences to different cells.

To accomplish this task manually, there are a couple of ways you can go. First, you could use a helper column to work with your data. For instance, you might put this formula into cell B1:

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"Mr.", "Mr#"), "Mrs.", "Mrs#"), "Ms.", "Ms#")

Then, copy the formula down to however many cells are necessary. What you end up with is your common titles (Mr., Mrs., and Ms.) being replaced with a unique sequence of characters (Mr#, Mrs#, and Ms#). Copy the results of column B back into column B as values (so the formula is removed), and then use Text to Columns on column B. Finally, use Find and Replace to change all instances of the # character to a period.

The drawback to this is that the formula only accounts for three common uses of the period, where you may need to actually handle quite a few more. For instance, your sentences might have titles such as Dr. or credential indicators such as Ph.D. or Esq. The list of such period-laden abbreviations could get quite long. In those instances, you could manually make the changes in this way:

  1. Use Find and Replace to look for all the periods in your text, replacing non-sentence-ending periods with the marker character (# in the above technique).
  2. Use the Text to Columns tool to split the sentences apart.
  3. Use Find and Replace to change all instances of the marker character back to a period.

Again, if you have a lot of non-sentence-ending periods, this process could take quite a while to go through.

Of course, these approaches deal with what is actually a complex topic. The real question is how does one define a sentence? In English, there are only three punctuation marks that terminate a sentence—a period, an exclamation mark, and a question mark. There are variations and exceptions to this, however. For instance, a sentence could end with a quote mark, but that quote mark will always have one of the three terminating punctuation marks in front of it. In addition, a period could be used to mark an abbreviation, as already noted.

If you start using Find and Replace to deal with all of these punctuation marks and exceptions, then you quickly can run into a convoluted series of steps. It is much better to try to do the splitting using a macro. Here's one that will handle most sentences and abbreviations properly:

Sub SplitSentences()
    Dim c As Range
    Dim sException(8) As String
    Dim sReplacement(8) As String
    Dim sTerm(6) As String
    Dim sTemp As String
    Dim J As Integer
    Dim sExp As Variant

    ' These are the valid ways for a sentence to end
    sTerm(1) = ". "
    sTerm(2) = "! "
    sTerm(3) = "? "
    sTerm(4) = "." & Chr(34)
    sTerm(5) = "!" & Chr(34)
    sTerm(6) = "?" & Chr(34)

    ' These are the exceptions to the rule
    ' of a period ending a sentence
    sException(1) = "Mr."
    sException(2) = "Mrs."
    sException(3) = "Ms."
    sException(4) = "Dr."
    sException(5) = "Esq."
    sException(6) = "Ph.D."
    sException(7) = "a.m."
    sException(8) = "p.m."

    ' Set up the replacements for the exceptions
    For J = 1 To 8
        sReplacement(J) = Replace(sException(J), ".", "[{}]")
    Next J

    For Each c In Selection
        sTemp = c.Value

        ' Convert all the exceptions
        For J = 1 To 8
            sTemp = Replace(sTemp, sException(J), sReplacement(J))
        Next J

        ' Demarcate sentences with a tab
        For J = 1 To 6
            sTemp = Replace(sTemp, sTerm(J), Trim(sTerm(J)) & Chr(9))
        Next J

        ' Split sentences into an array
        sExp = Split(sTemp, Chr(9))
        For J = 0 To UBound(sExp)
            ' Replace the code for valid periods
            sExp(J) = Replace(sExp(J), "[{}]", ".")
            ' Place sentences into adjacent cells on row
            c.Offset(0, J).Value = Trim(sExp(J))
        Next J
    Next c
End Sub

Note that the acceptable sentence terminations are noted in the sTerm array and the acceptable abbreviations are in the sException array. If your text might have other abbreviations, then you'll want to expand the sException array to include those.

The macro steps through whatever cells you have selected and replaces all the acceptable exceptions. It then replaces all the acceptable sentence terminations with that termination followed by a tab character. It then pulls apart the sentences based on the location of the tab character. Finally, it restores all the valid periods that were in the abbreviations and places the sentences on adjacent cells in the same row.

Note that the macro replaces whatever was in the selected cells and in however many cells are necessary to the right of the selection in order to store the sentences. Because of this, you might want to make sure you save your original worksheet before selecting a range of cells and running the macro.

Finally, you may want to note that the macro isn't perfect. It is perfectly acceptable from a grammarian's standpoint for an abbreviation to end a sentence. When this occurs, proper punctuation dictates that the final period in the abbreviation also serves as the terminating period for the sentence, as in these two short sentences:

Sheila earned her Ph.D. She was very happy.

Now, consider the following single sentence:

Sheila earned her Ph.D. from an Ivy League school.

When you compare the two examples (the two sentences vs. the single sentence), there is no way to discern, programmatically, between if Ph.D. ends a sentence or if it occurs in the middle of the sentence without checking to see if the following word begins with a capital letter or a quote mark followed by a capital letter. This can get quite complex very quickly. Plus, this applies to all abbreviations, not just Ph.D. Rather than try to anticipate and deal with all such occurrences, the macro noted above doesn't even try to discern whether an abbreviation ends a sentence or not—it simply treats all abbreviations as if they occur in the middle of a sentence.

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (12549) applies to Microsoft Excel 2007, 2010, 2013, 2016, 2019, and Excel in Office 365.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

Counting Document Lines

Need to know how many lines are in your document? Word provides a quick and easy way you can determine the information.

Discover More

Printing Formatted Comments

When you print your worksheet, you may want Excel to include your comments as they appear on the screen. Here's how to ...

Discover More

Removing the Time Stamp from Tracked Changes

When you have Track Changes turned on, Word helpfully tracks, well, "changes." In doing so, it also notes who made ...

Discover More

Save Time and Supercharge Excel! Automate virtually any routine task and save yourself hours, days, maybe even weeks. Then, learn how to make Excel do things you thought were simply impossible! Mastering advanced Excel macros has never been easier. Check out Excel 2010 VBA and Macros today!

More ExcelTips (ribbon)

Requiring Input

If you distribute a workbook that is used by others for data entry, you may want a way to make sure they fill in certain ...

Discover More

Clearing and Deleting Cells

When you want to remove information from a worksheet, you can either clear cells or delete cells. This tip examines the ...

Discover More

Ensuring Standard Units During Data Entry

Need to make sure that information entered in a worksheet is always in a given unit of measurement? It's not as easy of a ...

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 6Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is 9 - 8?

2018-04-01 10:23:03

MIchael Armstrong

O, for the good old days when you properly ended a sentence with a period followed by 2 spaces.


2018-03-31 10:32:10

Brian L.

Elegant!


This Site

Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.