Splitting Sentences to Cells

by Allen Wyatt
(last updated March 31, 2018)

2

Pieter has a lot of cells in column A that contain text. Specifically, the cells contain several sentences of text each. He would like to split the sentences to individual cells. He knows he can use the Text to Columns tool, but that isn't entirely useful, as sentences can end with different punctuation and some punctuation can be used in the middle of a sentence. (Such as a period after a title like Mr. or Ms.) Pieter wonders if there is a better way to split the sentences to different cells.

To accomplish this task manually, there are a couple of ways you can go. First, you could use a helper column to work with your data. For instance, you might put this formula into cell B1:

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"Mr.", "Mr#"), "Mrs.", "Mrs#"), "Ms.", "Ms#")

Then, copy the formula down to however many cells are necessary. What you end up with is your common titles (Mr., Mrs., and Ms.) being replaced with a unique sequence of characters (Mr#, Mrs#, and Ms#). Copy the results of column B back into column B as values (so the formula is removed), and then use Text to Columns on column B. Finally, use Find and Replace to change all instances of the # character to a period.

The drawback to this is that the formula only accounts for three common uses of the period, where you may need to actually handle quite a few more. For instance, your sentences might have titles such as Dr. or credential indicators such as Ph.D. or Esq. The list of such period-laden abbreviations could get quite long. In those instances, you could manually make the changes in this way:

  1. Use Find and Replace to look for all the periods in your text, replacing non-sentence-ending periods with the marker character (# in the above technique).
  2. Use the Text to Columns tool to split the sentences apart.
  3. Use Find and Replace to change all instances of the marker character back to a period.

Again, if you have a lot of non-sentence-ending periods, this process could take quite a while to go through.

Of course, these approaches deal with what is actually a complex topic. The real question is how does one define a sentence? In English, there are only three punctuation marks that terminate a sentence—a period, an exclamation mark, and a question mark. There are variations and exceptions to this, however. For instance, a sentence could end with a quote mark, but that quote mark will always have one of the three terminating punctuation marks in front of it. In addition, a period could be used to mark an abbreviation, as already noted.

If you start using Find and Replace to deal with all of these punctuation marks and exceptions, then you quickly can run into a convoluted series of steps. It is much better to try to do the splitting using a macro. Here's one that will handle most sentences and abbreviations properly:

Sub SplitSentences()
    Dim c As Range
    Dim sException(8) As String
    Dim sReplacement(8) As String
    Dim sTerm(6) As String
    Dim sTemp As String
    Dim J As Integer
    Dim sExp As Variant

    ' These are the valid ways for a sentence to end
    sTerm(1) = ". "
    sTerm(2) = "! "
    sTerm(3) = "? "
    sTerm(4) = "." & Chr(34)
    sTerm(5) = "!" & Chr(34)
    sTerm(6) = "?" & Chr(34)

    ' These are the exceptions to the rule
    ' of a period ending a sentence
    sException(1) = "Mr."
    sException(2) = "Mrs."
    sException(3) = "Ms."
    sException(4) = "Dr."
    sException(5) = "Esq."
    sException(6) = "Ph.D."
    sException(7) = "a.m."
    sException(8) = "p.m."

    ' Set up the replacements for the exceptions
    For J = 1 To 8
        sReplacement(J) = Replace(sException(J), ".", "[{}]")
    Next J

    For Each c In Selection
        sTemp = c.Value

        ' Convert all the exceptions
        For J = 1 To 8
            sTemp = Replace(sTemp, sException(J), sReplacement(J))
        Next J

        ' Demarcate sentences with a tab
        For J = 1 To 6
            sTemp = Replace(sTemp, sTerm(J), Trim(sTerm(J)) & Chr(9))
        Next J

        ' Split sentences into an array
        sExp = Split(sTemp, Chr(9))
        For J = 0 To UBound(sExp)
            ' Replace the code for valid periods
            sExp(J) = Replace(sExp(J), "[{}]", ".")
            ' Place sentences into adjacent cells on row
            c.Offset(0, J).Value = Trim(sExp(J))
        Next J
    Next c
End Sub

Note that the acceptable sentence terminations are noted in the sTerm array and the acceptable abbreviations are in the sException array. If your text might have other abbreviations, then you'll want to expand the sException array to include those.

The macro steps through whatever cells you have selected and replaces all the acceptable exceptions. It then replaces all the acceptable sentence terminations with that termination followed by a tab character. It then pulls apart the sentences based on the location of the tab character. Finally, it restores all the valid periods that were in the abbreviations and places the sentences on adjacent cells in the same row.

Note that the macro replaces whatever was in the selected cells and in however many cells are necessary to the right of the selection in order to store the sentences. Because of this, you might want to make sure you save your original worksheet before selecting a range of cells and running the macro.

Finally, you may want to note that the macro isn't perfect. It is perfectly acceptable from a grammarian's standpoint for an abbreviation to end a sentence. When this occurs, proper punctuation dictates that the final period in the abbreviation also serves as the terminating period for the sentence, as in these two short sentences:

Sheila earned her Ph.D. She was very happy.

Now, consider the following single sentence:

Sheila earned her Ph.D. from an Ivy League school.

When you compare the two examples (the two sentences vs. the single sentence), there is no way to discern, programmatically, between if Ph.D. ends a sentence or if it occurs in the middle of the sentence without checking to see if the following word begins with a capital letter or a quote mark followed by a capital letter. This can get quite complex very quickly. Plus, this applies to all abbreviations, not just Ph.D. Rather than try to anticipate and deal with all such occurrences, the macro noted above doesn't even try to discern whether an abbreviation ends a sentence or not—it simply treats all abbreviations as if they occur in the middle of a sentence.

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (12549) applies to Microsoft Excel 2007, 2010, 2013, and 2016.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

Saving All Open Workbooks

Wouldn't it be nice to have a single command that would save each of you open workbooks, all at once? It's easy to do ...

Discover More

Allowing Only Comments in a Document

Develop a document that is to be reviewed by a group of people, and you may want to protect it in some way. One way you ...

Discover More

Font Changes when Pasting to Another Document

When you copy information from one document and paste it into another, you may not always get what you expect. If the ...

Discover More

Create Custom Apps with VBA! Discover how to extend the capabilities of Office 2013 (Word, Excel, PowerPoint, Outlook, and Access) with VBA programming, using it for writing macros, automating Office applications, and creating custom applications. Check out Mastering VBA for Office 2013 today!

More ExcelTips (ribbon)

Ensuring Rows and Columns are Empty

Before you go about deleting rows and columns helter-skelter, it is a good idea to determine if there is anything in the ...

Discover More

Relative Worksheet References when Copying

Copy a formula from one place to another and Excel helpfully adjusts the cell references within the formula. That is, it ...

Discover More

Setting a Length Limit on Cells

Limiting what can be entered in a cell can be an important part of developing a worksheet that other people use. Here are ...

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 6Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is 3 + 4?

2018-04-01 10:23:03

MIchael Armstrong

O, for the good old days when you properly ended a sentence with a period followed by 2 spaces.


2018-03-31 10:32:10

Brian L.

Elegant!


This Site

Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.