Splitting Sentences to Cells

by Allen Wyatt
(last updated March 31, 2018)

2

Pieter has a lot of cells in column A that contain text. Specifically, the cells contain several sentences of text each. He would like to split the sentences to individual cells. He knows he can use the Text to Columns tool, but that isn't entirely useful, as sentences can end with different punctuation and some punctuation can be used in the middle of a sentence. (Such as a period after a title like Mr. or Ms.) Pieter wonders if there is a better way to split the sentences to different cells.

To accomplish this task manually, there are a couple of ways you can go. First, you could use a helper column to work with your data. For instance, you might put this formula into cell B1:

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"Mr.", "Mr#"), "Mrs.", "Mrs#"), "Ms.", "Ms#")

Then, copy the formula down to however many cells are necessary. What you end up with is your common titles (Mr., Mrs., and Ms.) being replaced with a unique sequence of characters (Mr#, Mrs#, and Ms#). Copy the results of column B back into column B as values (so the formula is removed), and then use Text to Columns on column B. Finally, use Find and Replace to change all instances of the # character to a period.

The drawback to this is that the formula only accounts for three common uses of the period, where you may need to actually handle quite a few more. For instance, your sentences might have titles such as Dr. or credential indicators such as Ph.D. or Esq. The list of such period-laden abbreviations could get quite long. In those instances, you could manually make the changes in this way:

  1. Use Find and Replace to look for all the periods in your text, replacing non-sentence-ending periods with the marker character (# in the above technique).
  2. Use the Text to Columns tool to split the sentences apart.
  3. Use Find and Replace to change all instances of the marker character back to a period.

Again, if you have a lot of non-sentence-ending periods, this process could take quite a while to go through.

Of course, these approaches deal with what is actually a complex topic. The real question is how does one define a sentence? In English, there are only three punctuation marks that terminate a sentence—a period, an exclamation mark, and a question mark. There are variations and exceptions to this, however. For instance, a sentence could end with a quote mark, but that quote mark will always have one of the three terminating punctuation marks in front of it. In addition, a period could be used to mark an abbreviation, as already noted.

If you start using Find and Replace to deal with all of these punctuation marks and exceptions, then you quickly can run into a convoluted series of steps. It is much better to try to do the splitting using a macro. Here's one that will handle most sentences and abbreviations properly:

Sub SplitSentences()
    Dim c As Range
    Dim sException(8) As String
    Dim sReplacement(8) As String
    Dim sTerm(6) As String
    Dim sTemp As String
    Dim J As Integer
    Dim sExp As Variant

    ' These are the valid ways for a sentence to end
    sTerm(1) = ". "
    sTerm(2) = "! "
    sTerm(3) = "? "
    sTerm(4) = "." & Chr(34)
    sTerm(5) = "!" & Chr(34)
    sTerm(6) = "?" & Chr(34)

    ' These are the exceptions to the rule
    ' of a period ending a sentence
    sException(1) = "Mr."
    sException(2) = "Mrs."
    sException(3) = "Ms."
    sException(4) = "Dr."
    sException(5) = "Esq."
    sException(6) = "Ph.D."
    sException(7) = "a.m."
    sException(8) = "p.m."

    ' Set up the replacements for the exceptions
    For J = 1 To 8
        sReplacement(J) = Replace(sException(J), ".", "[{}]")
    Next J

    For Each c In Selection
        sTemp = c.Value

        ' Convert all the exceptions
        For J = 1 To 8
            sTemp = Replace(sTemp, sException(J), sReplacement(J))
        Next J

        ' Demarcate sentences with a tab
        For J = 1 To 6
            sTemp = Replace(sTemp, sTerm(J), Trim(sTerm(J)) & Chr(9))
        Next J

        ' Split sentences into an array
        sExp = Split(sTemp, Chr(9))
        For J = 0 To UBound(sExp)
            ' Replace the code for valid periods
            sExp(J) = Replace(sExp(J), "[{}]", ".")
            ' Place sentences into adjacent cells on row
            c.Offset(0, J).Value = Trim(sExp(J))
        Next J
    Next c
End Sub

Note that the acceptable sentence terminations are noted in the sTerm array and the acceptable abbreviations are in the sException array. If your text might have other abbreviations, then you'll want to expand the sException array to include those.

The macro steps through whatever cells you have selected and replaces all the acceptable exceptions. It then replaces all the acceptable sentence terminations with that termination followed by a tab character. It then pulls apart the sentences based on the location of the tab character. Finally, it restores all the valid periods that were in the abbreviations and places the sentences on adjacent cells in the same row.

Note that the macro replaces whatever was in the selected cells and in however many cells are necessary to the right of the selection in order to store the sentences. Because of this, you might want to make sure you save your original worksheet before selecting a range of cells and running the macro.

Finally, you may want to note that the macro isn't perfect. It is perfectly acceptable from a grammarian's standpoint for an abbreviation to end a sentence. When this occurs, proper punctuation dictates that the final period in the abbreviation also serves as the terminating period for the sentence, as in these two short sentences:

Sheila earned her Ph.D. She was very happy.

Now, consider the following single sentence:

Sheila earned her Ph.D. from an Ivy League school.

When you compare the two examples (the two sentences vs. the single sentence), there is no way to discern, programmatically, between if Ph.D. ends a sentence or if it occurs in the middle of the sentence without checking to see if the following word begins with a capital letter or a quote mark followed by a capital letter. This can get quite complex very quickly. Plus, this applies to all abbreviations, not just Ph.D. Rather than try to anticipate and deal with all such occurrences, the macro noted above doesn't even try to discern whether an abbreviation ends a sentence or not—it simply treats all abbreviations as if they occur in the middle of a sentence.

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (12549) applies to Microsoft Excel 2007, 2010, 2013, and 2016.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He  is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

Converting Strings to Numbers

When creating macros, you often need to convert a text string that contains numbers into actual numeric values. You do ...

Discover More

Determining the Hour of the Day

Need to know the current hour of the day? You can derive the information in your macros by using the Hour function, as ...

Discover More

Converting from Relative to Absolute

Addresses used in a formula can be either relative or absolute. If you need to switch between the two types of ...

Discover More

Professional Development Guidance! Four world-class developers offer start-to-finish guidance for building powerful, robust, and secure applications with Excel. The authors show how to consistently make the right design decisions and make the most of Excel's powerful features. Check out Professional Excel Development today!

More ExcelTips (ribbon)

Getting Rid of Spaces in Cells

Importing data into Excel that was generated in other programs can have some interesting side effects. For instance, you ...

Discover More

Enhanced Filling

Using the AutoFill feature of Excel is very handy. If you want to expand the utility offered by the feature, all you need ...

Discover More

Getting Help when Entering Functions

Need a quick memory jog when entering a worksheet function? Here's a shortcut that will be invaluable.

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 6Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is 9 - 4?

2018-04-01 10:23:03

MIchael Armstrong

O, for the good old days when you properly ended a sentence with a period followed by 2 spaces.


2018-03-31 10:32:10

Brian L.

Elegant!


This Site

Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.