Written by Allen Wyatt (last updated March 24, 2021)
This tip applies to Excel 2007, 2010, 2013, and 2016
Tom has a worksheet that contains about 20,000 cells full of textual data. From within these cells he needs to extract a specific pattern of text. The pattern is ##-##### where each # is a digit. This pattern does not appear at a set place in each cell. Tom wonders if there is a way to extract the desired information.
There are several ways that you can approach this problem, and the correct solution for your needs will depend on the characteristics of the data with which you are working. If you know that the only place in your data that you will have a dash is within your pattern, then you can key off of the presence of the dash by using a formula such as the following:
=MID(A1,FIND("-",A1)-2,8)
This finds the dash and then grabs the two characters to the left of the dash. This obviously will not work if there are dashes in other places in the text or if it is possible to have "patterns" that include non-digits (such as 12-34B32) and you want those excluded. In that case you'll need a much more complex formula:
=IF(ISERROR(INT(MID(A1, FIND("-", A1, 1)-2, 2)) & INT(MID( A1, FIND("-", A1, 1)+1, 5))), "", MID(A1, FIND("-", A1)-2, 8))
This includes an error-checking component that finds out if the characters just before the dash and just after the dash contain anything other than digits. If they do, then nothing is returned.
The one thing that these formulaic approaches don't do is to handle those situations where there may be more than one occurrence of the pattern within the same cell. In that case, a macro is the best approach. The following will extract the valid patterns and place them in a new worksheet called "Results."
Sub ExtractPattern() On Error Resume Next Set SourceSheet = ActiveSheet Set TargetSheet = ActiveWorkbook.Sheets("Results") If Err = 0 Then Worksheets("Results").Delete End If Worksheets.Add ActiveSheet.Name = "Results" Set TargetSheet = ActiveSheet Cells(1, 1).Value = "Found Codes" Cells(1, 1).Font.Bold = True iTargetRow = 2 SourceSheet.Select Selection.SpecialCells(xlCellTypeLastCell).Select Range(Selection, Cells(1)).Select For Each c In Selection.Cells If c.Value Like "*##-#####*" Then sRaw = c.Value iPos = InStr(sRaw, "-") Do While iPos > 0 If iPos < 3 Then sRaw = " " & sRaw iPos = iPos + 2 End If sTemp = Mid(sRaw, iPos - 2, 8) sRaw = Mid(sRaw, iPos + 6, Len(sRaw)) If sTemp Like "##-#####" Then TargetSheet.Cells(iTargetRow, 1) = sTemp iTargetRow = iTargetRow + 1 Else sRaw = Mid(sTemp, 4, 5) & sRaw End If iPos = InStr(sRaw, "-") Loop End If Next c End Sub
Note that the macro uses the Like function in two places. The first instance determines if the pattern occurs anywhere in the cell, and the second instance is used to determine if the extracted characters exactly match the desired pattern.
Note:
ExcelTips is your source for cost-effective Microsoft Excel training. This tip (12332) applies to Microsoft Excel 2007, 2010, 2013, and 2016. You can find a version of this tip for the older menu interface of Excel here: Extracting a Pattern from within Text.
Excel Smarts for Beginners! Featuring the friendly and trusted For Dummies style, this popular guide shows beginners how to get up and running with Excel while also helping more experienced users get comfortable with the newest features. Check out Excel 2013 For Dummies today!
Do you need to generate strings of random characters? The ideas presented in this tip will help you do it in a hurry.
Discover MoreWant to create a sequential pattern using formulas? It's easy to do if you take a look at how your data repeats. This tip ...
Discover MoreIt's easy to use filtering to hide rows based on the value in a cell, but how do you hide rows based on the values in two ...
Discover MoreFREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."
2021-03-28 07:26:35
Madhav
This will work for the source data you gave.
Relies on the "," and spaces be present.
======================================
Sub GetPattern()
Dim s
' patterns ##-##-# or ###-##-# or ####-##-or #####-##-# or ######-##-# or #######-##-#
s = "1-ab-4-abjljghkfdhjgf-2 nfkkgfg 2-nsnfkd 12345-67-8 mn,nfdgfd 2345-45-8, 2346799-91-0 nfkgjfdgfdj"
'=====================================================================
Dim iFirst
Dim s1
s1 = s
s1 = Trim(Left(s1, InStr(s, ",") - 1))
iFirst = Trim(Left(s1, InStrRev(s1, " ")))
iFirst = Right(iFirst, (Len(iFirst) - InStrRev(iFirst, " ")))
MsgBox iFirst
'====================================================================
Dim iFirstComma, iSecondComma
iFirstComma = InStr(s, ",")
iSecondComma = InStrRev(s, ",")
Dim s2
s2 = s
Dim iCenter
iCenter = Mid(s2, iFirstComma + 1, iSecondComma - iFirstComma - 1)
iCenter = Mid(iCenter, InStr(iCenter, " "))
MsgBox iCenter
'====================================================================
Dim iLast
Dim s3
s3 = s
iLast = Trim(Mid(s3, InStrRev(s3, ",") + 1))
iLast = Mid(iLast, 1, InStr(iLast, " "))
MsgBox iLast
End Sub
2017-02-20 08:33:30
Madhav
Thanks Peter for your interest and some tips.
My patterns are fixed
##-##-# or ###-##-# or ####-##-or #####-##-# or ######-##-# or #######-##-#
there will be space before them and then space or comma after them..
The source data could be "1-ab-4-abjljghkfdhjgf-2 nfkkgfg 2-nsnfkd 12345-67-8 mn,nfdgfd 2345-45-8, 2346799-91-0 nfkgjfdgfdj" I would like to extract only 12345-67-8 2345-45-8 2346799-91-0 so the patterns that I want to extract are #######-##-# ######-##-# #####-##-# ####-##-# ###-##-# ##-##-# wherein # is a digit... .. only variable length is of the starting digit string which could be 2 to 7 in length.. I just want ignore numbers and hyphens associated with text.. extract numbers with 2 to 7 digits-2 digits - 1 digit. Can you please help?
2017-02-19 10:37:07
Peter Atherton
Madhav
If you just need to extract all the numbers then the substitute function will do this.
Code Numbers Formulas
45-56-7 45567 =SUBSTITUTE(A2, "-", "")
123-112-7 1231127 =SUBSTITUTE(A3, "-", "")
56-*456-*147 56456147 =SUBSTITUTE(A4, "-*", "")
To extract part of the number, say 456 from the last code you need something else.
2017-02-18 04:57:59
Madhav
I would like to extract a pattern like ##-##-# or ###-##-# or ####-##-or #####-##-# or ######-##-# or #######-##-# where # is integer (0 to 9). This pattern may be present in text containing alphanumeric strings. There may be a space or , between the above patterns and the next string. Another strings present could be 1-*-5-*-any text. I would just like to ignore anything else but the digital pretends above. Can you please help?
Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.
FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."
Copyright © 2023 Sharon Parq Associates, Inc.
Comments