Please Note: This article is written for users of the following Microsoft Excel versions: 2007, 2010, 2013, 2016, 2019, Excel in Microsoft 365, and 2021. If you are using an earlier version (Excel 2003 or earlier), this tip may not work for you. For a version of this tip written specifically for earlier versions of Excel, click here: Removing Duplicates Based on a Partial Match.
Written by Allen Wyatt (last updated July 8, 2023)
This tip applies to Excel 2007, 2010, 2013, 2016, 2019, Excel in Microsoft 365, and 2021
Farris has a worksheet that contains addresses. Some addresses are very close to the same, such that the street address is the same and only the suite number portion of the address differs. For instance, one row may have an address of "85 Seymour Street, Suite 101" and another row may have an address of "85 Seymour Street, Suite 412." Farris is wondering how to remove the duplicates in the list of addresses based on a partial match—based only on the street address and ignoring the suite number.
The simplest solution is to further split the addresses into separate columns, such that the suite number is in its own column. You can do that by following these steps:
Figure 1. The Convert Text to Columns wizard.
The street address should now reside in the original column and the previously blank column should now contain everything that was after the comma in the original addresses. In other words, the suite number is in its own column. With your data in this condition it is an easy step to use filtering to display or extract the unique street addresses.
If you don't want to permanently split up the addresses into two columns, you could use a formula to determine duplicates. Assuming that the address list is sorted, you could use a formula similar to the following:
=IF(OR(ISERROR(FIND(",",A3)),ISERROR(FIND(",",A2))), "",IF(LEFT(A3,FIND(",",A3))=LEFT(A2,FIND(",",A2)), "Duplicate",""))
This formula assumes that the addresses to be checked are in column A and that this formula is placed somewhere in row 3 of a different column. It first checks if there is a comma in either the address in the current row or the address in the row before. If there is no comma in either of the addresses, then it assumes there is no possible duplicate. It there is a comma in both of them, the formula checks the portion of the addresses before the comma. If they match, then the word "Duplicate" is returned; if they don't match, then nothing is returned.
The result of copying the formula down the column (so that one formula corresponds to each address) is that you will have the word "Duplicate" appear next to those addresses which match the first part of the previous address. You can then figure out what you want to do with those duplicates that are found.
Another option is to use a macro to determine your possible duplicates. There are any number of ways that a macro to determine duplicates could be devised; the one shown here simply checks the first X characters of a "key" value against a range and returns the address of the first matching cell.
Function NearMatch(vLookupValue, rng As Range, iNumChars) Dim x As Integer Dim sSub As String Set rng = rng.Columns(1) sSub = Left(vLookupValue, iNumChars) For x = 1 To rng.Cells.Count If Left(rng.Cells(x), iNumChars) = sSub Then NearMatch = rng.Cells(x).Address Exit Function End If Next NearMatch = CVErr(xlErrNA) End Function
For instance, let's assume that your addresses are in the range A2:A100. In column B you can use this NearMatch function to return addresses of possible duplicates. In cell B2 enter the following formula:
=NearMatch(A2,A3:A$100,12)
The first parameter for the function (A2) is the cell you want to use as your "key." The first 12 characters of this cell are compared against the first 12 characters of each cell in the range A3:A$100. If a cell is found in that range in which the first 12 characters match, then the address of that cell is returned by the function. If no match is located, then the #N/A error is returned. If you copy the formula in B2 down, to cells B3:B100, each corresponding address in column A is compared to all the addresses below it. You end up with a list of possible duplicates in the original list.
Note:
ExcelTips is your source for cost-effective Microsoft Excel training. This tip (7886) applies to Microsoft Excel 2007, 2010, 2013, 2016, 2019, Excel in Microsoft 365, and 2021. You can find a version of this tip for the older menu interface of Excel here: Removing Duplicates Based on a Partial Match.
Solve Real Business Problems Master business modeling and analysis techniques with Excel and transform data into bottom-line results. This hands-on, scenario-focused guide shows you how to use the latest Excel tools to integrate data from multiple tables. Check out Microsoft Excel 2013 Data Analysis and Business Modeling today!
When working with a PivotTable, slicers and timelines can make short work of large data sets. This tip looks at all the ...
Discover MoreFiltering is a great tool when dealing with large data sets. Knowing how to apply a filter, though, can be a bit tricky ...
Discover MoreThe advanced filtering capabilities of Excel allow you to easily perform comparisons and calculations while doing the ...
Discover MoreFREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."
2023-07-11 12:47:56
J. Woolley
Re. my previous comment, the Tip's application of NearMatch checks the first 12 characters and returns the address of the first matching cell or error #N/A if none:
=NearMatch(A2,A3:A$100,12)
If your Excel supports the FILTER function, the following formula returns the same result:
=INDEX(FILTER(ADDRESS(ROW(A3:A$100),1),
LEFT(A2,12)=LEFT(A3:A$100,12),NA()),1)
As before, this modification returns a link to the first matching cell:
=HYPERLINK(INDEX(FILTER("#"&ADDRESS(ROW(A3:A$100),1),
LEFT(A2,12)=LEFT(A3:A$100,12),NA()),1))
Notice the second argument of the ADDRESS function is 1 for column A. As prescribed by NearMatch, FILTER's third argument returns NA() if there is no matching cell.
If your Excel supports dynamic arrays, this formula returns a row vector with the contents of ALL cells that match the first 12 characters of the corresponding cell in column A:
=TRANSPOSE(FILTER(A$2:A$100,LEFT(A2,12)=LEFT(A$2:A$100,12)))
Here is a another version that ignores extraneous space characters:
=TRANSPOSE(FILTER(A$2:A$100,LEFT(TRIM(A2),12)=LEFT(TRIM(A$2:A$100),12)))
The result of each formula includes the contents of the corresponding cell in column A, but the following formula returns the contents of OTHER cells that match the first 12 characters of the corresponding cell in column A:
=TRANSPOSE(FILTER(A$2:A$100,(A2<>A$2:A$100)*(LEFT(A2,12)=LEFT(A$2:A$100,12))))
This version excludes any cell with content equal to the corresponding cell in column A and returns error #CALC! if no other cell matches its first 12 characters; #CALC! can be prevented by use of the FILTER function's optional third argument.
Finally, the following formula returns the address of EACH OTHER cell that matches the first 12 characters of the corresponding cell in column A:
=TRANSPOSE(FILTER(ADDRESS(ROW(A$2:A$100),1),
(A2<>A$2:A$100)*(LEFT(A2,12)=LEFT(A$2:A$100,12)),NA()))
In this case, NA() is returned if none.
2023-07-09 11:34:58
J. Woolley
The Tip's application of NearMatch checks the first 12 characters and returns the address of the first matching cell:
=NearMatch(A2,A3:A$100,12)
This modification returns a link to the first matching cell:
=HYPERLINK("#"&NearMatch(A2,A3:A$100,12))
The following function returns the number of other cells that match the first 12 characters:
=SUM((LEFT(A2,12)=LEFT(A$2:A$100,12))*1)-1
Here is a another version that ignores extraneous space characters:
=SUM((LEFT(TRIM(A2),12)=LEFT(TRIM(A$2:A$100),12))*1)-1
For a related subject, see https://excelribbon.tips.net/T013499_Generating_a_Keyword_Occurrence_List.html
Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.
FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."
Copyright © 2024 Sharon Parq Associates, Inc.
Comments