Please Note: This article is written for users of the following Microsoft Excel versions: 2007, 2010, and 2013. If you are using an earlier version (Excel 2003 or earlier), this tip may not work for you. For a version of this tip written specifically for earlier versions of Excel, click here: Removing Duplicates Based on a Partial Match.

Removing Duplicates Based on a Partial Match

by Allen Wyatt
(last updated October 29, 2015)

Farris has a worksheet that contains addresses. Some of the addresses are nearly identical: the street address is the same and only the suite number differs. For instance, one row may have an address of "85 Seymour Street, Suite 101" and another row may have an address of "85 Seymour Street, Suite 412". Farris is wondering how to remove the duplicates in the list of addresses based on a partial match, that is, based only on the street address and ignoring the suite number.

The simplest solution is to further split the addresses into separate columns, such that the suite number is in its own column. You can do that by following these steps:

  1. Make sure there is a blank column to the right of the address column.
  2. Select the cells that contain addresses.
  3. Display the Data tab of the ribbon.
  4. Click the Text to Columns tool in the Data Tools group. Excel starts the Convert Text to Columns wizard. (See Figure 1.)

     Figure 1. The Convert Text to Columns wizard.

  5. In the first step of the Wizard, make sure the Delimited option is selected, then click Next.
  6. In the second step of the Wizard, make sure the Comma check box is selected, then click Next.
  7. In the third step of the Wizard, click Finish.

The street address should now reside in the original column, and the previously blank column should now contain everything that followed the comma in the original addresses. In other words, the suite number is in its own column. With your data in this condition, it is an easy step to use the Remove Duplicates tool or an advanced filter (with Unique Records Only selected) on the street address column to display or extract the unique street addresses.
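
If you perform this split often, the wizard steps above can also be sketched as a short macro. This is only a minimal sketch: the name SplitAddressesAtComma is made up for illustration, and it assumes that you select the address cells before running it and that the column to their right is blank. It relies on the TextToColumns method of the Range object, which is what the wizard itself uses.

```vba
Sub SplitAddressesAtComma()
    ' Hypothetical helper (not part of the original tip).
    ' Assumption: the address cells are selected before the macro runs
    ' and the column to the right of them is blank.
    Dim rng As Range

    If TypeName(Selection) <> "Range" Then Exit Sub
    Set rng = Selection.Columns(1)

    ' Same choices as the wizard steps: Delimited, with Comma as the
    ' only delimiter. Results start in the first selected cell.
    rng.TextToColumns Destination:=rng.Cells(1), _
        DataType:=xlDelimited, _
        TextQualifier:=xlTextQualifierDoubleQuote, _
        ConsecutiveDelimiter:=False, _
        Tab:=False, Semicolon:=False, Comma:=True, _
        Space:=False, Other:=False
End Sub
```

As with the wizard, an address that contains more than one comma is split into more than two columns, so it is safest to try the macro on a copy of your data first.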

If you don't want to permanently split up the addresses into two columns, you could use a formula to determine duplicates. Assuming that the address list is sorted, you could use a formula similar to the following:

=IF(OR(ISERROR(FIND(",",A3)),ISERROR(FIND(",",A2))),
"",IF(LEFT(A3,FIND(",",A3))=LEFT(A2,FIND(",",A2)),
"Duplicate",""))

This formula assumes that the addresses to be checked are in column A and that the formula is placed in row 3 of a different column. It first checks whether there is a comma in both the address in the current row and the address in the row before. If either address lacks a comma, the formula assumes there is no possible duplicate and returns nothing. If there is a comma in both of them, the formula compares the portions of the addresses up to the comma. If they match, then the word "Duplicate" is returned; if they don't match, then nothing is returned.

The result of copying the formula down the column (so that one formula corresponds to each address) is that you will have the word "Duplicate" appear next to those addresses which match the first part of the previous address. You can then figure out what you want to do with those duplicates that are found.
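
If what you want to do is delete the flagged rows, a macro along the following lines is one possibility. It is a sketch under stated assumptions, not a definitive solution: the name DeleteFlaggedDuplicates is hypothetical, the addresses are assumed to be in column A of the active worksheet, and the formula results are assumed to be in column B starting in row 3. Because it deletes rows, run it on a copy of your data.

```vba
Sub DeleteFlaggedDuplicates()
    ' Hypothetical helper (not part of the original tip).
    ' Assumptions: addresses are in column A and the "Duplicate"
    ' flags are in column B, starting in row 3 of the active sheet.
    Const FLAG_COL As Long = 2
    Const FIRST_ROW As Long = 3
    Dim lastRow As Long
    Dim r As Long

    lastRow = Cells(Rows.Count, 1).End(xlUp).Row   ' last used row in column A
    If lastRow < FIRST_ROW Then Exit Sub

    ' Convert the flag formulas to plain values so that deleting
    ' rows does not leave broken references behind.
    With Range(Cells(FIRST_ROW, FLAG_COL), Cells(lastRow, FLAG_COL))
        .Value = .Value
    End With

    ' Work from the bottom up so that deleting a row does not
    ' shift rows that have not been examined yet.
    For r = lastRow To FIRST_ROW Step -1
        If Cells(r, FLAG_COL).Value = "Duplicate" Then
            Rows(r).Delete
        End If
    Next r
End Sub
```

The macro keeps the first row of each group of matching street addresses and removes the rows flagged after it.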

Another option is to use a macro to determine your possible duplicates. Such a macro could be devised in any number of ways; the user-defined function shown here simply checks the first X characters of a "key" value against a range and returns the address of the first matching cell.

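Function NearMatch(vLookupValue, rng As Range, iNumChars)
    ' Returns the address of the first cell in the first column of rng
    ' whose leading iNumChars characters match those of vLookupValue,
    ' or #N/A if no cell matches.
    Dim x As Long    ' Long, so ranges beyond 32,767 rows do not overflow
    Dim sSub As String

    Set rng = rng.Columns(1)
    sSub = Left(vLookupValue, iNumChars)
    For x = 1 To rng.Cells.Count
        If Left(rng.Cells(x), iNumChars) = sSub Then
            NearMatch = rng.Cells(x).Address
            Exit Function
        End If
    Next x
    NearMatch = CVErr(xlErrNA)
End Function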

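To use the function, add it to a standard module in the Visual Basic Editor (press Alt+F11, then choose Insert | Module). For instance, let's assume that your addresses are in the range A2:A100. In column B you can use this NearMatch function to return the cell addresses of possible duplicates. In cell B2 enter the following formula: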

=NearMatch(A2,A3:A$100,12)

The first parameter for the function (A2) is the cell you want to use as your "key." The first 12 characters of this cell are compared against the first 12 characters of each cell in the range A3:A$100. If a cell is found in that range in which the first 12 characters match, then the address of that cell (such as $A$17) is returned by the function. If no match is located, then the #N/A error is returned. If you copy the formula in B2 down to cells B3:B99, each corresponding address in column A is compared to all the addresses below it. (The last address, in A100, has nothing below it to check.) You end up with a list of possible duplicates in the original list.
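
A fixed count of 12 characters may be too short or too long for some street addresses. One possible variation, sketched here, compares everything before the first comma instead. The names NearMatchToComma and StreetPart are hypothetical and are not part of the original function. The sketch assumes that the suite number always follows the first comma and that the cells being checked contain no error values.

```vba
Function NearMatchToComma(vLookupValue, rng As Range)
    ' Hypothetical variant of NearMatch: compares the text before the
    ' first comma instead of a fixed number of characters.
    Dim c As Range
    Dim sKey As String

    sKey = StreetPart(CStr(vLookupValue))
    For Each c In rng.Columns(1).Cells
        If StreetPart(CStr(c.Value)) = sKey Then
            NearMatchToComma = c.Address
            Exit Function
        End If
    Next c
    NearMatchToComma = CVErr(xlErrNA)
End Function

Private Function StreetPart(ByVal sAddress As String) As String
    ' Returns the text before the first comma, or the whole string
    ' when there is no comma. Surrounding spaces are trimmed.
    Dim pos As Long

    pos = InStr(1, sAddress, ",")
    If pos > 0 Then
        StreetPart = Trim$(Left$(sAddress, pos - 1))
    Else
        StreetPart = Trim$(sAddress)
    End If
End Function
```

With both functions in the same module, the formula in cell B2 would be =NearMatchToComma(A2,A3:A$100). Note that a blank key cell matches any blank cell in the range, so it is best to apply the formula only to rows that contain addresses.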

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (7886) applies to Microsoft Excel 2007, 2010, and 2013. You can find a version of this tip for the older menu interface of Excel here: Removing Duplicates Based on a Partial Match.

Comments

2015-10-29 10:29:40

Scott Renz

Did you notice that Sjaak wrote his remark over a year ago?


2015-10-29 10:16:31

Dave Bonin

Sjaak,

Building off of Mark's response, I believe Mr. Wyatt has written a book about Excel. In fact, he has written many books about Excel. A quick search of amazon.com shows 23 such books.


2015-10-29 07:09:45

Wilco

Hi Sjaak,

I can imagine that the explanation is too difficult for you to understand, but I think that your tone is very demanding. Please realise that Allen is providing a very nice service for free and the fact that it is not the right level for you, doesn't mean that it is not very valuable for many others. So I feel it is not up to you to demand from Allen "You've got to reconsider your target audience".

Best regards, Wilco


2015-10-29 07:08:09

Mark

Sjaak,

You have some valid points but the tips that Allen provides cover the whole gamut of user level.

I'd respectfully suggest that when you see a tip like this you do one of the following:
- ignore it,
- save it until your experience level increases (I have lots of these), or
- use it as a spur to increase your level of knowledge through the many web-sites that provide the background details you're looking for.

If Allen were to provide the level of detail you're looking for this tip would be a book.

And finally, the real reason this tip is challenging is that Excel isn't the best tool to use to address it. I suggest that using a regular expression (RegEx) and/or a database would be much better suited to the task.

"If the only tool you have is a hammer, it's amazing how every fastener looks like a nail."


2014-09-25 13:18:35

Sjaak

Sorry but this article is really tough to understand. I have no idea what to do, even though you explained it.

You provide formula's and macro's but don't tell us what to do with, where they need to be placed. Please make a step by step tutorial with more visuals.

You have to understand that most people are not advanced Excel users who have no clue what to do with the data you provided. You've got to reconsider your target audience.

I'm basically looking for a rule like duplicate values, only for partial text. Why isn't there any built in?

