Extracting Foreign-Language Characters

Written by Allen Wyatt (last updated June 1, 2024)
This tip applies to Excel 2007, 2010, 2013, 2016, 2019, Excel in Microsoft 365, and 2021


Lucie has imported data from her company's HR/payroll system into a workbook, and she needs to forward that data to a third-party vendor. Before she forwards it, however, she needs to compile a list of the foreign-language characters contained in the employee names, such as letters with umlauts and accents. She doesn't need to replace the characters; she just needs a list of them.

There are a couple of ways to do this. First, if you are using Excel 2021 or Excel in Microsoft 365, you can put together a formula that pulls the desired characters:

=LET(wl,A:A,cwl,TEXTJOIN(,TRUE,wl),cl,MID(cwl,SEQUENCE(LEN(cwl)),1),
SORT(UNIQUE(FILTER(cl,CODE(cl)>127))))

Even though the formula is shown here in two lines, it is a single formula. It assumes that all of the employee names are in column A. You can place this formula in a cell in a different column, and as long as there is nothing in the cells below the formula, you'll have a sorted list of characters returned.

The formula works by concatenating all the names in column A into a single string (assigned to the variable cwl), splitting that string into individual characters (assigned to the variable cl), and then examining each character. If the character has a code value of 128 or greater, then it is considered a foreign character and is kept by the FILTER function. (Those with codes below 128 are assumed to be non-foreign characters.)

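The formula returns only unique characters, and those are sorted. Because the formula concatenates all the names (using the TEXTJOIN function), there is a limit on what it can process. If the combined length of all the names exceeds 32,767 characters, then a #CALC! error is returned. (You will also see #CALC! if none of the names contain a qualifying character, because the FILTER function then has nothing to return.)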
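
If you aren't sure how close your data is to that limit, a small user-defined function can give you a rough count. The following is only a sketch: the name TotalTextLength is made up for this illustration, and the count it returns is an approximation of what TEXTJOIN would build, not an exact match for every kind of cell content.

```vba
Function TotalTextLength(ByVal MyRange As Range) As Long
    'Hypothetical helper: approximates the combined length of
    'the values in a range, as a guide to the 32,767 limit

    Dim c As Range
    Dim rUsed As Range
    Dim lTotal As Long

    'Only look at cells that are in use, so that a whole-column
    'reference such as A:A does not scan a million empty cells
    Set rUsed = Application.Intersect(MyRange, MyRange.Worksheet.UsedRange)
    If rUsed Is Nothing Then
        TotalTextLength = 0
        Exit Function
    End If

    lTotal = 0
    For Each c In rUsed.Cells
        'Cells containing error values are skipped in this sketch
        If Not IsError(c.Value2) Then
            lTotal = lTotal + Len(CStr(c.Value2))
        End If
    Next c
    TotalTextLength = lTotal
End Function
```

Entering =TotalTextLength(A:A) in a cell outside of column A then shows roughly how many characters the formula would have to concatenate.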

If you are not using one of the latest versions of Excel, or if the names you are examining might total more than 32,767 characters, then you should consider using a macro. Here's an example of a user-defined function that goes through all of the cells in a range and returns the foreign-language characters:

Function ForeignChars(ByVal MyRange As Range) As String
    'All characters from 0 to 127 are considered non-foreign
    'All characters above 127 are considered foreign

    Dim c As Range
    Dim sTemp As String
    Dim sChar As String
    Dim sChars As String
    Dim J As Long
    Dim K As Long
    Dim bFound As Boolean

    Application.Volatile
    sChars = ""
    For Each c In MyRange
        sTemp = c.Text
        For J = 1 To Len(sTemp)
            sChar = Mid(sTemp, J, 1)
            'AscW returns a signed Integer, so mask it to get 0-65535;
            'otherwise characters from U+8000 up would test as negative
            If (AscW(sChar) And &HFFFF&) > 127 Then
                'Check whether the character is already in the result
                bFound = False
                For K = 1 To Len(sChars)
                    If Mid(sChars, K, 1) = sChar Then
                        bFound = True
                        Exit For
                    End If
                Next K
                If Not bFound Then
                    sChars = sChars & sChar
                End If
            End If
        Next J
    Next c
    ForeignChars = sChars
End Function

To use the function, enter a formula such as this in your worksheet:

=ForeignChars(A1:A2500)

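This checks the contents of the designated range (A1:A2500). The function returns the foreign-language characters as a single string, in the order in which they are first found. You'll also find that the macro returns a wider range of foreign characters than the earlier formula. This is because the CODE worksheet function (used in the formula) returns a character's number in the character set used by your computer, while the AscW VBA function (used in the macro) returns its Unicode value. A character that doesn't exist in your computer's character set is typically reported by CODE as a question mark (code 63), so the formula passes over it. If you are using Excel 2013 or later, replacing CODE with the UNICODE worksheet function in the formula should bring its results closer to those of the macro.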
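
Because the function packs everything into one string, it can be hard to tell look-alike characters apart. As a rough sketch (the names UnicodeValue and CharsWithCodes are made up for this illustration and are not part of the tip), a companion function could label each character with its Unicode code point. It assumes every character fits in a single 16-bit code unit; characters outside that range, such as some emoji, would show up as two separate halves.

```vba
Function UnicodeValue(ByVal sChar As String) As Long
    'AscW returns a signed 16-bit Integer, so values from
    '&H8000 upward come back negative; the mask restores 0-65535
    UnicodeValue = AscW(sChar) And &HFFFF&
End Function

Function CharsWithCodes(ByVal sChars As String) As String
    'Hypothetical helper: labels each character in sChars with
    'its code point, in the form "x (U+nnnn), y (U+nnnn)"

    Dim J As Long
    Dim sChar As String
    Dim sOut As String

    sOut = ""
    For J = 1 To Len(sChars)
        sChar = Mid$(sChars, J, 1)
        If Len(sOut) > 0 Then sOut = sOut & ", "
        'Pad the hexadecimal value to at least four digits
        sOut = sOut & sChar & " (U+" & _
            Right$("0000" & Hex$(UnicodeValue(sChar)), 4) & ")"
    Next J
    CharsWithCodes = sOut
End Function
```

You could then wrap the original function like this:

=CharsWithCodes(ForeignChars(A1:A2500))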

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (767) applies to Microsoft Excel 2007, 2010, 2013, 2016, 2019, Excel in Microsoft 365, and 2021.

Comments

2024-06-06 15:30:56

J. Woolley

As noted in my first comment below, Excel 2021's UNIQUE function is not case-sensitive. An equivalent using REDUCE, LAMBDA, and VSTACK is case-sensitive, but those functions currently require Excel 365.
My Excel Toolbox now includes the following function to return unique rows or columns from a cell range (contiguous) or array constant or result of an array function:
    =UniquePlus(RangeArray, [LeftRight], [ExactlyOnce], [HasHeader],
        [CaseSensitive])
The first three parameters match Excel's UNIQUE function.
When HasHeader is TRUE, the first row (LeftRight=FALSE) or first column (LeftRight=TRUE) is returned even if it is not unique.
Case-sensitive comparisons are done when CaseSensitive is TRUE.
This function does not require Excel 2021 or newer.
See https://sites.google.com/view/MyExcelToolbox


2024-06-02 16:48:50

J. Woolley

The Tip's ForeignChars function returns an unsorted string of characters; for example
    ëÄäéÉË
The result will be case-sensitive as illustrated in that example if the function is in a module with the default
    Option Compare Binary
But if the module begins with
    Option Compare Text
the result will not be case-sensitive and the previous example would simply return
    ëÄé
The following version returns a sorted case-sensitive comma-separated result for that example like this
    Ä, É, Ë, ä, é, ë
regardless of Option Compare:

Function ForeignChars2(ByVal Target As Range) As String
'ASCII is 0 to 127; non-ASCII is "foreign"
    Dim Chars As New Collection, char As String
    Dim cell As Range, n As Integer, k As Integer
    Const COMP As Integer = vbBinaryCompare 'case-sensitive
    If Target.Cells.Count > 1 Then
        Set Target = Application.Intersect(Target.Parent.UsedRange, Target)
    End If
    For Each cell In Target
        If WorksheetFunction.IsText(cell) Then
            For n = 1 To Len(cell)
                char = Mid(cell, n, 1)
                If (AscW(char) And &HFFFF&) > 127 Then 'non-ASCII
                    If Chars.Count = 0 Then
                        Chars.Add char 'add first
                    ElseIf StrComp(char, Chars(Chars.Count), COMP) > 0 Then
                        Chars.Add char 'add char > Chars(Chars.Count)
                    Else
                        For k = 1 To Chars.Count
                            Select Case StrComp(char, Chars(k), COMP)
                            Case 0 'already added char = Chars(k)
                                Exit For
                            Case Is < 0 'add char < Chars(k)
                                Chars.Add char, , k 'before Chars(k)
                                Exit For
                            End Select
                        Next k
                    End If
                End If
            Next n
        End If
    Next cell
    If Chars.Count = 0 Then ForeignChars2 = "none": Exit Function
    ForeignChars2 = Chars(1)
    For k = 2 To Chars.Count
        ForeignChars2 = ForeignChars2 & ", " & Chars(k)
    Next k
End Function


2024-06-01 18:47:19

J. Woolley

The Tip's first formula is very clever:
    =LET(wl, A:A, cwl, TEXTJOIN(, TRUE, wl), cl, MID(cwl, SEQUENCE(LEN(cwl)) ,1), SORT(UNIQUE(FILTER( cl, CODE(cl)>127))))
Here are some nit-picking comments:
1. TEXTJOIN(, TRUE, wl) could be replaced by CONCAT(wl) or CONCAT(A:A) for the same result. Using the latter means the wl parameter is unnecessary:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)), 1), SORT(UNIQUE(FILTER( cl, CODE(cl)>127))))
2. The Tip's formula returns a vertical array with N rows and 1 column, where N is the number of unique "foreign-language" (non-ASCII) characters. Lucie might prefer a comma separated list in a single cell:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)) ,1), TEXTJOIN(", ", TRUE, SORT(UNIQUE(FILTER(cl, CODE(cl)>127)))))
3. Unless specifically stated, most Excel functions are not case-sensitive when applied to text. Neither is the UNIQUE function; therefore,
    UNIQUE({"a";"A";"B";"b"}) returns {"a";"B"}
a mixed-case array. Lucie might prefer all lower-case characters for uniform appearance:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)) ,1), TEXTJOIN(", ", TRUE, SORT(UNIQUE(LOWER(FILTER(cl, CODE(cl)>127))))))
4. Here is the same formula with the UNIQUE(...) part isolated for further discussion:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)),1), uniq, UNIQUE(LOWER(FILTER(cl, CODE(cl)>127))), TEXTJOIN(", ", TRUE, SORT(uniq)))
5. Lucie might want to list both upper-case and lower-case characters if both appear in the names. The following article describes a case-sensitive equivalent of Excel's UNIQUE function:
https://exceljet.net/formulas/unique-values-case-sensitive
This version of the previous formula includes each unique case-sensitive "foreign-language" character:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)), 1), uniq, REDUCE(, FILTER(cl, CODE(cl)>127), LAMBDA(a, v, IF(SUM(--EXACT(a, v)), a, VSTACK(a, v)))), TEXTJOIN(", ", TRUE, SORT(uniq)))
6. Like UNIQUE, the SORT function is not case-sensitive. But the SortPlus function in My Excel Toolbox has a case-sensitive option:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)), 1), uniq, REDUCE(, FILTER(cl, CODE(cl)>127), LAMBDA(a, v, IF(SUM(--EXACT(a, v)), a, VSTACK(a, v)))), TEXTJOIN(", ", TRUE, SortPlus(uniq, , , , , TRUE)))
7. SortPlus also has an option for ANSI/ASCII/Unicode order instead of Excel order; that option sorts upper-case before lower-case:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)), 1), uniq, REDUCE(, FILTER(cl, CODE(cl)>127), LAMBDA(a, v, IF(SUM(--EXACT(a, v)), a, VSTACK(a, v)))), TEXTJOIN(", ", TRUE, SortPlus(uniq, , , , , TRUE, TRUE)))
Each of these formulas can be copy/pasted into a worksheet for testing.
For more on SortPlus, see https://excelribbon.tips.net/T012575
and https://sites.google.com/view/MyExcelToolbox

