Extracting Foreign-Language Characters

Written by Allen Wyatt (last updated June 1, 2024)
This tip applies to Excel 2007, 2010, 2013, 2016, 2019, Excel in Microsoft 365, and 2021


Lucie has imported data from her company's HR/payroll system into a workbook, and she needs to forward that data to a third-party vendor. Before she forwards it, however, she needs to compile a list of the foreign-language characters contained in the employee names, such as letters with umlauts and accents. She doesn't need to replace the characters; she just needs a list of them.

There are a couple of ways to do this. First, if you are using Excel 2021 or Excel in Microsoft 365, you can put together a powerful formula that will pull the desired characters:

=LET(wl,A:A,cwl,TEXTJOIN(,TRUE,wl),cl,MID(cwl,SEQUENCE(LEN(cwl)),1),
SORT(UNIQUE(FILTER(cl,CODE(cl)>127))))

Even though the formula is shown here in two lines, it is a single formula. It assumes that all of the employee names are in column A. You can place this formula in a cell in a different column, and as long as there is nothing in the cells below the formula, you'll have a sorted list of characters returned.

The formula works by concatenating all the names in column A into a single string (assigned to the variable cwl) and then examining each character in the string. If the character has a code value of 128 or greater, then it is considered a foreign character. (Those with codes below 128 are assumed to be non-foreign characters.)

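The formula returns only unique characters, and those are sorted. Because the formula concatenates all the names (using the TEXTJOIN function), there is a limit on what it can process. If the combined length of all the names is more than 32,767 characters (the most a single cell can hold), then a #CALC! error is returned. You'll also see a #CALC! error if none of the names contain a character with a code above 127, because the FILTER function then has nothing to return.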
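If you are unsure whether your names come close to that limit, a short user-defined function can total the lengths for you. The following is only a sketch; the name CombinedTextLength is hypothetical, and it assumes the cells hold ordinary text (cells containing error values are skipped):

```vba
'Hypothetical helper: totals the number of characters in a range so the
'result can be compared against the 32,767-character limit
Function CombinedTextLength(ByVal MyRange As Range) As Long
    Dim c As Range
    Dim lTotal As Long

    lTotal = 0
    For Each c In MyRange.Cells
        'Skip cells that contain error values such as #N/A
        If Not IsError(c.Value) Then
            'An empty cell converts to a zero-length string
            lTotal = lTotal + Len(CStr(c.Value))
        End If
    Next c
    CombinedTextLength = lTotal
End Function
```

In a worksheet you could then enter =CombinedTextLength(A1:A2500). Passing a bounded range rather than all of column A keeps the loop short.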

If you are not using one of the latest versions of Excel, or if you might need to examine more than 32,767 characters, then you should consider using a macro. Here's an example of a user-defined function that goes through all of the cells in a range and returns the foreign-language characters:

Function ForeignChars(ByVal MyRange As Range) As String
    'All characters from 0 to 127 are considered non-foreign
    'All characters above 127 are considered foreign

    Dim c As Range
    Dim sTemp As String
    Dim sChars As String
    Dim J As Integer
    Dim K As Integer
    Dim bFound As Boolean

    Application.Volatile
    sChars = ""
    For Each c In MyRange
        sTemp = c.Text
        For J = 1 To Len(sTemp)
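            'AscW returns a signed Integer, so mask it to the range 0 to 65535;
            'otherwise characters with codes above 32767 would test as negative
            If (AscW(Mid(sTemp, J, 1)) And &HFFFF&) > 127 Then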
                bFound = False
                For K = 1 To Len(sChars)
                    If Mid(sChars, K, 1) = Mid(sTemp, J, 1) Then bFound = True
                Next K
                If Not bFound Then
                    sChars = sChars & Mid(sTemp, J, 1)
                End If
            End If
        Next J
    Next c
    ForeignChars = sChars
End Function
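
One caveat about AscW is worth knowing: it returns a signed 16-bit Integer, so any character whose code is above 32767 (which includes many Chinese, Japanese, and Korean characters) comes back as a negative number. A comparison such as AscW(ch) > 127 is False for those characters unless the value is masked first. The following sketch illustrates the behavior; the names CodeUnit and ShowAscWSign are hypothetical and are not part of the tip:

```vba
'Hypothetical helper: returns the code of the first character in s
'as a value from 0 to 65535 (s must not be an empty string)
Function CodeUnit(ByVal s As String) As Long
    'And-ing with &HFFFF& widens the Integer to a Long and drops the sign
    CodeUnit = AscW(s) And &HFFFF&
End Function

Sub ShowAscWSign()
    Dim ch As String

    ch = ChrW(40000)            'U+9C40, a character in the CJK range
    Debug.Print AscW(ch)        'prints -25536
    Debug.Print CodeUnit(ch)    'prints 40000
End Sub
```

Run ShowAscWSign from the Visual Basic Editor and look at the Immediate window to see both values.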

To use the ForeignChars function, enter a formula such as this in your worksheet:

=ForeignChars(A1:A2500)

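This checks the contents of the designated range (A1:A2500) and returns the foreign-language characters as a single, unsorted string. You'll also find that the function returns a wider range of foreign characters than the earlier formula. The reason is that the CODE worksheet function (used in the formula) evaluates each character according to your computer's character set (ANSI on Windows), so a character that isn't part of that character set is typically reported with a substitute code below 128 and is filtered out. The AscW VBA function (used in the macro) works with Unicode character codes instead. If you want the formula to behave more like the macro, you can try replacing CODE with the UNICODE worksheet function, which is available in Excel 2013 and later versions.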
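
You can see the difference between the two approaches from within VBA. The sketch below assumes that VBA's Asc function, which works through the system's ANSI character set, is a reasonable stand-in for what CODE does in a worksheet. The procedure name and the sample character (U+0142, a Polish l with a stroke) are chosen only for illustration:

```vba
Sub CompareAnsiAndUnicodeCodes()
    Dim ch As String

    ch = ChrW(&H142)    'U+0142, which is not in the Western European ANSI set

    'Asc converts through the ANSI character set, so the value depends on
    'the system; on a Western European system it is a substitute code
    Debug.Print "Asc:  "; Asc(ch)

    'AscW reports the Unicode character code, which is 322 for this character
    Debug.Print "AscW: "; AscW(ch)
End Sub
```

On a system where the Asc result is below 128, a test like the one in the formula would skip this character, while the AscW test in the macro would catch it.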

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (767) applies to Microsoft Excel 2007, 2010, 2013, 2016, 2019, Excel in Microsoft 365, and 2021.

Comments

2024-06-06 15:30:56

J. Woolley

As noted in my first comment below, Excel 2021's UNIQUE function is not case-sensitive. An equivalent using REDUCE, LAMBDA, and VSTACK is case-sensitive, but those functions currently require Excel 365.
My Excel Toolbox now includes the following function to return unique rows or columns from a cell range (contiguous) or array constant or result of an array function:
    =UniquePlus(RangeArray, [LeftRight], [ExactlyOnce], [HasHeader],
        [CaseSensitive])
The first three parameters match Excel's UNIQUE function.
When HasHeader is TRUE, the first row (LeftRight=FALSE) or first column (LeftRight=TRUE) is returned even if it is not unique.
Case-sensitive comparisons are done when CaseSensitive is TRUE.
This function does not require Excel 2021 or newer.
See https://sites.google.com/view/MyExcelToolbox


2024-06-02 16:48:50

J. Woolley

The Tip's ForeignChars function returns an unsorted string of characters; for example
    ëÄäéÉË
The result will be case-sensitive as illustrated in that example if the function is in a module with the default
    Option Compare Binary
But if the module begins with
    Option Compare Text
the result will not be case-sensitive and the previous example would simply return
    ëÄé
The following version returns a sorted case-sensitive comma-separated result for that example like this
    Ä, É, Ë, ä, é, ë
regardless of Option Compare:

Function ForeignChars2(ByVal Target As Range) As String
'ASCII is 0 to 127; non-ASCII is "foreign"
    Dim Chars As New Collection, char As String
    Dim cell As Range, n As Integer, k As Integer
    Const COMP As Integer = vbBinaryCompare 'case-sensitive
    If Target.Cells.Count > 1 Then
        Set Target = Application.Intersect(Target.Parent.UsedRange, Target)
    End If
    For Each cell In Target
        If WorksheetFunction.IsText(cell) Then
            For n = 1 To Len(cell)
                char = Mid(cell, n, 1)
                If AscW(char) > 127 Then 'non-ASCII
                    If Chars.Count = 0 Then
                        Chars.Add char 'add first
                    ElseIf StrComp(char, Chars(Chars.Count), COMP) > 0 Then
                        Chars.Add char 'add char > Chars(Chars.Count)
                    Else
                        For k = 1 To Chars.Count
                            Select Case StrComp(char, Chars(k), COMP)
                            Case 0 'already added char = Chars(k)
                                Exit For
                            Case Is < 0 'add char < Chars(k)
                                Chars.Add char, , k 'before Chars(k)
                                Exit For
                            End Select
                        Next k
                    End If
                End If
            Next n
        End If
    Next cell
    If Chars.Count = 0 Then ForeignChars2 = "none": Exit Function
    ForeignChars2 = Chars(1)
    For k = 2 To Chars.Count
        ForeignChars2 = ForeignChars2 & ", " & Chars(k)
    Next k
End Function


2024-06-01 18:47:19

J. Woolley

The Tip's first formula is very clever:
    =LET(wl, A:A, cwl, TEXTJOIN(, TRUE, wl), cl, MID(cwl, SEQUENCE(LEN(cwl)) ,1), SORT(UNIQUE(FILTER( cl, CODE(cl)>127))))
Here are some nit-picking comments:
1. TEXTJOIN(, TRUE, wl) could be replaced by CONCAT(wl) or CONCAT(A:A) for the same result. Using the latter means the wl parameter is unnecessary:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)), 1), SORT(UNIQUE(FILTER( cl, CODE(cl)>127))))
2. The Tip's formula returns a vertical array with N rows and 1 column, where N is the number of unique "foreign-language" (non-ASCII) characters. Lucie might prefer a comma-separated list in a single cell:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)) ,1), TEXTJOIN(", ", TRUE, SORT(UNIQUE(FILTER(cl, CODE(cl)>127)))))
3. Unless specifically stated, most Excel functions are not case-sensitive when applied to text. Neither is the UNIQUE function; therefore,
    UNIQUE({"a";"A";"B";"b"}) returns {"a";"B"}
a mixed-case array. Lucie might prefer all lower-case characters for uniform appearance:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)) ,1), TEXTJOIN(", ", TRUE, SORT(UNIQUE(LOWER(FILTER(cl, CODE(cl)>127))))))
4. Here is the same formula with the UNIQUE(...) part isolated for further discussion:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)),1), uniq, UNIQUE(LOWER(FILTER(cl, CODE(cl)>127))), TEXTJOIN(", ", TRUE, SORT(uniq)))
5. Lucie might want to list both upper-case and lower-case characters if both appear in the names. The following article describes a case-sensitive equivalent of Excel's UNIQUE function:
https://exceljet.net/formulas/unique-values-case-sensitive
This version of the previous formula includes each unique case-sensitive "foreign-language" character:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)), 1), uniq, REDUCE(, FILTER(cl, CODE(cl)>127), LAMBDA(a, v, IF(SUM(--EXACT(a, v)), a, VSTACK(a, v)))), TEXTJOIN(", ", TRUE, SORT(uniq)))
6. Like UNIQUE, the SORT function is not case-sensitive. But the SortPlus function in My Excel Toolbox has a case-sensitive option:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)), 1), uniq, REDUCE(, FILTER(cl, CODE(cl)>127), LAMBDA(a, v, IF(SUM(--EXACT(a, v)), a, VSTACK(a, v)))), TEXTJOIN(", ", TRUE, SortPlus(uniq, , , , , TRUE)))
7. SortPlus also has an option for ANSI/ASCII/Unicode order instead of Excel order; that option sorts upper-case before lower-case:
    =LET(cwl, CONCAT(A:A), cl, MID(cwl, SEQUENCE(LEN(cwl)), 1), uniq, REDUCE(, FILTER(cl, CODE(cl)>127), LAMBDA(a, v, IF(SUM(--EXACT(a, v)), a, VSTACK(a, v)))), TEXTJOIN(", ", TRUE, SortPlus(uniq, , , , , TRUE, TRUE)))
Each of these formulas can be copy/pasted into a worksheet for testing.
For more on SortPlus, see https://excelribbon.tips.net/T012575
and https://sites.google.com/view/MyExcelToolbox

