Written by Allen Wyatt (last updated January 3, 2026)
This tip applies to Excel 2007, 2010, 2013, 2016, 2019, 2021, 2024, and Excel in Microsoft 365
Ella has a worksheet that, in column A, contains a huge number of text values. She would like to determine the top 10 words that are in the column, along with the count for each of those words. A word may be the only thing in a cell, or a cell could contain multiple words, similar to a sentence. This makes it unclear for Ella as to how she should approach the task.
There are a couple of ways you can approach the task. First, if you are using Excel 365 you can use a formula to generate the frequency count:
=LET(
src, LOWER(TRIMRANGE(A:A)),
w, TOCOL(
TEXTSPLIT(TEXTJOIN(" ", 1, src),, " "),
3),
u, UNIQUE(w),
TAKE(
SORT(
HSTACK(u, MAP(u, LAMBDA(x, SUM(--(w = x))))),
2, -1),
10))
I've tried to split out the lines in this formula so it is easier to understand, but it is still a single formula. The src variable refers to the lowercase version of the cells containing data in column A, the w variable is an array of just the words in that column, and the u variable is the unique words. The portion of the formula beginning with the TAKE function sorts the words by frequency in descending order and displays just the first 10 of the words.
For simple lists, this will work fine, but you need to be aware that if there are punctuation marks following words, then they are considered part of the words they follow. Thus, "abc" (without a comma following it) is different than "abc," (with a comma) or "abc." (with a period).
The bigger issue, though, is that the TEXTJOIN function, used to combine all the source cells into a single string, has an upper limit of 32,767 characters. Thus, if Ella's list in column A is as huge as she indicates, the result of the formula may be an error. The best way to deal with this potentiality—as well as to get a result in older versions of Excel—is to rely on a macro to do the analysis. The following macro provides an example:
Sub Top10Words()
Dim dict As New Collection
Dim rSource As Range
Dim cell As Range
Dim sIgnore As Variant
Dim sIg As Variant
Dim bIgnore As Boolean
Dim words As Variant
Dim w As Variant
Dim i As Long
Dim j As Long
Dim arr() As Variant
Dim cutoff As Long
Dim s As String
Dim sTemp As String
Dim c As String
Dim v As Variant
Dim t1 As Variant
Dim t2 As Variant
' Words to ignore
sIgnore = Array("the", "and", "of", "to", "a", "in", "for", "on", _
"at", "with", "is", "it", "this", "that", "as", "be", "by", "an", _
"or", "from", "was", "were", "are", "but", "not", "so", "if", _
"then", "than", "too", "can", "could", "would", "should", "has", _
"have", "had", "do", "does", "did", "will", "shall", "may", _
"might", "must", "also", "just", "about", "into", "out", "up", _
"down", "over", "under", "again", "still", "only")
' Define range in column A
Set rSource = Range("A1", Cells(Rows.Count, "A").End(xlUp))
For Each cell In rSource
If Len(cell.Value) > 0 Then
words = Split(LCase(CStr(cell.Value)), " ")
For Each w In words
s = CStr(w)
' Keep letters only
sTemp = vbNullString
For i = 1 To Len(s)
c = Mid(s, i, 1)
If c >= "a" And c <= "z" Then sTemp = sTemp & c
Next i
If Len(sTemp) > 0 Then
' Check to see if word should be ignored
bIgnore = False
For Each sIg In sIgnore
If sTemp = sIg Then
bIgnore = True
Exit For
End If
Next sIg
If Not bIgnore Then
' Check if word is already in list
On Error Resume Next
v = dict.Item(sTemp)
If Err.Number = 0 Then
' Word is in list, so increment count
Err.Clear
dict.Remove sTemp
v(1) = CLng(v(1)) + 1
dict.Add v, sTemp
Else
' Word not in list, so add it
Err.Clear
dict.Add Array(sTemp, 1), sTemp
End If
On Error GoTo 0
End If
End If
Next w
End If
Next cell
' Add headers
Range("C1:D1").Value = Array("Word", "Count")
If dict.Count > 0 Then
' Move collection items to array (word, count)
ReDim arr(1 To dict.Count, 1 To 2)
For i = 1 To dict.Count
arr(i, 1) = dict(i)(0)
arr(i, 2) = CLng(dict(i)(1))
Next i
' Sort array by count descending
For i = 1 To UBound(arr, 1)
For j = i + 1 To UBound(arr, 1)
If arr(j, 2) > arr(i, 2) Then
t1 = arr(i, 1): t2 = arr(i, 2)
arr(i, 1) = arr(j, 1): arr(i, 2) = arr(j, 2)
arr(j, 1) = t1: arr(j, 2) = t2
End If
Next j
Next i
' Output Top 10
cutoff = 10
If UBound(arr, 1) < 10 Then cutoff = UBound(arr, 1)
For i = 1 To cutoff
Range("C" & i + 1).Value = arr(i, 1)
Range("D" & i + 1).Value = arr(i, 2)
Next i
End If
End Sub
When you run the macro, it puts the words and frequency counts into columns C and D. It also allows you to specify as many words as you'd like that can be ignored. (It is common to ignore words such as "a" or "an.") The macro steps through each cell in column A and each word in those cells. The word is converted to lowercase, has any non-letter characters removed, and is checked against the words you want ignored. Assuming that the word still passes that test, it is compared against the dict collection which is used to aggregate the words.
Once done with all the cells, the macro moves the collection into an array, sorts the array, and then outputs just the top 10 words based on frequency.
Note:
ExcelTips is your source for cost-effective Microsoft Excel training. This tip (10193) applies to Microsoft Excel 2007, 2010, 2013, 2016, 2019, 2021, 2024, and Excel in Microsoft 365.
Dive Deep into Macros! Make Excel do things you thought were impossible, discover techniques you won't find anywhere else, and create powerful automated reports. Bill Jelen and Tracy Syrstad help you instantly visualize information to make it actionable. You’ll find step-by-step instructions, real-world case studies, and 50 workbooks packed with examples and solutions. Check out Microsoft Excel 2019 VBA and Macros today!
If you have a series of values in a column, you might have a need to separate the values into even values and odd values. ...
Discover MoreIf you need to find out how many columns are set to be a specific width, you'll need a macro to help determine the info. ...
Discover MoreWrite out a check and you need to include the digits for the amount of the check and the value of the check written out ...
Discover MoreFREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."
2026-01-12 14:30:17
J. Woolley
My earlier comments below describe RegExp in Function Top10Words2. RegExp defines \w as a word metacharacter equivalent to [A-Za-z0-9_]; however, English dictionaries say words are built from letters, hyphen, and apostrophe, or [A-Za-z-']. Therefore, the following statement in Top10Words2
    RegEx.Pattern = "(\w+)" '\w is the same as [A-Za-z0-9_]
should be replaced by
    RegEx.Pattern = "([A-Za-z-']+)"
For related discussion, see my recent comment here:
https://excelribbon.tips.net/T011792_Returning_the_Final_Word.html
2026-01-11 15:06:23
J. Woolley
Re. my most recent comment below, here's another way to handle ties. In the original version of Function Top10Words2, replace these statements
    For n = 1 To k
        Result(n, 1) = Words(n - 1) 'unique word (case ignored)
        Result(n, 2) = Uniq.Item(Words(n - 1)) 'unique word count
    Next n
    Top10Words2 = Result
with the following statements:
    Dim j As Long 'initially zero
    For n = 1 To k
        If j < Uniq.Count Then
            Result(n, 1) = Uniq.Item(Words(j)) 'unique word count
            Result(n, 2) = Words(j) 'unique word (case ignored)
            j = j + 1
            Do While j < Uniq.Count
                If Uniq.Item(Words(j)) <> Result(n, 1) Then Exit Do
                Result(n, 2) = Result(n, 2) & ", " & Words(j)
                j = j + 1
            Loop
        Else
            Result(n, 1) = vbNullString
            Result(n, 2) = vbNullString
        End If
    Next n
    Top10Words2 = Result
This change combines all the words with equal frequency count into a comma separated group. Notice frequency count is now in the 1st column and words are in the 2nd. The 1st column will no longer contain duplicate count values. You might want to format the 2nd column as Wrap Text.
2026-01-09 12:03:09
J. Woolley
One issue for a list of the top 10 words is how to handle ties; i.e., what if there are more words with the same frequency as the 10th word in the list. Here's a solution for the Top10Words2 function described in my previous two comments below. Replace the final statement
    Top10Words2 = Result
with the following statements:
    If Uniq.Count > 10 Then 'add words with tied Uniq.Item
        For n = k + 1 To Uniq.Count
            If Uniq.Item(Words(n - 1)) < Result(k, 2) Then Exit For
            Result(k, 1) = Result(k, 1) & ", " & Words(n - 1)
        Next n
    End If
    Top10Words2 = Result
This change combines the 10th word in the list with a comma separated group of additional words that have the same frequency. You might want to format that cell as Wrap Text.
2026-01-08 15:46:07
J. Woolley
Sorry. I forgot to format the function. Here's a prettier version:
Function Top10Words2(Target As Variant) As Variant
    'Early binding requires VBE > Tools > References: _
        Microsoft Scripting Runtime _
        Microsoft VBScript Regular Expressions
    Dim Uniq As New Dictionary, Ignor As New Dictionary
    Dim RegEx As New RegExp, expr As Match, n As Long, k As Long
    Dim cell As Variant, word As Variant, Words As Variant, Result As Variant
    Const SKIP As String = "the,and,of,to,a,in,for,on,at,with,is,it,this," _
        & "that,as,be,by,an,or,from,was,were,are,but,not,so,if,then,than," _
        & "too,can,could,would,should,has,have,had,do,does,did,will,shall," _
        & "may,might,must,also,just,about,into,out,up,down,over,under," _
        & "again,still,only" 'consider adjusting this list
    For Each word In Split(SKIP, ",")
        Ignor.Add Key:=word, Item:=Null
    Next word
    RegEx.Global = True
    RegEx.Multiline = True
    RegEx.IgnoreCase = True
    RegEx.Pattern = "(\w+)" '\w is the same as [A-Za-z0-9_]
    For Each cell In Target
        If RegEx.Test(cell) Then
            For Each expr In RegEx.Execute(cell)
                word = LCase(CStr(expr)) 'ignore case
                If Uniq.Exists(word) Then
                    Uniq.Item(word) = Uniq.Item(word) + 1
                ElseIf Not Ignor.Exists(word) Then
                    Uniq.Add Key:=word, Item:=1
                End If
            Next expr
        End If
    Next cell
    Words = Uniq.Keys 'base-0 array
    For n = 0 To UBound(Words) - 1 'bubble-sort Uniq.Items descending
        For k = n + 1 To UBound(Words)
            If Uniq.Item(Words(k)) > Uniq.Item(Words(n)) Then
                word = Words(k)
                Words(k) = Words(n)
                Words(n) = word
            End If
        Next k
    Next n
    k = Uniq.Count
    If k > 10 Then k = 10
    ReDim Result(1 To k, 1 To 2) 'k rows, 2 columns
    For n = 1 To k
        Result(n, 1) = Words(n - 1) 'unique word (case ignored)
        Result(n, 2) = Uniq.Item(Words(n - 1)) 'unique word count
    Next n
    Top10Words2 = Result
End Function
2026-01-08 15:38:56
J. Woolley
Here's an alternate version of the Tip's macro as a user-defined function (UDF) that returns a dynamic array with 10 rows (or less) and 2 columns (word and count). Dictionary and RegExp are more efficient than Collection and Mid. Early binding requires using VB Editor's Tools menu to Reference both Microsoft Scripting Runtime and Microsoft VBScript Regular Expressions libraries. Target can be a cell range (contiguous or not) or an array. If there are no words in Target, #VALUE! error is returned. Like the Tip's macro, text is converted to lower case. This function counts words containing letters (a-z), numbers (0-9), and underscore(_); any other character is a word separator. But the Tip's macro separates words by the space character and only considers words containing letters (a-z); non-letters are simply eliminated from a word, so a hyphenated word like non-letter becomes nonletter and a word like Top10Words2 becomes topwords.
Function Top10Words2(Target As Variant) As Variant
'Early binding requires VBE > Tools > References: _
Microsoft Scripting Runtime _
Microsoft VBScript Regular Expressions
Dim Uniq As New Dictionary, Ignor As New Dictionary
Dim RegEx As New RegExp, expr As Match, n As Long, k As Long
Dim cell As Variant, word As Variant, Words As Variant, Result As Variant
Const SKIP As String = "the,and,of,to,a,in,for,on,at,with,is,it,this," _
& "that,as,be,by,an,or,from,was,were,are,but,not,so,if,then,than," _
& "too,can,could,would,should,has,have,had,do,does,did,will,shall," _
& "may,might,must,also,just,about,into,out,up,down,over,under," _
& "again,still,only" 'consider adjusting this list
For Each word In Split(SKIP, ",")
Ignor.Add Key:=word, Item:=Null
Next word
RegEx.Global = True
RegEx.Multiline = True
RegEx.IgnoreCase = True
RegEx.Pattern = "(\w+)" '\w is the same as [A-Za-z0-9_]
For Each cell In Target
If RegEx.Test(cell) Then
For Each expr In RegEx.Execute(cell)
word = LCase(CStr(expr)) 'ignore case
If Uniq.Exists(word) Then
Uniq.Item(word) = Uniq.Item(word) + 1
ElseIf Not Ignor.Exists(word) Then
Uniq.Add Key:=word, Item:=1
End If
Next expr
End If
Next cell
Words = Uniq.Keys 'base-0 array
For n = 0 To UBound(Words) - 1 'bubble-sort Uniq.Items descending
For k = n + 1 To UBound(Words)
If Uniq.Item(Words(k)) > Uniq.Item(Words(n)) Then
word = Words(k)
Words(k) = Words(n)
Words(n) = word
End If
Next k
Next n
k = Uniq.Count
If k > 10 Then k = 10
ReDim Result(1 To k, 1 To 2) 'k rows, 2 columns
For n = 1 To k
Result(n, 1) = Words(n - 1) 'unique word (case ignored)
Result(n, 2) = Uniq.Item(Words(n - 1)) 'unique word count
Next n
Top10Words2 = Result
End Function
If your version is earlier than Excel 2021, select an empty 10x2 range and use Ctrl+Shift+Enter for a CSE array formula like {=Top10Words2(A1:A999)}.
Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.
FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."
Copyright © 2026 Sharon Parq Associates, Inc.
Comments