by Allen Wyatt
(last updated July 14, 2018)
Marty has a large worksheet that lists all of the employees (past and present) in his company. The worksheet tracks various information about each employee, such as name, address, department, gender, status, etc. Marty often filters the data to meet his needs. He would like a way to determine the percentage of employees that are male and the percentage that are female, based on which rows are displayed after filtering. The SUBTOTAL function can work on a filtered column to provide various counts, but it won't allow him to determine counts based on the contents ("M" or "F") of the filtered column.
One approach is to use a PivotTable to determine the percentages. PivotTables are relatively easy to use, particularly to answer a single question such as this. They aren't terribly great, however, at allowing you to see detail information on your employees—you can see the aggregate answer to the male/female question, but you cannot at the same time see the detail on those employees. So, I want to focus on using straight formulas in the response to Marty's problem.
Creating a formula to get the desired percentages is more difficult than it at first appears. It is easy, for example, to determine counts when all the employee records are visible. You could, for instance, simply use something like this, assuming that the gender was in column C, to determine what percentage of the records are for male employees:
The problem is if you filter the records by a column different than column C. For instance, you might filter it by whatever is in the status column. The above formula will still give you a percentage based on all records in the gender column, not just those that are currently visible due to the filtering.
You might, at this point, think that the SUBTOTAL function might work, but Marty found out that it wouldn't. Again, if the gender is in column C, you could put the following at the bottom of column C:
This won't work, though, for the desired needs. The primary problem is that the SUBTOTAL function gives you a count of which records are visible, but there is no discernment between whether those records contain "M" or "F" in column C. The secondary issue is that COUNTA counts all records, not just those displayed. Thus, the formula doesn't give you the percentage of the displayed records that contain "M" or "F" but, instead, a percentage of how many records are displayed out of the total number of records.
You could try using a helper column if you wanted to. Just add a column to represent the "M" or "F" status of each employee as either 1 or 0. This could be done with a simple formula, such as this:
This formula assumes, again, that the gender column is C. Assuming that this formula is placed in column X (your helper column), you could then use two SUBTOTAL functions, in this fashion:
The first use of SUBTOTAL gives you the number of rows that contain "M" and the second use of SUBTOTAL gives you the total number of visible rows. The result of the formula is a percentage of male workers whose records are visible. You can determine the percentage of female workers by subtracting the male result from 1.
If you cannot use a helper column (or you prefer to not use a helper column), it is a bit trickier to get around the deficiencies of the approaches previously discussed; it takes a formula that is a bit more complicated. The following is a formula that will work, but I've broken the single formula onto four lines to make it a bit easier to explain.
=SUMPRODUCT(SUBTOTAL(103,OFFSET(C2:C9999, ROW(C2:C9999)-MIN(ROW(C2:C9999)),,1)), ISNUMBER(SEARCH("M",C2:C9999))+0) /SUBTOTAL(103,C2:C9999)
I'm going to go into great detail as to how this formula works, so bear with me for a bit—this will take a while. Let's first take a look at this part of the formula, which spans the end of the first row and the totality of the second row:
The usage of the OFFSET function here relies upon 3 parameters. The first corresponds to a starting point for calculating the reference that will be returned (in this case the starting point is C2:C9999). The second parameter is the number of rows to be offset from the beginning of that range defined in the first parameter. In this case, the rows to be offset is defined by subtracting the lowest row number in the range (which will always return the value 2) from the actual row being analyzed. So, for instance, if the row being analyzed is row 10, then subtracting 2 (the starting row) from that gives us an offset of 8 rows from the beginning of the range specified in the first parameter.
The third parameter is blank, so it defaults to 0. This is the number of columns to be offset from the first column in the range specified in the first parameter. Finally, the fourth parameter is the number 1, which indicates you want OFFSET to return a range that is only 1 cell high.
The bottom line is that this whole portion of the formula is included so that it returns a reference to a single cell in the column being analyzed. For the sake of explanation in this formula, let's call what is returned "SingleCell." Plugging this into the original formula, we get this:
=SUMPRODUCT(SUBTOTAL(103,SingleCell), ISNUMBER(SEARCH("M",C2:C9999))+0) /SUBTOTAL(103,C2:C9999)
The first SUBTOTAL function then returns the COUNTA result (indicated by the value 3 being used for the first parameter) for the single cell. The effect is that SUBTOTAL returns either a 0 or 1, depending on whether the cell is visible or not. (If the cell is filtered out of the results, then 0 is returned. If it is not filtered out—it is visible—then 1 is returned.)
The next part of the formula relies on the ISNUMBER and SEARCH functions. This portion of the code returns either a 0 or 1 depending on whether the cell contains the letter "M" or not. What you end up with, then, is something that boils down to this:
In the case of this single row, SUMPRODUCT would return 0, meaning the row "doesn't count" in the overall count. Since SUMPRODUCT is an array-based function, it is calculating a product based upon multiplication of every row in the original range, individually. Thus, it determines a count of all rows that satisfy the two conditions: the row is visible and the row contains the letter "M."
This is finally divided by the result of the final SUBTOTAL function, which is the COUNTA result of the number of visible rows. The final result is the percentage of how many visible rows have the "M" character visible in column C—the exact result that Marty wanted.
In order to get the percentage of females in the visible rows, all you need to do is to change what is being searched for: change "M" to "F" in the formula, and you should be fine.
There is one caveat that should be mentioned, though. The SEARCH function doesn't differentiate between uppercase and lowercase. Thus, if you use "Male" instead of "M" and "Female" instead of "F" in your gender column (column C), then searching for "Male" in the formula will match every single row because cells that contain "Female" will contain the characters "male". Thus, it is best to stick with "M" and "F" or, if you must use "Male" and "Female," then just use "Female" in the formula and calculate the percentage of male records as equal to 1 minus the female percentage.
ExcelTips is your source for cost-effective Microsoft Excel training. This tip (13550) applies to Microsoft Excel 2007, 2010, 2013, 2016, 2019, and Excel in Office 365.
Program Successfully in Excel! John Walkenbach's name is synonymous with excellence in deciphering complex technical topics. With this comprehensive guide, "Mr. Spreadsheet" shows how to maximize your Excel experience using professional spreadsheet application development tips from his own personal bookshelf. Check out Excel 2013 Power Programming with VBA today!
An amortization schedule is a report that shows how the outstanding balance on a loan changes with payments made over ...Discover More
If you use serial numbers that include both letters and numbers, you might wonder how you can increment the numeric ...Discover More
Excel includes the powerful INDIRECT function which can be used to assemble references to other cells in your workbook. ...Discover More
FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."
Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.