Please Note: This article is written for users of the following Microsoft Excel versions: 2007, 2010, and 2013. If you are using an earlier version (Excel 2003 or earlier), this tip may not work for you. For a version of this tip written specifically for earlier versions of Excel, click here: Deriving Monthly Median Values.

Deriving Monthly Median Values

by Allen Wyatt
(last updated January 27, 2017)

4

John has a huge worksheet that includes daily data covering about sixty years. He wants to come up with a formula that will calculate the median values for each month in the time period covered by the data.

Before proposing a solution, it is best to lay out some assumptions. For the purposes of this tip, let's assume that the daily data is in columns A and B. In column A are the dates associated with the data and in column B are the actual data values for each of those dates. Further, cells A1 and B1 contain headings for each column. This means that the actual data is approximately in the range of A2:B22000.

To make your formulas easier to use, you should define names for the data in both columns A and B. Select the range of dates (for example, A2:A22000) and assign it a name such as AllDates. Select the range of corresponding data (for example, B2:B22000) and use the same process to create a name such as AllData.

You can use array formulas to calculate the actual median values. This involves setting up another data table to contain the medians. Place headings such as "Month" in cell E1 and "Median" in cell F1. In cell E2 place the first day of the first month of your data set, such as 1/1/1940. In cell E3 put a date that is one month later, such as 2/1/1940. Select these two cells (E2:E3) and drag the fill handle downward for the number of months you want in the data table.

If there are no blanks in your sixty years of data, then enter the following formula into cell F2:

=MEDIAN(IF(DATE(YEAR(AllDates),MONTH(AllDates),1)=E2,AllData))

Finalize the formula by pressing Ctrl+Shift+Enter, which tells Excel that this is an array formula. You can then copy the formula in F2 into each of the cells in column F that has a corresponding month in column E. The formula analyzes the dates in column B, and if the year and month are equal to whatever date you put in cell E2, then the median is calculated from all corresponding data points.

If there are blanks in your sixty years of data (a date in column A with no corresponding value in column B), then the formula treats the blank as if it is a zero value. If you do have blanks, this may result in skewed medians. To get around this, you could use a different array formula that checks for and ignores any blank values:

=MEDIAN(IF((DATE(YEAR(AllDates),MONTH(AllDates),1)=E2)*ISNUMBER(AllData),AllData))

There is one caveat to using array formulas in this manner. If you have sixty years of data, with approximately 22,000 individual values, then that is still a lot of months: about 720 of them. That means that you are creating 720 array formulas, each analyzing 22,000 data values to arrive at an answer. That is a lot of calculating going on, so you will notice a slow-down in the responsiveness of Excel whenever it recalculates the worksheet.

If the sluggishness becomes an issue, then you could look at reorganizing your original data so that each row of the worksheet represents an individual month. Column A could contain the month for the row (1/1/1940, 2/1/1940, 3/1/1940, etc.) and the columns B:AF would be days 1 through 31 for each month. The intersecting cells in the table could then contain the data point for each day in the month, and you could use the MEDIAN function in column AG to calculate the median for each month. This still results in 720 formulas, but these are regular formulas that each only need to process a month's worth of data in stead of the array formulas that need to each process sixty years of data. The result is much faster calculations.

Of course, for most people the idea of reorganizing such a huge amount of data is enough to keep you awake at night. Instead, you can take an entirely different approach to analyzing the data. This approach is possible because a median is a very easy statistical function to manually calculate. You simply sort your data set and, if the number of items in the data set is odd, select the middle item. If the number of items is even, then you take the average of the two middle items.

To get ready to analyze the data, there are a couple of things to do. First, it will be handy to have some way to uniquely identify the month of each data point. In cell C2 place the following formula:

=100*Year(A2)+Month(A2) 

This results in a values such as 194001, 194002, 194003, etc. being stored in column C. This is the unique month value. Now, you should sort the data by column C and then by column B. Both sorts should be in ascending order, so that you end up with your data first sorted by year/month and then by value within the year/month.

Next you need to add subtotals to your data. Choose Subtotals from the Data menu, which displays the Subtotal dialog box. You want to add a subtotal at each change in column C. The function to use is Count, and you want to add the subtotal to column B. When you click OK, you end up with 720 subtotals, one for each month in the data range, each one showing a count of how many data items there were in that month.

To get the medians, add a formula to cell D2:

IF(RIGHT(B2,5)="Count", IF(MOD(C2,2)=1, INDIRECT("B"&(ROW()-1)-C2/2+1/2), (INDIRECT("B"&(ROW()-1)-C2/2)+INDIRECT("B"&(ROW()-1)-C2/2+1))/2), "")

The formula examines what is in cell B2, and if it contains the word "Count," then it knows that this is a subtotal row. In that case, it checks to see whether the count in cell C2 is odd or even. If it is odd, then the INDIRECT function is used to grab whatever the median value is for the month. If it is even, then the two middle values for the month are added together and divided in half.

The result is that you now have median values for each month, in the same row as your subtotals. Thus, you can collapse the outline for the data (click the plus signs in the outline area at the left of your data) so that you only show those subtotal rows.

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (12727) applies to Microsoft Excel 2007, 2010, and 2013. You can find a version of this tip for the older menu interface of Excel here: Deriving Monthly Median Values.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He  is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

Changing Column Width

Do you use columns in your document layout? You may want to modify the widths of various columns, and Word makes the change ...

Discover More

Canceling an Edit

When editing a cell, you may want to cancel the edit at some point. There are two ways to do this, both described in this ...

Discover More

A Shortcut for Switching Focus

While not technically an Excel-only tip, the shortcuts described in this tip will help you switch focus from your workbook to ...

Discover More

Save Time and Supercharge Excel! Automate virtually any routine task and save yourself hours, days, maybe even weeks. Then, learn how to make Excel do things you thought were simply impossible! Mastering advanced Excel macros has never been easier. Check out Excel 2010 VBA and Macros today!

More ExcelTips (ribbon)

Rounding To the Nearest Even Integer

Do you need your numbers to be rounded to an even integer value? How you accomplish the task depends on the nature of the ...

Discover More

Totaling Across Worksheets

Want to sum the values in the same cell on a range of worksheets? It's not as easy as summing a range on the same worksheet, ...

Discover More

Referencing the Last Six Items in a Formula

If you have a list of data in a column, you may want to determine an average of whatever the last few items are in the ...

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 8Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is 7 - 0?

2017-01-27 09:22:27

Chuck Trese

If the first solution given in this tip (the MEDIAN(....)) formula) works and the only problem is sluggish Excel, that is easily solved.
Since this is historical data (i.e., not changing) then you can just replace the complex formulas with their results. To do this, copy the results (the cells with the formulas), and then use Paste Special - Values Only, to overwrite the formulas with the results. Now that the formulas are overwritten by their results, Excel is no longer recalculating those formulas, and is free to do whatever else you want it to do.


2013-11-19 08:39:39

barouh

I met similar task - calculation of medians for subsets of data - quite often. E.g., median growth rates for subcategories within the sample of 60k websites

The additional complexity - comparing to the case described in the tip - is that the number of entries in each category is different and sometimes unknown in advance
My usual answer to such tasks was either the calculation of MEDIAN for formula-defined range (via INDIRECT+CONCATENATE construction) or - if it was unsorted massive - to calculate the rank (or percentile) of the entry within the subcategory it's a part of and then find the value that corresponds to 'middle' rank


2013-11-19 08:31:43

barouh

Calendar can be done not instead but in addition to raw data. The easiest way is to add Year, Month and Day columns to initial sheet and then build Pivot table, that will look like the calendar

Final step - calculation of median for each row of Pivot - is very easy


2013-11-18 08:40:56

Bryan

In regards to restructuring your data: no, no, no, no, NO! By formatting your data like a calendar you ruin so many other functions. For example, what if you wanted to graph it? Can't do it anymore. This should never have been mentioned, because now someone will think it is the proper solution.

Really, if you are having problems due to large amounts of data, you need to think if Excel is still the best data store. Access, for example, would be able to handle this amount of data with no issue (well, the only issue is that there's no median function in SQL, but you could get around it by writing your own in VBA).


This Site

Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.