# Calculating Statistical Values on Different-Sized Subsets of Data

by Allen Wyatt
(last updated January 24, 2018)

Chris has a huge amount of data in a worksheet and he wants to analyze the data based on different groupings within it. For instance, he has data in cells A2:B36001, where row 1 contains the column headings Time and Signal. He wants to divide the data into groups consisting of some arbitrary number of sequential values, and then extract, for each group, a mean value for the Time, a mean value for the Signal, and a standard deviation for the Signal.

The easiest way to handle this type of requirement is to add a column that is used to indicate a group number for each row. Follow these steps:

1. Put the heading Group into cell C1.
2. In cell E1 enter the number of values that should be in each group. For instance, if you want each group to contain 10 sequential values, enter the number 10 in cell E1.
3. In cell C2 enter this formula: =INT((ROW()-ROW(\$C\$2))/\$E\$1)+1
4. Copy the formula in cell C2 to the range C3:C36001. Column C now contains a "group number" for each row, based on the value in cell E1. If E1 is 10, you end up with 3600 groups, 1 through 3600. If E1 is 100, you end up with 360 groups, 1 through 360.
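
To sanity-check the group-number formula outside Excel, here is a minimal Python sketch of the same arithmetic. The function name `group_number` and its parameters are hypothetical; only the formula itself and the row range A2:B36001 come from the steps above.

```python
def group_number(row, first_data_row=2, group_size=10):
    """Mirror =INT((ROW()-ROW($C$2))/$E$1)+1 for a worksheet row number."""
    # Floor division plays the role of INT() for non-negative offsets.
    return (row - first_data_row) // group_size + 1

# Worksheet rows 2 through 36001 hold the data, as in the article.
rows = range(2, 36002)
groups_of_10 = [group_number(r, group_size=10) for r in rows]
groups_of_100 = [group_number(r, group_size=100) for r in rows]

print(groups_of_10[:12])                       # first twelve data rows
print(max(groups_of_10), max(groups_of_100))   # number of groups for each size
```

With a group size of 10, the first ten data rows fall in group 1 and the next row starts group 2, which matches what column C should show once the formula is copied down.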

With the group numbers set up, you are ready to do the analysis. There are a couple of ways you can do this. One way is to use the subtotaling capabilities of Excel. Select one of the cells in the data area and follow these steps:

1. Display the Data tab of the ribbon and click the Subtotal tool in the Outline group. Excel displays the Subtotal dialog box.
2. Change the At Every Change In drop-down list to Group.
3. Change the Use Function drop-down list to indicate the type of statistic you want to calculate for each group.
4. In the Add Subtotal To list, select only the columns you want summarized (Time or Signal, as appropriate).
5. Click OK.

Excel groups and subtotals the data, as directed. (This process may take a while, depending on the size of your groups.) You can hide the detail and show only the subtotals by clicking the small 2 (with the box around it) in the outline area at the left of the worksheet. If you later want to change what is calculated, or if you need to change the number of items in each group, just remove the subtotals (using the Remove All button in the Subtotal dialog box) and repeat the above steps.
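
The subtotal steps amount to applying one chosen function to one column each time the Group value changes. The Python sketch below illustrates that idea on a few made-up rows; the sample data and the helper name `subtotal` are assumptions for illustration and are not part of Excel.

```python
from itertools import groupby
from statistics import mean

# Hypothetical sample rows: (time, signal, group), already in group order.
rows = [
    (0.0, 1.0, 1), (1.0, 3.0, 1), (2.0, 5.0, 1),
    (3.0, 2.0, 2), (4.0, 4.0, 2), (5.0, 6.0, 2),
]

def subtotal(rows, func, column):
    """Apply func to one column each time the group value changes."""
    result = []
    # groupby splits on consecutive runs, like "At Every Change In".
    for group, members in groupby(rows, key=lambda r: r[2]):
        values = [r[column] for r in members]
        result.append((group, func(values)))
    return result

signal_means = subtotal(rows, mean, column=1)
print(signal_means)
```

As with the Subtotal dialog box, a different statistic only requires passing a different function, for example `statistics.stdev` in place of `mean`.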

Another way to derive the statistics from your data is to use a PivotTable. Make sure that there are no subtotals in the data and select a cell within the data. Then follow these steps:

1. Display the Insert tab of the ribbon.
2. Click the PivotTable tool. (This tool is the first one at the left of the Insert tab.) Excel displays the Create PivotTable dialog box.
3. Click OK. (The default options in the dialog box are just fine.) Excel creates a blank PivotTable and displays a field list at the right of the worksheet.
4. Drag the Group field to the Row Labels area, just below the field list.
5. Drag the Time field to the Values area, just below the field list.
6. Drag the Signal field to the Values area, just below the field list.
7. Drag the Signal field, once again, to the Values area. The PivotTable should now show "Count of Time," "Sum of Signal," and "Sum of Signal2."
8. In the Values area, click the "Count of Time" label. Excel displays a Context menu.
9. Choose Value Field Settings. Excel displays the Value Field Settings dialog box.
10. In the Summarize Value Field By list, choose Average.
11. Click OK. The "Count of Time" labels change to "Average of Time."
12. In the Values area, click the "Sum of Signal" label. Excel displays a Context menu.
13. Choose Value Field Settings. Excel displays the Value Field Settings dialog box.
14. In the Summarize Value Field By list, choose Average.
15. Click OK. The "Sum of Signal" labels change to "Average of Signal."
16. In the Values area, click the "Sum of Signal2" label. Excel displays a Context menu.
17. Choose Value Field Settings. Excel displays the Value Field Settings dialog box.
18. In the Summarize Value Field By list, choose StdDev.
19. Click OK. The "Sum of Signal2" labels change to "StdDev of Signal2."

You now have the data desired. If you need to change the number of data items in each group, just go back to the data worksheet and change cell E1 to a different value. You can then return to the PivotTable, display the Options tab of the ribbon (the Analyze tab in Excel 2013 and later), and click the Refresh button.
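
If you want to cross-check the PivotTable results outside Excel, the following Python sketch computes the same three statistics per group. The function name `summarize` and the sample values are hypothetical. The sketch uses `statistics.stdev`, the sample standard deviation, which is what the StdDev summary function calculates (StdDevp is the population version).

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize(times, signals, group_size):
    """Return {group: (mean time, mean signal, sample stdev of signal)}."""
    buckets = defaultdict(list)
    for i, pair in enumerate(zip(times, signals)):
        # Same grouping rule as the formula in column C.
        buckets[i // group_size + 1].append(pair)
    table = {}
    for group, pairs in buckets.items():
        t = [p[0] for p in pairs]
        s = [p[1] for p in pairs]
        # A sample standard deviation needs at least two values.
        sd = stdev(s) if len(s) > 1 else None
        table[group] = (mean(t), mean(s), sd)
    return table

# Hypothetical sample data standing in for the Time and Signal columns.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
signals = [2.0, 4.0, 4.0, 6.0, 1.0, 3.0, 5.0, 7.0]

by_four = summarize(times, signals, group_size=4)
by_two = summarize(times, signals, group_size=2)  # like changing E1 and refreshing
for group, stats in by_four.items():
    print(group, stats)
```

Calling `summarize` again with a different `group_size` corresponds to changing cell E1 and refreshing the PivotTable.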

2018-01-26 11:02:28

Peter Atherton

John

I used a couple of formulas to get the data based on rand()

The formulas in D:E are:

| Cell ref | Formula |
| --- | --- |
| D2 | =A2 |
| D3 | =D2+TIME(0,0,1) |
| E2 | =COUNTIFS(\$A\$2:\$A\$20000,">="&D2,\$A\$2:\$A\$20000,"<="&D3) |
| F2 | =(SUMIFS(\$B\$2:\$B\$20000,\$A\$2:\$A\$20000,">="&D2,\$A\$2:\$A\$20000,"<="&\$D3))/E2 |

(see Figure 1 below)

Figure 1.

2018-01-24 12:31:01

John

I have a similar problem except the number of values in each grouping varies and I have only 2 columns of data to work with.
I have about 20,000 rows of data. One of the 2 columns I am interested in is time (mm:ssss) which increments approximately 1 second every 4 to 14 rows. The other column is feet per second (fps) at the time in the time column and is very "noisy". To reduce the noise of the fps data I need to average the fps data over nearly one full second to reduce the "noise" in the instantaneous value.

What I need to do is key on the time column and when the truncated difference between row (n+x) - row( n) is >/= to 1, then average the fps column values of rows (n) through (n+x) and put the average in a new column at the (n+x) row after dividing it by the m.ssss in the differenced time column.
This gives me a column with the average fps value over the previous second which is a much less "noisy" value than the instantaneous values.

Doing this manually would take a couple of weeks.

2014-04-14 02:22:26

barouh

Thank you for this useful tip. And any ideas on what is the best way to solve the similar task, when we want to calculate not the means, but the medians for X subcategories?

2014-04-12 11:49:36

Willy Vanhaelen

ROW(\$C\$2) will always return 2 so instead of =INT((ROW()-ROW(\$C\$2))/\$E\$1)+1 you can better use =INT((ROW()-2)/\$E\$1)+1

