Working with Huge Datasets

Written by Allen Wyatt (last updated November 25, 2017)
This tip applies to Excel 2007, 2010, 2013, and 2016


6

Norman's computer has a fast processor and 32 GB of RAM. For statistical purposes, he often needs to use million-line worksheets. He's still using Excel 2007, and it doesn't work well after it's run awhile with these huge workbooks. (In fact, it frequently crashes.) Other times, after startup, it works OK. Norman wonders what he can do to get more reliable operation when working with such large datasets. He wonders if Excel 365 work better, or perhaps more RAM.

Let's address the hardware issue first. Your system is probably sufficient for working with large data sets in Excel. However, if you are planning on updating your hardware, it is always a good idea to get more RAM. This allows Excel (and other programs) to do more data manipulation in memory rather than needing to swap the data between memory and the disk cache on the hard drive. Bottom line: Don't upgrade your system just for this one reason, but if you are upgrading for other reasons, then get a system with lots of RAM.

Now, on to Excel. The version of Excel you are using (Excel 2007) has a maximum memory allocation of 2 GB per instance, and do not appear to fully release all memory used be a workbook that was previously opened. This could be the cause of your "works at first, but not later" problem, and the only way to work around it is to periodically close and restart Excel.

In addition, Excel 2007 is only available in a 32-bit version. If you update to a later version of Excel, you'll be able to install a 64-bit version of the program. This is important because it allows you to address more memory, which means easier manipulation of larger data sets.

Once you upgrade your version of Excel, you'll want to follow these steps to make sure that Excel is utilizing the power of your CPU as well as it can:

  1. Display the Excel Options dialog box. (In Excel 2010 and later versions display the File tab of the ribbon and then click Options.)
  2. At the left side of the dialog box click Advanced.
  3. Scroll down until you see the Formulas section of the options. (See Figure 1.)
  4. Figure 1. The Formulas section of Excel's advanced options.

  5. Click the check box next to the Enable Multi-Threaded Calculation option.
  6. Make sure the radio button next to Use All Processors On This Computer is selected. (This option is only accessible after you complete step 4.)
  7. Click the OK button.

After taking these steps you may still run into problems with your data if the worksheets contain a lot of formulas. It makes sense that the more formulas you have—especially if those formulas involve volatile functions that must be recalculated constantly—the higher "load" you place on Excel. Anything you can do to reduce the number of formulas (and volatile functions), the more responsive you'll find Excel to be.

Now let's look at the advisability of using Excel itself. The underlying question is whether you really need to have such large worksheets in Excel in the first place. Depending on the nature of the data with which you are working, you may find it better to load your data into a database (such as Access, which is provided with many versions of Microsoft Office) and then use the database program to do your data summaries. You could, at that point, copy a much smaller subset of records into Excel to do your final analysis or formatting for presentation purposes.

Another approach once the data is in the database is to use the capabilities of Excel to access the data. For example, since you (hopefully) updated to a later version of Excel, you could use Power Query (called Get & Transform in Excel 2016) or Power Pivot to establish a connection to the database to process the data, which decreases the overall processing load on Excel.

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (4809) applies to Microsoft Excel 2007, 2010, 2013, and 2016.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

Formatting Subtotal Rows

Excel automatically formats subtotals for you. But what if you want to change the default to something more suitable for ...

Discover More

Quickly Dumping Array Contents

Variable arrays are used quite often in macros. If you use an array once in your macro and then need to reuse it for ...

Discover More

Listing the Settings in a Template

Templates allow you to define and collect many formatting settings that control how your documents appear. Getting a ...

Discover More

Solve Real Business Problems Master business modeling and analysis techniques with Excel and transform data into bottom-line results. This hands-on, scenario-focused guide shows you how to use the latest Excel tools to integrate data from multiple tables. Check out Microsoft Excel 2013 Data Analysis and Business Modeling today!

More ExcelTips (ribbon)

Iterating Circular References

Does your data require that you perform calculations using circular references? If so, then you'll want to be aware of ...

Discover More

Measuring Efficiency of Formulas and Macros

As the limits on what you can store in Excel have increased, so has the need to consider how to make your workbooks and ...

Discover More

Saving Movement on Enter with a Workbook

Press Enter when working in a workbook and Excel moves to a cell adjacent to the one in which you were working. If you ...

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] (all 7 characters, in the sequence shown) in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 6Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is two more than 4?

2019-03-14 17:32:24

Ron

2007 is no longer supported, but the question is still valid. Here are a couple of links about RAM support in newer versions of Excel.

@ RAM Support increased 32-bit Excel 2013-2016
https://www.excelguru.ca/blog/2016/05/10/32-bit-excel-memory-limit-increase/
Excel 2016 build 16.0.6868.2060
Excel 2013 build 15.0.4833.1000
Have been updated to increase RAM supported. It has been 2GB in 32-bit Win and Excel.
Now 32-bit Excel in 32-bit Win is up to 3GB and in 64-bit Win up to 4GB
Excel 2010 and earlier have not been updated.


@@ LLA - Large Address Aware capability change for Excel
https://support.microsoft.com/en-us/help/3160741/large-address-aware-capability-change-for-excel
32-bit versions of Microsoft Excel 2013 and Excel 2016 can take advantage of Large Address Aware (LAA) functionality after installation of the latest updates. (see the "Resolution" section) This change lets 32-bit installations of Excel 2016 consume double the memory when users work on a 64-bit Windows OS. The system provides this capability by increasing the user mode virtual memory from 2 gigabytes (GB) to 4 GB. This change provides 50 percent more memory (for example, from 2 GB to 3 GB) when users work on a 32-bit system.


The suggestion has already been made, so I'll second it. For large datasets like that, PowerQuery and PowerPivot are the way to go.
PowerPivot and Power View Add-Ins for Excel 2013
https://www.dummies.com/software/microsoft-office/excel/the-powerpivot-and-power-view-add-ins-for-excel-2013/
In Excel 2013, the PowerPivot add-in, introduced in Excel 2010, that enables you to efficiently work with and analyze large datasets (such as those with hundreds of thousands or even millions of records) has been made a much more integral part the program. You can easily manage massive amounts of data from many related data tables is now part and parcel of Excel 2013 in the form of its Data Model feature.
Instead of having to download the add-in from the Microsoft Office website, you can start using PowerPivot simply by activating the add-in

Use Power Query to Create Dynamic Lookup Tables from a large Data Table
https://powerpivotpro.com/2014/02/two-quick-updates-university-re-launches-monday-and-power-query-is-amazing-amazing/
The data I have is one big fat CSV exported from the LMS software. Web “data sources” are just like that. They say, “here you go, here’s your big wide CSV file. Now we can wash our hands of the whole analysis and reporting business and dump it on you.”
Making Lookup tables manually… sucks. Who wants to go through that process over and over and over? Not me.
Power Query to the rescue!

Perspectives in PowerPivot (“views”)
https://support.office.com/en-us/article/perspectives-in-power-pivot-43b4e178-3d89-486c-8acd-ebc63802fead
One of the advantages of using the Power Pivot add-in to refine a Data Model is the ability to add perspectives. Perspectives provide custom views that you define for a particular user group or business scenario, making it easier to navigate large data sets. You can include any combination of tables, columns, and measures (including KPIs) in a perspective, and you can create multiple perspectives for various reporting constituencies in your organization. Perspectives can be used as a data source for other PivotTables and reports, including Power View reports. When you connect to a workbook that includes perspectives, you can choose a particular perspective on the Select Tables and Views page of the Data Connection Wizard.



Here are some free downloads about working with large data:
9. Excel 2016 Large Data Final Report Advanced
https://www.computer-pdf.com/office/excel/589-tutorial-excel-2016-large-data-final-report.html
Download free tutorial Microsoft Office Excel 2016 Large Data Final Report, PDF book by Pandora Rose Cowart University Of Florida.

10. Excel 2016 Large Data vLookups Advanced
https://www.computer-pdf.com/office/excel/588-tutorial-excel-2016-large-data-vlookups.html
Download free tutorial Microsoft Office Excel 2016 Large Data vLookups, PDF book by Pandora Rose Cowart University Of Florida.

11. Excel 2016 Large Data PivotTables Advanced
https://www.computer-pdf.com/office/excel/587-tutorial-excel-2016-large-data-pivottables.html
Download free tutorial Microsoft Office Excel 2016 Large Data PivotTables, PDF book by Pandora Rose Cowart University Of Florida.

12. Excel 2016 Large Data Sorting and Filtering Intermediate
https://www.computer-pdf.com/office/excel/586-tutorial-excel-2016-large-data-sorting-and-filtering.html
Download free tutorial Microsoft Office Excel 2016 Large Data Sorting and Filtering, PDF book by Pandora Rose Cowart University Of Florida.



Cond- Toggle Excel Conditional Formatting On and Off
http://www.myonlinetraininghub.com/toggle-excel-conditional-formatting-on-and-off
One of the nice features of Excel Tables is the banded row formatting, which makes it easier to read and scan your data. Unfortunately Excel Tables aren’t efficient with large data sets (over 100k rows), but we can replicate the banded rows with Conditional Formatting, and we can toggle it on and off at the click of a button like this (graphic on web page is active).


2017-11-28 10:55:06

Pete Zicari

My 2012 Macbook Pro bogs down to uselessness at about 100,000 rows, but that's because I must often use lookup formulas.
The one feature that has helped at all is to turn off automatic processing as I work with the table and then select a block of those lookup formulas for updating rather than do the whole column.
I use index(... match( for the lookups. Where I have had lookups in more than one column, I use use match() alone to get a row number and then index() in both the data columns I want. It saves ... a little ... time.
Eventually there is no option but to update the entire page, but I don't have to do it often.


2017-11-27 13:47:01

Brian

Two things I've noticed when manipulating large datasets in Excel 2010:
1. Using Find/Replace and Copy/Paste can take a long time if I'm not careful to sort the column where the operation is taking place first.
2. The screen freezes. The fix to this is to Unfreeze the worksheet (View Ribbon / Freeze Panes) and then to Freeze Panes again.


2017-11-27 07:19:53

Tom Van Dam

Before upgrading to a 64 bit version of Excel, make sure that he is using a 64 bit version of windows. In order to determine this, click on the start button on the task bar and then right click on the Computer text and choose Properties. A windows will come up and the bit version will be under the System section. This will display the process type and speed, how much ram and the System Type (32 or 64 bit).


2017-11-26 04:23:54

John Younie

Good article and good tips from Graham too.

Updating to Office 365 would give Norman access to the latest version of Excel, I download the 64bit version although the 32bit is the default. I know PowerPivot is not available in all versions of Excel but Norman really needs PowerPivot if he is going to continue using Excel for these monster datasets so the extra spend is worthwhile.

Here is a Microsoft Office blog that discusses PowerPivot and PowerQuery.

https://blogs.office.com/en-us/2015/09/18/new-ways-to-get-the-excel-business-analytics-features-you-need/?eu=true

And of course PowerPivot and PowerQuery (and a couple other Power Tools) underpin PowerBI.


2017-11-25 06:52:17

Graham

All of the suggestions above are good. However a short-term answer to ease the situation may be to stop automatic calculations during data entry, filtering, running macros, etc.
The simple macros below would help to turn calculations on and off :-

Sub CalcManual()
' Switches to Manual calculation of formulae
Application.Calculation = xlManual
ActiveWorkbook.PrecisionAsDisplayed = False
End Sub

Sub CalcAutomatic()
' Switches to Automatic calculation of formulae
Application.Calculation = xlAutomatic
ActiveWorkbook.PrecisionAsDisplayed = False
End Sub

Also any macros that are used could have the Application.Calculation lines inserted at the start and end.

Sub ExampleMacro()
Application.Calculation = xlManual

Macro commands

Application.Calculation = xlAutomatic
ActiveWorkbook.PrecisionAsDisplayed = False
End Sub


This Site

Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.