Working with Huge Datasets

by Allen Wyatt
(last updated November 25, 2017)

5

Norman's computer has a fast processor and 32 GB of RAM. For statistical purposes, he often needs to use million-line worksheets. He's still using Excel 2007, and it doesn't work well after it's run awhile with these huge workbooks. (In fact, it frequently crashes.) Other times, after startup, it works OK. Norman wonders what he can do to get more reliable operation when working with such large datasets. He wonders if Excel 365 work better, or perhaps more RAM.

Let's address the hardware issue first. Your system is probably sufficient for working with large data sets in Excel. However, if you are planning on updating your hardware, it is always a good idea to get more RAM. This allows Excel (and other programs) to do more data manipulation in memory rather than needing to swap the data between memory and the disk cache on the hard drive. Bottom line: Don't upgrade your system just for this one reason, but if you are upgrading for other reasons, then get a system with lots of RAM.

Now, on to Excel. The version of Excel you are using (Excel 2007) has a maximum memory allocation of 2 GB per instance, and do not appear to fully release all memory used be a workbook that was previously opened. This could be the cause of your "works at first, but not later" problem, and the only way to work around it is to periodically close and restart Excel.

In addition, Excel 2007 is only available in a 32-bit version. If you update to a later version of Excel, you'll be able to install a 64-bit version of the program. This is important because it allows you to address more memory, which means easier manipulation of larger data sets.

Once you upgrade your version of Excel, you'll want to follow these steps to make sure that Excel is utilizing the power of your CPU as well as it can:

  1. Display the Excel Options dialog box. (In Excel 2010 and later versions display the File tab of the ribbon and then click Options.)
  2. At the left side of the dialog box click Advanced.
  3. Scroll down until you see the Formulas section of the options. (See Figure 1.)
  4. Figure 1. The Formulas section of Excel's advanced options.

  5. Click the check box next to the Enable Multi-Threaded Calculation option.
  6. Make sure the radio button next to Use All Processors On This Computer is selected. (This option is only accessible after you complete step 4.)
  7. Click the OK button.

After taking these steps you may still run into problems with your data if the worksheets contain a lot of formulas. It makes sense that the more formulas you have—especially if those formulas involve volatile functions that must be recalculated constantly—the higher "load" you place on Excel. Anything you can do to reduce the number of formulas (and volatile functions), the more responsive you'll find Excel to be.

Now let's look at the advisability of using Excel itself. The underlying question is whether you really need to have such large worksheets in Excel in the first place. Depending on the nature of the data with which you are working, you may find it better to load your data into a database (such as Access, which is provided with many versions of Microsoft Office) and then use the database program to do your data summaries. You could, at that point, copy a much smaller subset of records into Excel to do your final analysis or formatting for presentation purposes.

Another approach once the data is in the database is to use the capabilities of Excel to access the data. For example, since you (hopefully) updated to a later version of Excel, you could use Power Query (called Get & Transform in Excel 2016) or Power Pivot to establish a connection to the database to process the data, which decreases the overall processing load on Excel.

ExcelTips is your source for cost-effective Microsoft Excel training. This tip (4809) applies to Microsoft Excel 2007, 2010, 2013, and 2016.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...

MORE FROM ALLEN

Moving Groups of Data Labels at One Time

Having problems working with data labels in your charts? That's not unusual; here's an idea of an add-in that could help ...

Discover More

Simple Relative References in External Links

Linked data can be essential in pulling information from one workbook to another. One downside of links, however, is that ...

Discover More

Template Changing On Its Own

When you attach a template to a document, you expect that template to stay attached. When you share the document with ...

Discover More

Create Custom Apps with VBA! Discover how to extend the capabilities of Office 2013 (Word, Excel, PowerPoint, Outlook, and Access) with VBA programming, using it for writing macros, automating Office applications, and creating custom applications. Check out Mastering VBA for Office 2013 today!

More ExcelTips (ribbon)

Displaying a Count of Zeros on the Status Bar

Excel allows you to display the results of several common worksheet functions on the status bar. The available functions ...

Discover More

Jumping to the Real Last Cell

Jumping to the last cell in a worksheet should be easy, but you may not always get the results that you expect. This tip ...

Discover More

Where Is that Name?

Want to easily see the location of named ranges in your worksheet? It's easy; all you need to do is use the familiar Zoom ...

Discover More
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

View most recent newsletter.

Comments

If you would like to add an image to your comment (not an avatar, but an image to help in making the point of your comment), include the characters [{fig}] in your comment text. You’ll be prompted to upload your image when you submit the comment. Maximum image size is 6Mpixels. Images larger than 600px wide or 1000px tall will be reduced. Up to three images may be included in a comment. All images are subject to review. Commenting privileges may be curtailed if inappropriate images are posted.

What is 9 + 9?

2017-11-28 10:55:06

Pete Zicari

My 2012 Macbook Pro bogs down to uselessness at about 100,000 rows, but that's because I must often use lookup formulas.
The one feature that has helped at all is to turn off automatic processing as I work with the table and then select a block of those lookup formulas for updating rather than do the whole column.
I use index(... match( for the lookups. Where I have had lookups in more than one column, I use use match() alone to get a row number and then index() in both the data columns I want. It saves ... a little ... time.
Eventually there is no option but to update the entire page, but I don't have to do it often.


2017-11-27 13:47:01

Brian

Two things I've noticed when manipulating large datasets in Excel 2010:
1. Using Find/Replace and Copy/Paste can take a long time if I'm not careful to sort the column where the operation is taking place first.
2. The screen freezes. The fix to this is to Unfreeze the worksheet (View Ribbon / Freeze Panes) and then to Freeze Panes again.


2017-11-27 07:19:53

Tom Van Dam

Before upgrading to a 64 bit version of Excel, make sure that he is using a 64 bit version of windows. In order to determine this, click on the start button on the task bar and then right click on the Computer text and choose Properties. A windows will come up and the bit version will be under the System section. This will display the process type and speed, how much ram and the System Type (32 or 64 bit).


2017-11-26 04:23:54

John Younie

Good article and good tips from Graham too.

Updating to Office 365 would give Norman access to the latest version of Excel, I download the 64bit version although the 32bit is the default. I know PowerPivot is not available in all versions of Excel but Norman really needs PowerPivot if he is going to continue using Excel for these monster datasets so the extra spend is worthwhile.

Here is a Microsoft Office blog that discusses PowerPivot and PowerQuery.

https://blogs.office.com/en-us/2015/09/18/new-ways-to-get-the-excel-business-analytics-features-you-need/?eu=true

And of course PowerPivot and PowerQuery (and a couple other Power Tools) underpin PowerBI.


2017-11-25 06:52:17

Graham

All of the suggestions above are good. However a short-term answer to ease the situation may be to stop automatic calculations during data entry, filtering, running macros, etc.
The simple macros below would help to turn calculations on and off :-

Sub CalcManual()
' Switches to Manual calculation of formulae
Application.Calculation = xlManual
ActiveWorkbook.PrecisionAsDisplayed = False
End Sub

Sub CalcAutomatic()
' Switches to Automatic calculation of formulae
Application.Calculation = xlAutomatic
ActiveWorkbook.PrecisionAsDisplayed = False
End Sub

Also any macros that are used could have the Application.Calculation lines inserted at the start and end.

Sub ExampleMacro()
Application.Calculation = xlManual

Macro commands

Application.Calculation = xlAutomatic
ActiveWorkbook.PrecisionAsDisplayed = False
End Sub


This Site

Got a version of Excel that uses the ribbon interface (Excel 2007 or later)? This site is for you! If you use an earlier version of Excel, visit our ExcelTips site focusing on the menu interface.

Newest Tips
Subscribe

FREE SERVICE: Get tips like this every week in ExcelTips, a free productivity newsletter. Enter your address and click "Subscribe."

(Your e-mail address is not shared with anyone, ever.)

View the most recent newsletter.