Counting Atoms in a Chemical Formula

Written by Allen Wyatt (last updated January 1, 2024)
This tip applies to Excel 2007, 2010, 2013, 2016, 2019, and Excel in Microsoft 365



Ruby is trying to find an easy way to determine the number of atoms in molecular formulas of some chemical structures. For instance, a cell might contain a formula such as C12H10N6F2. In this case the number of atoms is 12 + 10 + 6 + 2 = 30. Ruby has about 300 of these formulas to do and was wondering if there is an Excel formula that can be used to do this.

First, the bad news: There is no easy way to do this.

There; with that out of the way, we can start to look for solutions. Ruby's example may lead some to think that counting atoms is a simple matter of replacing the alphabetic characters with something else so that just the numbers can be evaluated. Here, again, is Ruby's chemical formula:

C12H10N6F2

If you replace the alphabetic characters with plus signs, you get this:

+12+10+6+2

Looks like a simple formula now, right? This is deceptive: while the trick works in this instance, it may not work at all for Ruby's other chemical formulas. Consider the following chemical formula that many people will be familiar with:

H2O

Doing the same substitution renders this:

+2+

The problem is that an element written without a number after it has an implied count of 1, as with the oxygen here. Thus, H2O actually contains 3 atoms, not 2.
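The substitution trick, and the way it undercounts H2O, can be reproduced in a few lines. This is a Python sketch rather than Excel code; the function name naive_atom_count is hypothetical, and it adds up the pieces between the plus signs instead of evaluating the string as a formula.

```python
import re


def naive_atom_count(formula: str) -> int:
    """Replace every letter with '+' and add up the numbers that remain.

    This reproduces the substitution trick only: an element written
    without a number contributes nothing, so implied 1s are lost.
    """
    expression = re.sub(r"[A-Za-z]", "+", formula)  # C12H10N6F2 -> +12+10+6+2
    # Sum the non-empty pieces between plus signs (no eval() needed)
    return sum(int(piece) for piece in expression.split("+") if piece)


print(re.sub(r"[A-Za-z]", "+", "C12H10N6F2"))  # +12+10+6+2
print(naive_atom_count("C12H10N6F2"))          # 30
print(naive_atom_count("H2O"))                 # 2, although H2O has 3 atoms
```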

So now we can come up with a way to account for the implied 1, right? Sure; this can be done, most easily and cleanly with a macro such as the following user-defined function:

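Function CountAtoms(ChemForm As String) As Long
    ' Counts the atoms in a simple formula such as C12H10N6F2 or NaCl.
    ' Assumes each element symbol is an uppercase letter, optionally
    ' followed by one lowercase letter, and that the module uses the
    ' default Option Compare Binary (so lowercase sorts above "Z").
    Dim sNewNum As String      ' digits collected for the current element
    Dim sTemp As String        ' character being examined
    Dim lNewAtoms As Long      ' atom count for the element just finished
    Dim lTotalAtoms As Long    ' running total for the whole formula
    Dim J As Long

    ' An empty cell contains no atoms
    If Len(ChemForm) = 0 Then
        CountAtoms = 0
        Exit Function
    End If

    sNewNum = ""
    lTotalAtoms = 0

    ' Character 1 starts the first element symbol, so begin at 2
    For J = 2 To Len(ChemForm)
        sTemp = Mid(ChemForm, J, 1)
        If sTemp >= "0" And sTemp <= "9" Then
            ' Digit: part of the count for the current element
            sNewNum = sNewNum & sTemp
        ElseIf sTemp <= "Z" Then
            ' Uppercase letter (or other character sorting at or below "Z"):
            ' add the previous element's count, using 1 if no digits followed
            lNewAtoms = Val(sNewNum)
            If lNewAtoms = 0 Then lNewAtoms = 1
            lTotalAtoms = lTotalAtoms + lNewAtoms
            sNewNum = ""
        End If
        ' Lowercase letters (second letter of a symbol) are skipped
    Next J

    ' Add the count for the last element in the formula
    lNewAtoms = Val(sNewNum)
    If lNewAtoms = 0 Then lNewAtoms = 1
    lTotalAtoms = lTotalAtoms + lNewAtoms

    CountAtoms = lTotalAtoms
End Function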

To use this function in your worksheet, simply pass it the cell that contains the chemical formula:

=CountAtoms(A1)

If the chemical formula is in cell A1, this returns the count you want. It even works with formulas such as the following:

NaCl
SbF6

Note that these contain two-letter element symbols, of which there are many. The function does require, however, that the second letter of a two-letter symbol be lowercase.
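If you want to check a few results outside Excel before filling a column with CountAtoms, the same counting rule can be sketched in Python. This version swaps the macro's character-by-character loop for a regular expression; the name count_atoms_simple is hypothetical, and like the macro it assumes every symbol is one uppercase letter optionally followed by one lowercase letter.

```python
import re

# One element symbol (uppercase letter plus optional lowercase letter)
# followed by an optional count.
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms_simple(formula: str) -> int:
    """Count atoms in a flat formula such as C12H10N6F2, NaCl, or SbF6.

    A symbol with no number after it counts as 1 atom. Leading
    coefficients, parentheses, and charges are not handled.
    """
    return sum(int(count) if count else 1 for _, count in ELEMENT.findall(formula))


for f in ["C12H10N6F2", "H2O", "NaCl", "SbF6"]:
    print(f, count_atoms_simple(f))
# C12H10N6F2 30
# H2O 3
# NaCl 2
# SbF6 7
```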

So, will this approach work with all chemical formulas? Not really; it only works with the simple ones we've covered so far. You see, chemical formulas can get quite complex. Consider the following example:

2H2O

When a number appears at the start like this, the counts for the rest of the formula are multiplied by that value. Thus, instead of the normal 3 atoms in H2O, this formula has 6 atoms.
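Handling a leading coefficient on its own is a small extension: peel off any digits at the start and multiply the rest of the count by them. The following Python sketch (the function name count_atoms_with_coefficient is hypothetical) still ignores parentheses.

```python
import re

# One element symbol followed by an optional count
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms_with_coefficient(formula: str) -> int:
    """Count atoms in a flat formula that may start with a coefficient.

    A leading number (the 2 in 2H2O) multiplies the count of everything
    after it. Parentheses are not handled.
    """
    match = re.match(r"(\d*)(.*)", formula)      # always matches
    coefficient = int(match.group(1)) if match.group(1) else 1
    per_unit = sum(int(n) if n else 1 for _, n in ELEMENT.findall(match.group(2)))
    return coefficient * per_unit


print(count_atoms_with_coefficient("H2O"))   # 3
print(count_atoms_with_coefficient("2H2O"))  # 6
```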

It gets worse. Consider the following valid chemical formulas:

Ca3(PO4)2
Al2(SO4)3(H2O)18

Note the parentheses followed by a number. In this nomenclature, the value immediately following a closing parenthesis indicates how many copies of the group within the parentheses appear in the formula. Thus, in the second example there are 3 SO4 groups and 18 H2O molecules in the overall formula. This obviously affects the total atom count. To compound the complexity, parentheses can even be nested:

CH3(C3H4(NH2)2)18CH3

Fun, huh?
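Working the three examples by hand shows how each multiplier applies to everything inside its parentheses, including nested groups:

```latex
\begin{aligned}
\text{Ca}_3(\text{PO}_4)_2 &: 3 + 2(1 + 4) = 13\\
\text{Al}_2(\text{SO}_4)_3(\text{H}_2\text{O})_{18} &: 2 + 3(1 + 4) + 18(2 + 1) = 2 + 15 + 54 = 71\\
\text{CH}_3(\text{C}_3\text{H}_4(\text{NH}_2)_2)_{18}\text{CH}_3 &: (1 + 3) + 18\bigl[3 + 4 + 2(1 + 2)\bigr] + (1 + 3) = 4 + 234 + 4 = 242
\end{aligned}
```

A correct counter therefore has to multiply whole groups, not just the number that follows each element.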

This can still be addressed with a more complex macro. Rather than reinvent the wheel here, though, if you are working with complex chemical formulas such as these, you might want to consider using the macros provided at this site:

http://www.vbaexpress.com/kb/getarticle.php?kb_id=670

Note that these macros aren't implemented as user-defined functions. To use them, select the cells containing the formulas and run a macro; it then places its results in the columns to the right of the selected chemical formulas. Full instructions are included with the code at the website above.

You'll also need to enable regular expressions in the Visual Basic Editor. To do so, choose Tools | References, scroll through the available references to locate Microsoft VBScript Regular Expressions 5.5, make sure the check box to its left is selected, and then click OK.
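If you would like to see how such a "more complex macro" can work, the following Python sketch handles leading coefficients and nested parentheses with a small recursive-descent parser that multiplies each parenthesized group by the number after it. It is not a port of the linked macros; the function name count_atoms, the token pattern, and the treatment of a hydrate dot (·) as a separator are assumptions made for this sketch.

```python
import re

# Tokens: element symbols, counts, parentheses, and a hydrate dot (assumed).
TOKEN = re.compile(r"[A-Z][a-z]?|\d+|[()]|[·.]")


def count_atoms(formula: str) -> int:
    """Count atoms in formulas such as Ca3(PO4)2, 2H2O, or
    CH3(C3H4(NH2)2)18CH3, including nested parentheses.

    Charges, isotopes, and position numbers are not handled.
    """
    tokens = TOKEN.findall(formula)
    if "".join(tokens) != formula:
        raise ValueError(f"unrecognized characters in {formula!r}")
    pos = 0

    def read_number(default: int) -> int:
        # Consume a count if one comes next, otherwise use the default.
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            pos += 1
            return int(tokens[pos - 1])
        return default

    def parse_group() -> int:
        # Sum atoms until a ')' or a hydrate dot ends the current group.
        nonlocal pos
        total = 0
        while pos < len(tokens) and tokens[pos] not in (")", "·", "."):
            token = tokens[pos]
            pos += 1
            if token == "(":
                inner = parse_group()
                if pos >= len(tokens) or tokens[pos] != ")":
                    raise ValueError(f"missing ')' in {formula!r}")
                pos += 1                          # skip ')'
                total += inner * read_number(1)   # multiply the whole group
            elif token.isdigit():
                raise ValueError(f"unexpected number in {formula!r}")
            else:                                 # element symbol
                total += read_number(1)
        return total

    total = 0
    while True:
        coefficient = read_number(1)              # e.g. the 2 in 2H2O
        total += coefficient * parse_group()
        if pos >= len(tokens):
            return total
        if tokens[pos] == ")":
            raise ValueError(f"unmatched ')' in {formula!r}")
        pos += 1                                  # skip the hydrate dot


print(count_atoms("Ca3(PO4)2"))             # 13
print(count_atoms("Al2(SO4)3(H2O)18"))      # 71
print(count_atoms("CH3(C3H4(NH2)2)18CH3"))  # 242
print(count_atoms("2H2O"))                  # 6
```

Anything the token pattern does not recognize, such as a space or a charge sign, raises ValueError instead of silently producing a wrong count.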


ExcelTips is your source for cost-effective Microsoft Excel training. This tip (13707) applies to Microsoft Excel 2007, 2010, 2013, 2016, 2019, and Excel in Microsoft 365.

Author Bio

Allen Wyatt

With more than 50 non-fiction books and numerous magazine articles to his credit, Allen Wyatt is an internationally recognized author. He is president of Sharon Parq Associates, a computer and publishing services company. ...


Comments


2024-01-03 01:52:26

Tomek

Tangentially to this issue I have a tip for calculating molar mass of molecules.
I have created an Excel template that has defined names for essentially all common chemical elements, but instead of assigning ranges to those names I assigned explicit values of atomic weights, with 2 or 3 decimals. This way they do not clutter any worksheet. Alternatively, you could use a hidden worksheet for that.

I could assign range names using element symbols for all elements except for C (why C is not accepted as a valid range name is a mystery to me). For this I used "C_".
Now I can easily calculate molecular weights of compounds without having to memorize or look up atomic masses, e.g.,
For CuSO4·5H2O I write =Cu+S+O*4+5*(H*2+O)
For C6H6O12 -----> =C_*6+H*6+O*12
etc.
The range names are not case sensitive so when typing you do not have to pay attention to it; I used proper capitalization for the range names though, so once entered Excel will convert to proper element symbols in the formulas.
I could find that template (I am retired now, so I do not use it often) and send it to anyone interested.


2024-01-03 01:16:31

Tomek the Mad Scientist

Chemical formulas were first invented at the beginning of the 19th century, and although they changed a bit from the original proposed by Berzelius, once they were agreed upon they remained mostly unchanged. What this implies is that they are meant for human interpretation and are not easily entered into present versions of Office programs. Chemistry, a descendant of alchemy, was always arcane knowledge, so possibly there was no desire to make it more understandable to the common person, and formulas stayed somewhat cryptic. In particular, the formulas use subscripts to indicate the count of a preceding atom or group, brackets to gather elements into groups, and superscripted numbers with superscripted +/- to indicate ion charge. Unscripted numbers before a molecule or its part indicate that whatever group/molecule follows is to be taken as a multiple.

Although you can format a chemical formula in an Excel cell to display both subscripts and superscripts, it is very tedious and most people just don't bother. Such formatting could possibly allow for better parsing by some macro, but creating such a program would still be a full-blown project. The link given by Allen has a function that converts all digits in the formula to subscripts; however, it works properly only on some formulas. The formula in the tip for aluminum sulphate octadecahydrate, Al2(SO4)3(H2O)18, should actually be written as Al2(SO4)3·18H2O (see Figure 1 below); that function converts all of its digits to subscripts, but the 18 should stay normal size. The macros that count atoms and provide simplified formulas also do not handle formulas similar to that last one.
I think it is high time for IUPAC or similar body to consider significant revision of the way chemical formulas are written, to make it compatible with computer programs, or do we want to wait for AI to make even more mess of this?

Figure 1. 


2024-01-03 01:15:04

Tomek the Mad Scientist

@Leslie Glasser:

Whether SO4 is a group or a molecule is irrelevant to the problem discussed here. One way or the other some formulas will not represent an actual molecule. There are some organometallic compounds that exist only as dimers, but still the formulas for them are rarely written to reflect this.


2019-11-30 17:41:42

Leslie Glasser

Please note: The statement "Thus, in the second example there are 3 molecules of SO4 and 18 molecules of H2O in the overall molecule." is chemically incorrect. SO4 is not a molecule, rather call it "a group of atoms". Similarly, the overall formula does not represent a molecule but rather an "empirical formula".


2019-11-30 14:54:12

Philip

Numerical items in a molecule formula sometimes don’t relate to the number of atoms but to the relative position of the bond in the molecule ... simply counting them (even taking into account all the considerations mentioned above) still won’t give a robust solution ...

