Written by Allen Wyatt (last updated January 1, 2024)
This tip applies to Excel 2007, 2010, 2013, 2016, 2019, and Excel in Microsoft 365
Ruby is trying to find an easy way to determine the number of atoms in molecular formulas of some chemical structures. For instance, a cell might contain a formula such as C12H10N6F2. In this case the number of atoms is 12 + 10 + 6 + 2 = 30. Ruby has about 300 of these formulas to do and was wondering if there is an Excel formula that can be used to do this.
First, the bad news: There is no easy way to do this.
There; with that out of the way, we can start to look for solutions. The example chemical formula provided by Ruby may lead some to think that counting atoms is a simple process of substituting the alphabetic characters with something else so that just the numeric characters can be evaluated. As an example, here is Ruby's example chemical formula:
C12H10N6F2
If you replace the alphabetic characters with plus signs, you get this:
+12+10+6+2
Looks like a simple formula now, right? This is deceiving, because while it will work in this instance, it may not work at all for Ruby's other chemical formulas. Consider the following chemical formula that many people will be familiar with:
H2O
Doing the same substitution renders this:
+2+
The problem is that there is an implied count of 1 whenever an element symbol is not followed by a number, as with the oxygen here. Thus, H2O is actually 3 atoms.
So now we can come up with a way to simply account for the implied 1, right? Sure; this can be done. It can be done most easily and cleanly with a macro, such as the following user-defined function:
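Function CountAtoms(ChemForm As String)
    Dim sNewNum As String
    Dim sTemp As String
    Dim iNewAtoms As Integer
    Dim iTotalAtoms As Integer
    Dim J As Integer

    ' An empty cell contains no atoms
    If Len(ChemForm) = 0 Then
        CountAtoms = 0
        Exit Function
    End If

    sNewNum = ""
    iTotalAtoms = 0

    ' Start at the second character; the first is assumed to be the
    ' uppercase letter that opens the first element
    For J = 2 To Len(ChemForm)
        sTemp = Mid(ChemForm, J, 1)
        If sTemp >= "0" And sTemp <= "9" Then
            ' Collect the digits of the current element's count
            sNewNum = sNewNum & sTemp
        ElseIf sTemp <= "Z" Then
            ' Any other character up to "Z" is taken as the uppercase
            ' start of the next element, so close out the previous one
            iNewAtoms = Val(sNewNum)
            If iNewAtoms = 0 Then iNewAtoms = 1   ' implied count of 1
            iTotalAtoms = iTotalAtoms + iNewAtoms
            sNewNum = ""
        End If
        ' Lowercase letters fall through and are ignored
    Next J

    ' Close out the final element
    iNewAtoms = Val(sNewNum)
    If iNewAtoms = 0 Then iNewAtoms = 1
    iTotalAtoms = iTotalAtoms + iNewAtoms

    CountAtoms = iTotalAtoms
End Function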
In order to use this function in your worksheet, you would simply reference the chemical formula:
=CountAtoms(A1)
If the chemical formula is in cell A1, this function returns the count you desire. It will even work with formulas such as the following:
NaCl
SbF6
Note that these rely on two-character element symbols, of which there are many. The function does require, however, that the second character of a two-character element symbol be lowercase.
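Because the function trusts its input, it may be worth checking that a cell really holds a formula of this simple kind before counting. The following is only a sketch under stated assumptions: IsSimpleFormula is a hypothetical helper name, not part of the original tip, and it accepts only one- or two-letter symbols, each optionally followed by digits.

```vba
' Hypothetical helper (not part of the original tip): returns True when
' the text looks like a simple formula such as C12H10N6F2 or NaCl.
Function IsSimpleFormula(ChemForm As String) As Boolean
    Dim J As Long
    Dim iCode As Long
    Dim bPrevUpper As Boolean   ' previous character was an uppercase letter

    IsSimpleFormula = False
    If Len(ChemForm) = 0 Then Exit Function

    bPrevUpper = False
    For J = 1 To Len(ChemForm)
        iCode = AscW(Mid$(ChemForm, J, 1))
        Select Case iCode
            Case 65 To 90       ' A-Z starts a new element symbol
                bPrevUpper = True
            Case 97 To 122      ' a-z is valid only right after an uppercase letter
                If Not bPrevUpper Then Exit Function
                bPrevUpper = False
            Case 48 To 57       ' 0-9 must follow a symbol, so not in first position
                If J = 1 Then Exit Function
                bPrevUpper = False
            Case Else           ' parentheses, dots, spaces, and so on
                Exit Function
        End Select
    Next J

    IsSimpleFormula = True
End Function
```

In a worksheet this could be combined with the counting function, for example =IF(IsSimpleFormula(A1), CountAtoms(A1), "check"). Character codes are compared rather than strings so that the result does not depend on the module's Option Compare setting.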
So, will this approach work with all chemical formulas? Not really; it only works with the simple ones we've covered so far. You see, chemical formulas can get quite complex. Consider the following example:
2H2O
When an initial number appears like this, then the formula is to be multiplied by that value. Thus, instead of the normal 3 atoms in H2O, this formula would have 6 atoms.
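The leading-multiplier rule can be folded into the same character-by-character scan. What follows is a sketch only: CountAtomsCoef is a hypothetical name, not part of the original tip, and it still assumes simple formulas without parentheses.

```vba
' Hypothetical variant (not from the original tip): counts atoms in a
' simple formula that may start with a multiplier, such as 2H2O.
Function CountAtomsCoef(ChemForm As String) As Long
    Dim lPos As Long
    Dim lCoef As Long       ' leading multiplier, 1 when absent
    Dim lTotal As Long      ' atoms in one formula unit
    Dim lCount As Long      ' count attached to the element being closed
    Dim sNum As String      ' digits collected so far
    Dim sTemp As String
    Dim bOpen As Boolean    ' True while an element is waiting for its count

    lCoef = 1
    For lPos = 1 To Len(ChemForm)
        sTemp = Mid$(ChemForm, lPos, 1)
        Select Case AscW(sTemp)
            Case 48 To 57   ' digit: part of a multiplier or an element count
                sNum = sNum & sTemp
            Case 65 To 90   ' uppercase letter: a new element starts here
                If bOpen Then
                    ' Close the previous element; no digits means an implied 1
                    lCount = Val(sNum)
                    If lCount = 0 Then lCount = 1
                    lTotal = lTotal + lCount
                ElseIf Len(sNum) > 0 Then
                    ' Digits before the first element are the multiplier
                    lCoef = Val(sNum)
                End If
                sNum = ""
                bOpen = True
        End Select
        ' Lowercase letters belong to the current element and are skipped
    Next lPos

    ' Close out the final element, if there was one
    If bOpen Then
        lCount = Val(sNum)
        If lCount = 0 Then lCount = 1
        lTotal = lTotal + lCount
    End If

    CountAtomsCoef = lCoef * lTotal
End Function
```

With this sketch, =CountAtomsCoef("2H2O") evaluates to 6 and =CountAtomsCoef("H2O") to 3, matching the counts given above.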
It gets worse. Consider the following valid chemical formulas:
Ca3(PO4)2
Al2(SO4)3(H2O)18
Note the parentheses followed by a number. In this nomenclature, the value immediately following the closing parenthesis indicates how many of the molecules within the parentheses are in the larger molecule. Thus, in the second example there are 3 molecules of SO4 and 18 molecules of H2O in the overall molecule. This obviously affects the number of atoms in the entire formula. To compound complexity, parentheses can even be nested:
CH3(C3H4(NH2)2)18CH3
Fun, huh?
This can still be addressed with a more complex macro. Rather than reinvent the wheel here, though, if you are working with complex chemical formulas such as these, you might want to consider using the macros provided at this site:
http://www.vbaexpress.com/kb/getarticle.php?kb_id=670
Note that the macros aren't implemented as user-defined functions. To use them you simply select the cells with the formulas, run the macro, and then the macros modify information in the columns to the right of the selected chemical formulas. Full instructions are included with the code at the above website.
You'll also need to enable regular expressions in the Visual Basic Editor. Do this by choosing Tools | References and scrolling through the available references to locate the Microsoft VBScript Regular Expressions 5.5 option. Make sure the check box to the left of the reference is selected, then click OK.
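If a worksheet function is preferred, the parenthesis rules described earlier can also be approximated without regular expressions by keeping a stack of running totals. The following is a rough sketch and not the code from the site above: CountAtomsNested and ReadNumber are hypothetical names, only letters, digits, and parentheses are accepted, and anything else (including the "·" used in hydrate notation) returns a #VALUE! error.

```vba
' Hypothetical sketch (not the vbaexpress.com code): counts atoms in
' formulas with nested parentheses, such as Ca3(PO4)2.
Function CountAtomsNested(ChemForm As String) As Variant
    Dim lStack() As Long    ' totals saved at each open parenthesis
    Dim lDepth As Long      ' current nesting depth
    Dim lTotal As Long      ' atoms counted at the current nesting level
    Dim lCoef As Long       ' optional leading multiplier, as in 2H2O
    Dim lPos As Long
    Dim lLen As Long
    Dim iCode As Long

    lLen = Len(ChemForm)
    ReDim lStack(0 To lLen)             ' depth cannot exceed the text length
    lPos = 1
    lCoef = ReadNumber(ChemForm, lPos)  ' 1 when no leading digits are present

    Do While lPos <= lLen
        iCode = AscW(Mid$(ChemForm, lPos, 1))
        lPos = lPos + 1
        Select Case iCode
            Case 65 To 90   ' uppercase letter: start of an element symbol
                ' Skip the lowercase remainder of the symbol
                Do While lPos <= lLen
                    iCode = AscW(Mid$(ChemForm, lPos, 1))
                    If iCode < 97 Or iCode > 122 Then Exit Do
                    lPos = lPos + 1
                Loop
                lTotal = lTotal + ReadNumber(ChemForm, lPos)
            Case 40         ' "(": save the outer total and start a group
                lStack(lDepth) = lTotal
                lDepth = lDepth + 1
                lTotal = 0
            Case 41         ' ")": multiply the group and add it to the outer total
                If lDepth = 0 Then
                    CountAtomsNested = CVErr(xlErrValue)
                    Exit Function
                End If
                lDepth = lDepth - 1
                lTotal = lStack(lDepth) + lTotal * ReadNumber(ChemForm, lPos)
            Case Else       ' unsupported character
                CountAtomsNested = CVErr(xlErrValue)
                Exit Function
        End Select
    Loop

    If lDepth <> 0 Then
        CountAtomsNested = CVErr(xlErrValue)    ' unclosed parenthesis
    Else
        CountAtomsNested = lCoef * lTotal
    End If
End Function

' Reads the digits starting at lPos, advances lPos past them, and returns
' their value; returns 1 when there are no digits (the implied count).
Private Function ReadNumber(sText As String, ByRef lPos As Long) As Long
    Dim lStart As Long

    lStart = lPos
    Do While lPos <= Len(sText)
        If Not (Mid$(sText, lPos, 1) Like "#") Then Exit Do
        lPos = lPos + 1
    Loop
    If lPos > lStart Then
        ReadNumber = CLng(Mid$(sText, lStart, lPos - lStart))
    Else
        ReadNumber = 1
    End If
End Function
```

Under these assumptions, Ca3(PO4)2 counts as 13 atoms, Al2(SO4)3(H2O)18 as 71, and the nested example CH3(C3H4(NH2)2)18CH3 as 242. A stack is used instead of recursion so that each closing parenthesis only has to multiply the current group and add it back to the saved outer total.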
ExcelTips is your source for cost-effective Microsoft Excel training. This tip (13707) applies to Microsoft Excel 2007, 2010, 2013, 2016, 2019, and Excel in Microsoft 365.
2024-01-03 01:52:26
Tangentially to this issue I have a tip for calculating molar mass of molecules.
I have created an Excel template that has defined names for essentially all common chemical elements, but instead of assigning ranges to those names I assigned explicit values of atomic weights, with 2 or 3 decimals. This way they do not clutter any worksheet. Alternatively, you could use a hidden worksheet for that.
I could assign range names using element symbols for all elements except C (why C is not accepted as a valid range name is a mystery to me). For this I used "C_".
Now I can easily calculate molecular weights of compounds without having to memorize or look up atomic masses, e.g.,
For CuSO4·5H2O I write =Cu+S+O*4+5*(H*2+O)
For C6H6O12 -----> =C_*6+H*6+O*12
etc.
The range names are not case sensitive so when typing you do not have to pay attention to it; I used proper capitalization for the range names though, so once entered Excel will convert to proper element symbols in the formulas.
I could find that template (I am retired now, so I do not use it often) and send it to anyone interested.
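The named-constant setup described in this comment could also be scripted rather than typed in by hand. This is a sketch only, assuming the names live at workbook level: AddAtomicWeightNames is a hypothetical macro name, and just five elements are shown, with standard atomic weights rounded to three decimals.

```vba
' Hypothetical sketch of the setup the comment describes: workbook-level
' names that hold constant values instead of referring to ranges.
Sub AddAtomicWeightNames()
    Dim vNames As Variant
    Dim vWeights As Variant
    Dim J As Long

    ' "C_" stands in for carbon because Excel rejects "C" as a name
    vNames = Array("H", "C_", "O", "S", "Cu")
    vWeights = Array(1.008, 12.011, 15.999, 32.06, 63.546)

    For J = LBound(vNames) To UBound(vNames)
        ' Str always writes a period as the decimal separator, so the
        ' formula text does not depend on regional settings
        ThisWorkbook.Names.Add Name:=vNames(J), _
            RefersTo:="=" & Trim$(Str(vWeights(J)))
    Next J
End Sub
```

After running the macro, a formula such as =Cu+S+O*4+5*(H*2+O) can be entered in any cell of that workbook, as in the example above.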
2024-01-03 01:16:31
Tomek the Mad Scientist
Chemical formulas were first invented at the beginning of the 19th century, and although they changed a bit from the original proposed by Berzelius, once they were agreed upon they remained mostly unchanged. What this implies is that they are meant for human interpretation and are not easily entered into present versions of Office programs. Chemistry, a descendant of alchemy, was always an arcane knowledge, so possibly there was no desire to make it more understandable to a common person, and the formulas stayed somewhat cryptic. In particular, the formulas use subscripts to indicate the count of a preceding atom or group, brackets to group elements into groups, and superscripted numbers with a superscripted +/- to indicate ion charge. Unscripted numbers before a molecule or its part indicate that whatever group or molecule follows is to be taken as a multiple.
Although you can format a chemical formula in an Excel cell to display both subscripts and superscripts, it is very tedious and most people just don't bother. Such formatting could possibly allow for better parsing by some macro, but creating such a program would still be a full-blown project. The link given by Allen has a function that converts all digits in the formula to subscripts; however, it works properly only on some formulas. The formula in the tip for aluminum sulphate octadecahydrate, Al2(SO4)3(H2O)18, should actually be written as Al2(SO4)3·18H2O (see Figure 1 below). Written that way, it gets all digits converted to subscripts, but the 18 should stay normal size. The formulas that count atoms and provide simplified formulas also do not handle formulas similar to that last one.
I think it is high time for IUPAC or a similar body to consider a significant revision of the way chemical formulas are written, to make it compatible with computer programs. Or do we want to wait for AI to make even more of a mess of this?
Figure 1.
2024-01-03 01:15:04
Tomek the Mad Scientist
@Leslie Glasser:
Whether SO4 is a group or a molecule is irrelevant to the problem discussed here. One way or the other some formulas will not represent an actual molecule. There are some organometallic compounds that exist only as dimers, but still the formulas for them are rarely written to reflect this.
2019-11-30 17:41:42
Leslie Glasser
Please note: The statement "Thus, in the second example there are 3 molecules of SO4 and 18 molecules of H2O in the overall molecule." is chemically incorrect. SO4 is not a molecule, rather call it "a group of atoms". Similarly, the overall formula does not represent a molecule but rather an "empirical formula".
2019-11-30 14:54:12
Philip
Numerical items in a molecule formula sometimes don’t relate to the number of atoms but to the relative position of the bond in the molecule ... simply counting them (even taking into account all the considerations mentioned above) still won’t give a robust solution ...
Copyright © 2024 Sharon Parq Associates, Inc.