A first look at the FUSE spreadsheet corpus

Following on from the first paper by Titus Barik et al., and some work by Mark Townsend analysing the last row and column used in each file, I downloaded the 249,376 files (7GB) and did some summary analysis of them and of their VBA.

The top domain is .org (29.5%), followed by .gov (27.7%).
That’s because almost half the files are from one web site – triathlon.org. They look like files that were filled in for reporting purposes, and so contain no formulas.
Files that are simple web report downloads or are automatically generated (eg quickfacts.census.gov) are not user-created spreadsheets at all, and so are of no interest to me.
5,600 files have “SpreadsheetGear” as the write-access user, all from worldbank.org.

So most of the FUSE spreadsheets are of no interest to me for formula error research.
There are no .xlsm files, although a simple Google search finds 106,000.

Of the 5037 web hosts, http://www.triathlon.org accounts for 106328 files, or 43% of the total.
http://www.triathlon.org 106328
quickfacts.census.gov 47025
theahl.com 10350
The top 3 hosts account for 66% of the files; the top 50 (1% of the hosts) have 87% of the files.
So it’s pretty skewed towards a few domains.
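The concentration figures above are simple arithmetic over a host-to-file-count tally. As a minimal Python sketch, assuming the counts have already been extracted into a dictionary, this checks the top-1 and top-3 shares using only the three hosts listed (the full tally of 5,037 hosts would be needed for the top-50 figure):

```python
from collections import Counter

TOTAL_FILES = 249_376  # size of the FUSE download described above

# Only the three largest hosts quoted in the text; the full tally has 5,037 hosts.
host_counts = Counter({
    "http://www.triathlon.org": 106_328,
    "quickfacts.census.gov": 47_025,
    "theahl.com": 10_350,
})


def top_n_share(counts: Counter, n: int, total: int) -> float:
    """Fraction of all files held by the n hosts with the most files."""
    return sum(count for _, count in counts.most_common(n)) / total


if __name__ == "__main__":
    for n in (1, 3):
        print(f"top {n}: {top_n_share(host_counts, n, TOTAL_FILES):.1%}")
```

This prints about 42.6% and 65.6%, which round to the 43% and 66% quoted.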

The Apache POI analysis cannot handle BIFF5 files, but they can be processed in Excel if you relax the File Block settings.
80% of the files have no formulas or very few, again because they are really data files.
12854 (5.15%) have formulas.

Only 737 had VBA code, and 472 of them had unique VBA content as determined by an MD5 hash.
They typically range from 10 to 2,000 lines of code.
102 contain “Macro recorded by…” and no Dim statements.
Only 78 of the 472 have Option Explicit.
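The MD5 de-duplication and the two simple checks above can be sketched in Python. This assumes the VBA source of each workbook has already been extracted to text (the extraction step is not shown), and the regular expressions are my own rough approximations, not necessarily the checks behind the figures above:

```python
from __future__ import annotations

import hashlib
import re

# Rough, case-insensitive patterns; VBA keywords are not case-sensitive.
OPTION_EXPLICIT = re.compile(r"^\s*Option\s+Explicit\b", re.IGNORECASE | re.MULTILINE)
DIM_STATEMENT = re.compile(r"^\s*Dim\s", re.IGNORECASE | re.MULTILINE)
# Assumed recorder comment format, eg "' Macro recorded 01/02/2003 by Fred"
RECORDER_MARK = re.compile(r"Macro recorded\b.*?\bby\b", re.IGNORECASE)


def md5_of_code(code: str) -> str:
    """MD5 hex digest of the VBA text: identical code gives an identical hash."""
    return hashlib.md5(code.encode("utf-8")).hexdigest()


def unique_by_md5(sources: dict[str, str]) -> dict[str, str]:
    """Keep the first workbook seen for each distinct MD5 of its VBA text."""
    first_seen: dict[str, str] = {}
    for workbook, code in sources.items():
        first_seen.setdefault(md5_of_code(code), workbook)
    return {wb: sources[wb] for wb in first_seen.values()}


def static_flags(code: str) -> dict[str, bool]:
    """Two crude indicators of coding practice."""
    recorded = bool(RECORDER_MARK.search(code))
    return {
        "option_explicit": bool(OPTION_EXPLICIT.search(code)),
        "recorded_no_dim": recorded and not DIM_STATEMENT.search(code),
    }


# Usage with hypothetical workbook names:
sources = {
    "a.xls": 'Sub A()\nRange("A1").Select\nEnd Sub',
    "copy_of_a.xls": 'Sub A()\nRange("A1").Select\nEnd Sub',
    "b.xls": "Option Explicit\nSub B()\n    Dim x As Long\nEnd Sub",
}
unique = unique_by_md5(sources)  # keeps a.xls and b.xls
```

An exact hash only catches byte-identical code; a single changed comment gives a different hash, so "unique" here means "not an exact copy".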

I have prepared a slide deck of the findings, available at:
http://www.sysmod.com/vbainfusecorpus-pobeirne.pdf (185K PDF)

Does this interest anyone?

Posted in Excel/VBA, Research | 1 Comment

SoftTest Ireland Conference 2015 – Future Proof your Software Testing

SoftTest Ireland in association with the ISA Software Skillnet are hosting a one-day national conference for software test professionals.

Date: Wednesday 23rd September 2015 9am – 5.30pm
Venue: Clyde Court Hotel, Ballsbridge, Dublin 4

Cost: €100.00 per delegate
Book here: https://softtest2015.eventbrite.co.uk/

Software testing is experiencing an industry disrupting revolution, driven by rapidly evolving technologies and software delivery paradigms.
Understand the critical technologies to be learned, key skills to be developed and the professional persona required by testers to further enhance their value into the future.

Explore the future of testing with expert speakers on topics such as:
The Internet of Things
Cloud Test Ops
Test Automation
Personal professional development

9 Speakers: Paul Gerrard, Chris Ambler, Colm Harrington, Declan O’Riordan, Claire Goss, Augusto Evangelisti, Stephen Janaway, Jonathan Wright, Matt Wynn

t: @softtestireland

SoftTest Ireland

Interest group for software testers in Ireland

Posted in Software Testing

VBA in the spreadsheets from the Enron email corpus

The European Spreadsheet Risk 2015 Conference papers are now available at

My presentation was on investigating the use of VBA in spreadsheets in the Enron email corpus.


Most of the slides deal with the mechanics of how I did it, and statistics on the 538 workbooks found with unique VBA content. The network graph, which I did out of interest, was #madewithgephi.

Workbook VBA code similarity
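The slides do not spell out here how similarity between workbooks was measured. As one possible approach, and only as an assumption, this Python sketch scores each pair by Jaccard similarity over normalised code lines, keeps pairs above a hypothetical threshold of 0.5, and writes a Source,Target,Weight edge list of the kind Gephi's spreadsheet import can, to my knowledge, read as an edges table:

```python
from __future__ import annotations

import csv
import io
from itertools import combinations


def normalised_lines(code: str) -> set[str]:
    """Distinct non-blank, non-comment lines, trimmed and lower-cased."""
    lines = set()
    for raw in code.splitlines():
        line = raw.strip().lower()
        if line and not line.startswith("'"):
            lines.add(line)
    return lines


def jaccard(a: set[str], b: set[str]) -> float:
    """|A intersect B| / |A union B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(sources: dict[str, str], threshold: float = 0.5):
    """Yield (source, target, weight) for each workbook pair at or above threshold."""
    line_sets = {wb: normalised_lines(code) for wb, code in sources.items()}
    for x, y in combinations(sorted(line_sets), 2):
        weight = jaccard(line_sets[x], line_sets[y])
        if weight >= threshold:
            yield x, y, round(weight, 3)


def edges_to_csv(edges) -> str:
    """Edge list text with a Source,Target,Weight header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buf.getvalue()
```

Comparing every pair is quadratic, which is manageable for a few hundred workbooks but would need indexing for much larger sets.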


Some conclusions I came to are:

1) The workbooks are probably not typical of the routine mass of everyday spreadsheets.
If people email spreadsheets to others, I infer that they don’t share a folder on the network, so these workbooks are for communication. They are therefore probably no insight into the real ‘dark matter’ of EUC that stays in shared folders and is never emailed – eg routine accounting workbooks.

2) Apart from simple static analysis, which gives a general indicator of code quality, it is very difficult to say whether the VBA contains errors.
The real test is in execution of the code, but we cannot reproduce the environment in which these workbooks were created. The files we have were probably circulated as reports to be read, that is, after the code had already run, so they do not contain the preconditions needed to run it. Static analysis with a tool such as TM-VBA inspector only goes so far.
Code inspection can raise questions of unsafe practices and assumptions, but is time consuming.

Posted in Excel/VBA, Research

XLTest 1.55 released

I have updated my spreadsheet auditing addin.
Changes in version 1.55
1.    On startup, it checks and fixes autonumber seed problems in database
2.    Processing limit of 1 million cells made an option
3.    ExportActiveProject procedure runnable by VBA automation
4.    VBA Sort Procedures option
5.    Option to list all sheet statistics separately when scanning files
6.    Demo / 30-day evaluation version made available
7.    Utility option to display current autofilter criteria
8.    Numerous usability tweaks


In order to make a time-limited trial available, I am now protecting it using the ‘Unviewable VBA’ app from Petros Chatzipantazis:


He also markets the Ribbon Commander addin, intended to make Ribbon customisation easier than writing the XML by hand.

Posted in Uncategorized

Excel VBA Copy Range, Paste as Table to Powerpoint 2010/2013

So, I’ve spent the last couple of weeks trying to get that one right.

Piecing together all the hints from Stack Overflow and elsewhere, I find that I have to use the ExecuteMso method, available in PowerPoint 2010 and later, and use DoEvents to give PowerPoint time to act, in order to avoid raising errors.

In this code, mPPAppObject is the PowerPoint Application object accessed from Excel VBA, eg Set mPPAppObject = GetObject(, "PowerPoint.Application"), and mPPSlide is the target slide.

Assuming we start with rng.Copy, then if the paste is to be a table:

For i = 1 To 500: DoEvents: Next ' 500 iterations, about 10ms
' Otherwise you get Clipboard Error -2147188160 Shapes (unknown member): Invalid request. Clipboard is empty or contains data which may not be pasted here.

mPPAppObject.CommandBars.ExecuteMso "PasteExcelTableSourceFormatting" ' PowerPoint 2010+

For i = 1 To 5000: DoEvents: Next ' 5,000 iterations, about 100ms
' Otherwise calling code testing the slide.Shapes collection does not see the pasted shape yet, or does not see that the shape has a table (oShape.HasTable)

Set oShape = mPPSlide.Shapes(mPPSlide.Shapes.Count)

If we want some other paste format, then it’s a bit easier:

Set oShape = mPPSlide.Shapes.PasteSpecial(PasteDataType)

' where PasteDataType can be:

' 0 ppPasteDefault           ' truncated to what is visible in the worksheet window
' 2 ppPasteEnhancedMetafile  ' pastes correctly at full width
' 3 ppPasteMetafilePicture   ' same as default, *truncated if too wide*

' 7 ppPasteText
' 8 ppPasteHTML              ' does not work; use mPPAppObject.ActiveWindow.View.Paste instead

In PP 2013 I sometimes found that more than one shape would be pasted, so:

' The .Count property does not exist for a correctly pasted single shape, so use an error-trapped function
If ObjectCount(oShape) > 1 Then ' handle PowerPoint 2013 bug
    'Debug.Print ">1 shape pasted: "; TypeName(oShape); oShape.Count ' ShapeRange 2
    Set oShape = oShape(1)
End If

Function ObjectCount(obj As Object) As Long
    If obj Is Nothing Then
        ObjectCount = 0
    Else
        ObjectCount = 1          ' default when obj has no Count property
        On Error Resume Next
        ObjectCount = obj.Count  ' overwrites the default only if Count exists
        'could do Err.Clear here if you like
    End If
End Function

What a kludge to get something done that should be easily accessible from the COM model.

Posted in Uncategorized | 18 Comments

ADODB Connection string for LocalDB and SQL Server Native Client

The limit on Access MDB file size is 2GB. To get around that I wanted to try out the lightweight LocalDB server, rather than SQL Server Express.
SqlLocalDB.msi is 38MB
sqlncli.msi is 3MB for win32, 5MB for win64

It took a while to get the right connection string, so to save others the wasted time, here’s what I found:

The MSDN article “SQL Server 2014 Express LocalDB” says, under “Connecting to the Automatic Instance”:

The easiest way to use LocalDB is to connect to the automatic instance owned by the current user by using the connection string “Server=(localdb)\MSSQLLocalDB;Integrated Security=true”. To connect to a specific database by using the file name, connect using a connection string similar to “Server=(LocalDB)\MSSQLLocalDB; Integrated Security=true ;AttachDbFileName=D:\Data\MyDB1.mdf”.
The first time a user on a computer tries to connect to LocalDB, the automatic instance must be both created and started. The extra time for the instance to be created can cause the connection attempt to fail with a timeout message. When this happens, wait a few seconds to let the creation process complete, and then connect again.

The string uses “Integrated Security=true;”, but as you can see below that failed for me.

You will see other results in google using
and you can check the version at a command prompt:
C:\> sqllocaldb v
Microsoft SQL Server 2014 (12.0.2000.8)

but MS have now moved to a version-independent instance name, MSSQLLocalDB.
This is not yet mentioned on

As I am connecting from Excel using ADODB, my connection string will also need a Provider, which is not mentioned in the MSDN article above.

I use the SQL Server Native Client:
you can download SQLNCLI11 directly from

I initially used “Provider=SQLNCLI;…” but got error 3706:
Provider cannot be found. It may not be properly installed.
The Windows script I posted earlier lists the providers already installed on your PC,
so I find I need to use the version number in this case:
If you want to check whether a specific ADO provider is installed, look in the registry at HKEY_CLASSES_ROOT\[Provider_Name], which gives you its CLSID.

Another potential source of confusion is the use of equivalent names in the connection string, eg Trusted_Connection=true is ODBC and Integrated Security=SSPI is OLEDB.

gives a handy table showing that for example, these are the same:

    | Keyword              | Synonym                 |
    |----------------------|-------------------------|
    | extended properties  | attachdbfilename        |
    | timeout              | connect timeout         |
    | server               | data source             |
    | database             | initial catalog         |
    | trusted_connection   | integrated security     |
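Because equivalent keywords can appear under either name, comparing connection strings by eye is error-prone. A minimal Python sketch that parses a `Key=Value;` string and maps each pair in the table onto its right-hand name; the simple split is an assumption and does not handle quoted values containing `;` or `=`:

```python
from __future__ import annotations

# Each equivalent pair from the table above, mapped onto the right-hand name.
SYNONYMS = {
    "extended properties": "attachdbfilename",
    "timeout": "connect timeout",
    "server": "data source",
    "database": "initial catalog",
    "trusted_connection": "integrated security",
}


def parse_connection_string(conn: str) -> dict[str, str]:
    """Split 'Key=Value;...' into {canonical lower-case key: value}.

    Simplified: does not handle quoted values that contain ';' or '='.
    """
    result: dict[str, str] = {}
    for part in conn.split(";"):
        if not part.strip():
            continue  # skip the empty piece after a trailing ';'
        key, _, value = part.partition("=")
        key = key.strip().lower()
        result[SYNONYMS.get(key, key)] = value.strip()
    return result


a = parse_connection_string(
    "Provider=SQLNCLI11;Server=(localdb)\\MSSQLLocalDB;Integrated Security=SSPI;timeout=30;"
)
b = parse_connection_string(
    "Provider=SQLNCLI11;Data Source=(localdb)\\MSSQLLocalDB;Trusted_Connection=yes;"
)
```

Only the keywords are normalised. The values SSPI, true and yes are not interchangeable, which is where the “Integrated Security=true” string in the error messages went wrong.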

Error messages

Here I used the wrong version, 11 instead of 12:
“Provider=SQLNCLI11;Server=(localdb)\v11.0;Integrated Security=SSPI;”
-2147467259 SQL Server Network Interfaces: Cannot create an automatic instance.

This is correct; if you get a timeout error, simply retry:
“Provider=SQLNCLI11;Server=(localdb)\v12.0;Integrated Security=SSPI;”
-2147467259 Unable to complete login process due to delay in opening server connection
I got a timeout error on my PC after 20 seconds, so I now add “timeout=30;” to the string.
I imagine that time might improve with Windows prefetch.
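The MS advice to wait a few seconds and connect again, together with the longer timeout, amounts to a retry loop. A generic Python sketch: `connect` is a hypothetical zero-argument callable standing in for whatever actually opens the connection (not shown), and the attempt count and delay are arbitrary assumptions:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(); on a retryable failure wait delay_seconds and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:
            # Give up on non-retryable errors, or when this was the last attempt.
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(delay_seconds)
    raise AssertionError("unreachable")
```

Passing an `is_retryable` test that only accepts timeout errors stops a wrong provider name or a bad keyword from being retried pointlessly.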

This is the version-independent string:
“Provider=SQLNCLI11;Server=(localdb)\MSSQLLocalDB;Integrated Security=SSPI;”

The MS article uses “Integrated Security=true;”, but that failed for me:
“Provider=SQLNCLI11;Server=(localdb)\MSSQLLocalDB;Integrated Security=true;”
-2147217887 Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.

confirms that it should be “Trusted_Connection=yes;”

This is what I finally used:
I am posting this here to provide a google hit for that string, in the hope this helps someone.


Posted in Excel/VBA

VBScript to list installed OLEDB Providers

I found that the script I got by googling for that title failed with this error:
Error: ActiveX component can’t create object: ‘RegEdit.Server’

So here’s my own VBScript :

'List of installed OLEDB providers on local computer
Option Explicit
Const HKEY_CLASSES_ROOT = &H80000000

Dim OutText, Key, strComputer, objRegistry, arrKeys
Dim strKeyPath, strValueName, strValue, uValue

strComputer = "."

Set objRegistry = GetObject("winmgmts:\\" & strComputer & "\root\default:StdRegProv")

objRegistry.enumKey HKEY_CLASSES_ROOT, "CLSID", arrKeys
Wscript.Echo "Number of Keys to be searched: " & 1+ UBound(arrKeys) & vbCrLf & "Click OK and wait a minute" ' 7645

For Each Key In arrKeys
    strKeyPath = "CLSID\" & Key
    strValueName = "OLEDB_SERVICES"
    If objRegistry.GetDWordValue(HKEY_CLASSES_ROOT, strKeyPath, strValueName, uValue) = 0 Then ' that value name exists
        'get the (Default) value, which is the name of the provider
        objRegistry.GetStringValue HKEY_CLASSES_ROOT, strKeyPath, "", strValue
        OutText = OutText & strValue & vbCrLf
        ' and the expanded description
        objRegistry.GetStringValue HKEY_CLASSES_ROOT, strKeyPath & "\OLE DB Provider", "", strValue
        OutText = OutText & " " & strValue & vbCrLf
    End If
Next
Wscript.Echo OutText

' Sample output (Windows Script Host dialog):
' Microsoft OLE DB Provider for SQL Server
' Microsoft Office 12.0 Access Database Engine OLE DB Provider
' Microsoft Office 15.0 Access Database Engine OLE DB Provider
' SQL Server Native Client 10.0

Posted in Productivity | 2 Comments