Excel questions at the Financial Modelling Summit 2022

Slugify text

We want to slugify text, ie include only characters a-z, 0-9 and “-“, so that it can become part of an URL. So that for example

“Why I say ‘Bazinga!” – my #1 Reason for Using =INDEX(MATCH())!!”

Becomes

“why-i-say-bazinga-my-1-reason-for-using-index-match”

Bill Jelen’s MrExcel site has a LAMBDA function which achieves the effect through recursion:

=LAMBDA(reference, ndx,
  IF(ndx > LEN(reference),
    SUBSTITUTE(reference, “–“, “-“),
    SLUGIFY(
      LET(
        character, LOWER(MID(reference, ndx, 1)),
        charcode, CODE(character),
        LEFT(reference, ndx – 1) & IF(OR(AND(charcode > 96, charcode < 123), AND(charcode > 47, charcode < 58)), character, “-“) & RIGHT(reference, LEN(reference) – ndx)
      ),
      ndx + 1
    )
  )
)
https://www.mrexcel.com/board/threads/slugify.1159292/

I didn’t like the idea of using recursion where simple looping through the string would do, so I wrote this Slugify(text) user-defined function using a LAMBDA. It converts the text to lowercase and then to an array, iterates through all the characters in the array passing them to another LAMBDA function AlphaNum(char) to return a space if the character is not a-z or 0-9, and finally concatenates the array back to a string.

Slugify =LAMBDA(text, SUBSTITUTE( TRIM(  CONCAT( AlphaNum( MID(LOWER(text),SEQUENCE(LEN(text)),1))))," ","-")) 

AlphaNum =LAMBDA(char,IF((char>="a")*(char<="z")+(char>="0")*(char<="9"),char," "))

We cannot define AlphaNum as

=LAMBDA(char,IF( OR( AND(char>=”a”, char<=”z”), AND(char>=”0″, char<=”9″)),char,” “))

Because AND() and OR() return single values and cannot be used in an array function. So I use Boolean * for AND and + for OR.

Bill Jelen showed a neat trick to reduce multiple occurrences of a character: change all occurrences of that character to a space, then use TRIM to reduce all spaces to one, then substitute back. Here I use spaces to begin with, then substitute them with a dash at the end.

Of course, the only use for LAMBDA is to create a nice encapsulated user-defined function. If you want a non-LAMBDA formula on a direct cell reference eg A3, just use a simple LET:

=SUBSTITUTE(TRIM(CONCAT( LET(ch,MID(LOWER(A3),SEQUENCE(LEN(A3)),1), 
IF((ch>="0")*(ch<="9")+(ch>="a")*(ch<="z"),ch," "))
))," ","-")

IFS() functions only work with ranges, not arrays

A participant asked:

Why does this work: =LET(A,A1:A10,B,A+1,COUNTIFS(A,”<5″)); it returns a count.

But this does not: =LET(A,A1:A10,B,A+1,COUNTIFS(B,”<5″)); it returns #VALUE!.

The reason is that all the IFS functions only accept a range as their first parameter, not an array. So for example =COUNTIFS({1,2,3,4,5,6},”<5”) gives a syntax error, an invalid formula.

The solution is to use FILTER() to select the values you want, then use COUNT or AVERAGE or whatever you want on that.

=LET(A,A1:A10,B,A+1,AVERAGE(FILTER(B,B<5)))

Patrick O’Beirne, 7 October 2022

Posted in Uncategorized | Tagged , | Leave a comment

Excel for Mac, VBA OnKey macros, and MacOS Monterey

You know that feeling when a manufacturer pulls the rug from under you? When a feature (OK, a workaround, even a hack) you’ve been relying on for years suddenly stops working because of an “upgrade”? Welcome to the world of the VBA developer in an Apple environment.

Excel VBA macros can be launched by keystroke shortcuts such as Ctrl+q, Ctrl+Shift+q, Ctrl+Alt+q. For example, Application.OnKey “^%q”, “MyMacro” is launched by Ctrl+Alt+q.

That works fine in Windows. But Excel for Apple Mac VBA does not detect the Command or Option (same as Alt) keys. There used to be a workaround, posted by Ron de Bruin on macexcel.com which used the Cocoa library in Python to return the modifier keys pressed. My version is

 ScriptToRun = "do shell script ""/usr/bin/python -c 'import Cocoa; print Cocoa.NSEvent.modifierFlags()' "" "
 
 flags = MacScript( ScriptToRun  ) 

'https://eastmanreference.com/complete-list-of-applescript-key-codes
'Key      bit value
'CapsLock 16 65536
'Shift    17 131072
'Ctrl     18 262144
'Option/Alt 19 524288
'Command   20 1048576

BUT in the latest MacOS upgrade, Monterey, Python is no longer built in. Apple has officially deprecated Python 2.7 in macOS Monterey 12.3. “The company is advising its developers to use an alternative programming language instead, such as Python 3, which, however, does not come preinstalled on macOS.”

I posted a question “Is there a newer/better way to detect what modifier keys are being pressed?” at https://macscripter.net/viewtopic.php?pid=209786 and got a very helpful reply from Nigel Garvey.

Here’s the MacScript version. Multiple lines are separated by both return & linefeed characters.

scpt = "use framework ""AppKit""" & vbCrLf
scpt = scpt & "return current application's class ""NSEvent""'s modifierFlags()"
Debug.Print MacScript(scpt)

However, in the future we may have to use the more awkward AppleScript function. The MacScript function has been deprecated, therefore it is no longer supported (although at the time of writing it is still present in Office for Mac VBA). For more information, see this Stack Overflow article.

The script needs to be put in a .scpt file in a hidden folder in the user’s home folder: /Users/username/Library/Application Scripts/com.microsoft.Excel. The file looks like the following; from testing I found that of the “use” statements, only the AppKit line is essential.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AppKit"
use scripting additions

on modifierKeysPressed()
   return current application's class "NSEvent"'s modifierFlags()
end modifierKeysPressed

It is invoked by the AppleScriptTask function, eg AppleScriptTask("modifierKeysPressed.scpt", "modifierKeysPressed", ""). This is unfortunately quite slow, from 0.7 seconds to several seconds, I found when testing.

Posted in Uncategorized | Leave a comment

WordXLe: Wordle in Excel

If you play Wordle daily, or the French version LeMot, you might want to practice more often. For fun, I created an Excel version that you can download, WordXLe, and a Wordle Solver. It has sheets for both English and French versions. It requires an Excel 365 subscription because it uses the dynamic array functions. If you don’t have Excel 365, you can still use the Google sheets version of the Wordle Helper. Be sure to Save a Copy so you can edit the copy.

The workbooks were last updated 27 April, 2022. Be sure you have the latest version! UPDATE 9 Feb 2022: BYROW function and LAMBDA make the formulas more manageable.

UPDATE 3 May 2022: Wordle Helper 4.0 released. (Downloads: http website, insecure or Dropbox download, secure.) This requires the latest Excel 365 version because it uses the BYROW and LAMBDA functions. It uses the same grid layout as Wordle so should be more familiar. You set the colour of a cell by clicking its spinner. The colour scheme is configurable. Let me know if you find it better!

Watch the video of my talk on this topic to the London Excel Meetup.

The dictionaries are:
English main (for validation) c.12000 words from the SOWPODS dictionary; for play c.1100 words.
Version française: principal c.8000 mots; https://github.com/hbenbel/French-Dictionary/blob/master/dictionary/dictionary.txt

Le jeu 1700 mots. https://www.freelang.com/dictionnaire/dic-francais.php
I removed all accents to simplify the game.

It uses Conditional Formatting for colouring, Data Validation to enforce some letter entry rules, no VBA macros, just formulas. The sheets are protected, but it’s easy to unhide things in Excel if you really want to so I’ll leave that as a challenge. 

CAUTION: Always ensure that this is the first workbook you open after starting Excel, and that you close without saving and exit Excel when finished the game. This workbook uses iteration to randomly select a hidden word. If any other workbook is open before this is opened, that will probably prevent this from working. If Excel gives a warning about circular references when you complete the first word, click File > Options > Formulas and check “Enable iterative calculation”. When finished, close this workbook without saving it, and close Excel, to avoid the iterative calculation setting of this workbook affecting any subsequent ones you open.

How to play: Enter one letter per cell, one valid 5-letter word per row. This version recognises about 12000 words but only 1100 are in the selection for playing. The cells change colour, following the Wordle scheme for colour-blind vision: Orange: correct letter in the correct place. Blue: a letter in the word but not in the correct place.  Grey: incorrect letter. Do not change a word once entered. Work down the rows until you get them all correct. If you want some help, click the “Show Hints” checkbox to see a list of words (max 100) that fit your letters so far. If you cannot do it on the sixth row,  the answer will appear in the right hand column.

Wordle Helper

By popular demand I give a Wordle helper workbook for Excel 365 with sheets for both English and French. Right-click the link and download it. Or, use my Google Sheets Wordle Helper; save a copy so you can edit your own copy.

On the language sheet you want, enter the letters you know so far and it will show you the words that fit that selection. Handy for when you have only one guess left! Here is how it works:

Think of this as a text searching problem that could be used in other scenarios. You might have part codes or account codes where there may have been transcription errors and you want to find out what the entry should have been.

Excel Wordle Helper

We start with a list of known words. Wordle has a list of 12970 words. I use a list of12478.
In the game, we know three things about the letters in each of your guess words:
1) Those in the correct place in the unknown word; they are on green tiles.
2) Those present in the unknown word but not in the correct place; they are on yellow.
3) Those not in the unknown word; they are on grey.

So the Helper has to match the words in the complete word list with the letters know to be in or not in the word.
In Excel 365, we have the FILTER() function which dynamically filters a list. It is not in Excel 2019, 2016 or older.
https://support.microsoft.com/en-us/office/filter-function-f4f7cb66-82eb-4767-8f7c-4877ad80c759
=FILTER(range, criteria, ifempty)
The range is our list of words, for example in W1:W12478.
The criteria expression tests each word and returns a True to include it in the filtered list, False to exclude it.
So we could show all words beginning with “A” like this:
=FILTER(W1:W12478, LEFT(W1:W12478,1)="A","")
The final “” parameter is for the function to return an empty string rather than #CALC!

From now on, I’ll use the range name “en” to refer to our English dictionary list.

Let’s set up five cells and place the known correct letters in each of the five cells.
Suppose we have five cells A2:E2 as shown in the picture above.
Our test for the exact correct first letters then is
=FILTER(en,LEFT(en,1)=A2,””)


Now to test two letters. We want the filter to include only words where both letters are in the right place.
You might think we could use AND( LEFT(A1,1)=A2 , MID(en,2,1)= B2)
But unfortunately AND does not work with arrays. =AND({TRUE,TRUE},{TRUE,FALSE}) returns only one result, FALSE, not the expected {TRUE, FALSE}
The way to combine them with AND is to multiply the two tests. Multiplication returns a numeric result where 1=True, 0=False.
={TRUE,TRUE}*{TRUE,FALSE} returns {1,0}
So our formula now is (I’ve changed the LEFT to MID for consistency)
=FILTER(en, (MID(en,1,1)=A2) * (MID(en,2,1)=B2) ,””)


However, if either A2 or B2 is blank, that will fail, as a letter does not match a blank.
So we extend the test by allowing any cell to be empty.

The SEARCH function does a case-insensitive search of one string inside another.
=SEARCH(“a”,”start”) returns 3
If you search for an empty string, it always returns 1.
=SEARCH(“”,”start”) returns 1
If you search for a string NOT in the searched string, it returns an error value.
=SEARCH(“x”,”start”) returns #VALUE!
Therefore to get a simple True/False, we use ISNUMBER() around the search so any numeric result of the search is True and any error is False. We multiply all the search results together as they must all be true for this first set. We only need one ISNUMBER because search with an empty cell always returns 1. To make such a long formula easier to read, press Alt+Enter after each subexpression so it finally reads

=FILTER(en,ISNUMBER(    
SEARCH(A2,MID(en,1,1))  
* SEARCH(B2,MID(en,2,1))  
* SEARCH(C2,MID(en,3,1))  
* SEARCH(D2,MID(en,4,1))  
* SEARCH(E2,MID(en,5,1))  ),"") 

It would be nice to make a subarray search of that, but you need the BYROW or BYCOL functions only released 9 Feb 2022. And here it is!

=FILTER(en,BYROW( 
 ISNUMBER(SEARCH(A2:E2,MID(en,SEQUENCE(1,5),1))),
 LAMBDA(array,PRODUCT(--array))))

Secondly we want to filter words containing letters that we know are in the target word, although not necessarily in the position of our guess.  We put all these letters into one cell, G2. There cannot be more than five of these letters. The following formula returns all the words containing the first letter of G2.

 =FILTER(en,ISNUMBER(SEARCH(MID(G2,1,1),en))) 

We want to apply this filter to the output of the previous filter. Rather than make the first formula even longer, we’ll create a new formula and its input range is the output of the first filter. If the first filter is in I3, then its spilled dynamic range is referred to as I3#.

=FILTER(I3#,ISNUMBER(SEARCH(MID(G2,1,1),I3#))) 

Now to extend the formula to cover up to five letters in G2. We put this in J3:

=FILTER(I3#,ISNUMBER(
 SEARCH(MID(G2,1,1),I3#)
*SEARCH(MID(G2,2,1),I3#)
*SEARCH(MID(G2,3,1),I3#)
*SEARCH(MID(G2,4,1),I3#)
*SEARCH(MID(G2,5,1),I3#)),"")

With BYROW() based on J3# above :
=FILTER(I3#,BYROW(ISNUMBER( 
 SEARCH(MID(G2, SEQUENCE(1,5),1),I3#)),
 LAMBDA(array,PRODUCT(--array))))

Finally we consider the letters which are NOT in the hidden word. There could be at most 21 of these, but let’s take a sensible limit at 15. We can put them all into one cell, say H2.
We want to exclude any word containing any of these letters. In other words, we include any word where the search for the letter returns an error. we use ISERROR instead of ISNUMBER.


A complication is that ISERROR(SEARCH(MID(H2,1,1),J3#)) will return False if H2 is empty, because SEARCH will return 1.
So we ensure the test string is padded to at least fifteen characters as follows:
In K2, put =H2&REPT(“@”,15) and now use K2.
Put this in L2, using the filter result from J3.

=FILTER(J3#,
 ISERROR(SEARCH(MID(K2,1,1),J3#))
*ISERROR(SEARCH(MID(K2,2,1),J3#))
... *ISERROR(SEARCH(MID(K2,15,1),J3#)),"")

Version with BYROW() :
=FILTER(J3#, BYROW(
 ISERROR(SEARCH(MID(K2, SEQUENCE(1,15),1),J3#)),
 LAMBDA(array,PRODUCT(--array))
))

Now we have the complete filter result.

See the workbook “WordleHelper.xlsx“. The intermediate columns I:K are hidden for tidiness but the sheets are unprotected. The BYROW version is not uploaded yet, until BYROW is generally available to all Excel 365 users. You have the formulas above.

Here is Joe McDaid’s tweet on the release of 9-Feb-2022:


Today we rolled out lambda function support to Excel users on the current channel. If you use dynamic arrays, I’d recommend starting with BYROW and BYCOL which can be used to aggregate rows/columns and spill the results.
Announcing LAMBDAs to Production and Advanced Formula Environment, A Microsoft Garage Project

https://techcommunity.microsoft.com/t5/excel-blog/announcing-lambdas-to-production-and-advanced-formula/ba-p/3073293


Let me know how you like it!

Posted in Excel/VBA | Tagged , | Leave a comment

XLTest 1.74 spreadsheet audit addin

I have released an update to my XLTest Excel add-in for spreadsheet auditing, visualisation, and testing.

Changes up to version 1.74 (September 2021)

  1. Start Session dialog changed to show internal workbook information before opening, and to open in manual calculation to prevent recalculation.
  2. Start can remove sheet protection from unprotected OpenXML files to simplify processing.
  3. Reports legacy BIFF properties in xls files to sheet $FileDoc.
  4. Handles Unicode file and folder names.
  5. Warns on unusual range name reference starting with “=!”.
  6. New #Error values (#SPILL! etc) are reported in the detailed inspection.
  7. Right-click shortcut to filter where-used listings for Names, Styles, Links.
  8. Performance improvements.
  9. VBA performance timing at procedure and optionally line level.
Detailed documentation reveals all the non-obvious content of the spreadsheet. It can reveal hidden rows, columns, and sheets. It helps you get to grips with a large spreadsheet that you have to understand.
Colour maps give you an easily readable visualization of the structure and content of a spreadsheet. Inconsistent formulas and data stand out for attention.
Detailed error checking makes it easy to find and fix errors far more quickly than with tedious cell-by-cell inspection.  
Test case maintenance and documentation make it easy to prove regression testing
Utilities provide more ways to test, such as profiling VBA performance.
Worksheet and workbook comparisons make it easy to see what has changed between two versions of a spreadsheet file
Convenient keyboard and menu shortcuts make navigation and operations easier
Quickly scan folders of Excel files to create an inventory of spreadsheets and their statistics to assist in risk assessment

For a complete manual, pricing, and evaluation version, go to xltest.com

Posted in Uncategorized | Tagged , , , , | Leave a comment

Get the real used range of a worksheet

Update 1-Dec-2021: Timing tests show that a binary search method is the fastest.

VBA programmers use many different ways to find the bottom right hand corner of a sheet’s data, the range which encompasses all the data cells but not empty formatted cells or cells which were previously used but are now empty.
Paul Kelly’s good introductory video describes well the problems with four common methods:
.CurrentRegion, .UsedRange, .SpecialCells(xlCellTypeLastCells), and .End(xlUp)
The method most experienced people advocate is to use the .Find method for any data (“*”), starting at A1, and searching backwards, so that the search begins at the bottom of the sheet.
However, that will return only the first cell if the data found is in a merged range.
And it will only search within visible rows, and ignore rows filtered out, which may be the last rows of the sheet.

Another way, posted by Tim Jeffryes on the Excel-L mail list, is to use a formula to find the maximum row and column that contain data. That works well for reasonably sized sheets, but if the used range exceeds 45 million cells (that may depend on your version of Excel), which may happen if people have formatted entire rows or columns, the formula returns #VALUE!.

I provide a sample workbook to test all these scenarios, with a GetLastCell() function. It tries the formula method first, and if that returns an error value, it uses a binary search method of the usedrange, to find the cell at the intersection of the last used row and column. In practice, you may as well use BSearchLastCell() all the time, it’s faster than the formula method.

For a sample data sheet with a data range A1:K1001, the .Find method takes 13ms, the formula methods 30ms, and the binary search only 1.7ms. For a test sheet formatted to XFD1048576 and a data range to K1001, Find takes 103ms, the formulas fail, and the binary search only 3.8ms.

I also provide a sub ShapesLastCell() that reports the bottom right shape. If you want to delete apparently excessive rows or columns, you may want to not delete shapes, or move them to a better position.

The main sub testAll() runs all tests on all sheets. It shows :
1) The Range.Find and MAX(SUBTOTAL(… methods omit the rows hidden by filtering.
2) The ISBLANK and MMULT formulas search hidden rows but are limited to a usedrange of c.45M cells.
3) Shapes can be placed outside the used range.
4) The best method is to use the BSearchLastCell function.
5) Any data cell found needs to be adjusted for any merged range that it is part of.

Let me know what you think – is this useful for you?

Posted in Excel/VBA | Leave a comment

Power Query infer data types with Table.Profile

The Power Query UI automatically detects column data types and performs the transformations for you. Typically these steps look like this:

let
    Source = Csv.Document(File.Contents("online_retail_II.csv"),[Delimiter=",", Columns=8, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    online_retail = Table.PromoteHeaders(Source, [PromoteAllScalars=true])
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Invoice", type text}, ...etc...}})
in
    #"Changed Type"

However, when I am calling Power Query M code from VBA ( see my previous posts) to open any given text file I usually do not know at that time what the column names or data types are. So after PromoteHeaders I have to infer the column data types. There is no built-in PQ function to do that. The following code for a DetectListValuesType() function adapted from Imke Feldmann the BIccountant and Steve de Peijper will infer the data type from a list of values, with a given margin of error allowed. If I subsequently transform the column, any values which are not valid for the majority type, such as text mixed in a column of largely numeric data, would be converted to Error values.

DetectListValuesType =
// Return the majority data type in a list with the given proportion of errors allowed.
(values as list, marginforerror as number) as text =>
    let
        maxerrors = List.Count(values) * marginforerror, // 0 <= marginforerror < 1

        distinctvalues   = List.Distinct(values),

        NullValuesOrEmptyStringsOnly =
            List.NonNullCount(distinctvalues) = 0
            or
            (List.NonNullCount(distinctvalues) = 1 and distinctvalues{0} = ""),

        CheckTypes = {"number", "date", "datetime", "text"},

        Env = Record.Combine({[L=values],
                            [NF = #shared[Number.From]],    [DF = #shared[Date.From]],   
                            [DTF = #shared[DateTime.From]], [TF = #shared[Text.From]],
                            [LT = #shared[List.Transform]], [LS = #shared[List.Select]], [LC  = #shared[List.Count]]
                            }),

        NumberOfErrors = List.Transform( {"NF", "DF", "DTF", "TF"},
                                            each Expression.Evaluate("LC(LS(LT(L, each try  " & _ & "(_) otherwise ""Error""), each _ = ""Error""))", Env)
                                        ),

        // Type conversion of null will return null, and not throw an error, so null values do not contribute to the conversion error count.
        // If all list values are equal to null, no errors will be detected for any type. type text will be selected in this case
        // If list values are simple values (not complex typed values like e.g. {"a","b"}), Text.From() will not throw an error,
        // and the number of errors for conversion to text will be 0; i.e. text conversion will meet the zero error criterion
        // If values contain dates only (none of the values is with time),
        // for both date and datetime conversion the number of errors will be 0 and date will be selected (date has a lower index) as conversion type.
        // If at least 1 of the date values include a valid time part, date conversion will throw an error, and datetime conversion will not throw an error;
        // hence datetime conversion will be selected.

        CheckErrorCriterion = List.Transform(NumberOfErrors, each _ <= maxerrors),

        typeindex = List.PositionOf(CheckErrorCriterion, true),  // First type which meets the error criterion;
                                                                 // if none of the types meets the error criterion, typeindex = -1

        typestring = if NullValuesOrEmptyStringsOnly or typeindex = -1 then "text" else CheckTypes{typeindex}
    in typestring

To create a schema table with the column names and data types of all the columns in a source table, I can use the Table.Profile function with the undocumented parameter optional additionalAggregates as nullable list. The details are given in Lars Schreiber’s post on additionalAggregates. In brief, it is a list of lists, like this: {{}, {}}. Each of these inner lists represents a new calculated profile column and consists of three items:
Output column name as text, typecheck function as function, aggregation function as function

My ProfileAggregates() function code then is as follows:

// return a profiling table with the inferred data type of each column
//Column	Min	Max	Average	StandardDeviation	Count	NullCount	DistinctCount	ColumnType
(SourceTable as table, numberofrecords as number, marginforerror as number) => 
    let
        numberofrecords = if numberofrecords<=0 then Table.RowCount(SourceTable) else numberofrecords,

        tblProfile = Table.Profile( SourceTable, {{"ColumnType", each true, 
            each DetectListValuesType( List.FirstN( _, numberofrecords ), marginforerror) }})

    in
        tblProfile

This function can then be called like this:

ProfileAggregates(online_retail, 10000, .01)

I tried it on the “Online Retail II” data set which has 1,067,371 rows. That’s too many for Excel, but a good test. Excel takes 25 minutes to run the complete analysis which indicates that some performance tuning should be investigated. Here is some of the output; I reordered the columns to put ColumnType after the column name.

Output of Table.Profile with additional ColumnType column

If you know how to create the table of column names, types, and other data, directly rather than using Table.Profile, let me know at the PowerBI Community forum, or in the comments here.

Posted in Uncategorized | Tagged , , | Leave a comment

Excel Virtually Global 2021

I am one of the 48 speakers at a 48-hour Excel extravaganza online on Oct 12 and 13. It’s FREE but you are encouraged to make a donation to charity to reflect the value to you of the content you benefit from. Visit ExcelVirtuallyGlobal.com.

The agenda:

https://excelvirtuallyglobal.com/program/

“Microsoft’s Most Valuable Professionals, or MVPs, are technology experts who passionately share their knowledge with the community.  They are always on the “bleeding edge” and have an unstoppable urge to get their hands on new, exciting technologies. This virtual conference, now in its fifth year, presents Excel, Data Platform and PowerPoint MVPs, together with other acknowledged experts from Microsoft and / or around the globe to present, answer questions and demonstrate the future of Excel and how it will make your life easier, personally and professionally.
Topics include: Auditing Spreadsheets, Charts, Dashboards, Data Analysis, Data Types, Dynamic Arrays, Excel Tricks & Tips, Financial Modelling, Forecasting, Maps, Microsoft 365, New Functions and Features, PivotTables, Power BI, Power Pivot, Power Query, Presenting Your Data, Spilled Formulae, Testing, Timelines and developing solutions with DNA .Net, VBA, Python, R, OfficeScript and Office-JS.  And note well – it’s not all in English, with sessions in Mandarin, Portuguese, Spanish and Telugu, as well.

Most sessions will be recorded so you may watch them later with downloads aplenty – there are no medals for staying up to watch the entire event live!

From your own favourite chair, bring a laptop, an inquisitive mind and your sense of humour.  Be prepared to learn heaps.  And remember, it’s for charity – all profits will be donated to a global COVID-19 research charity.”

My topic is how to avoid data loss and corruption when importing text files into Excel. It is for the developer who needs to automate ways of importing text files from anywhere in the world with Unicode folder and file names, and content in any character set and encoding. And to correctly import data with any field delimiter and embedded linefeeds in quoted fields. It is based on other articles in this blog, collected together into one demo workbook.

Video timings:

00:00 Intro and the Windows file system object;
04:30 Reading file content as text;
Querying structured data – CSV and its many variants; in four ways:
11:00 User interface vs VBA open methods
21:05 Querytables
27:22 ADO Recordset
32:35 Power Query
42:00 End

Promo 1-minute video

While waiting for the organisers to correct the recording, here is a link to the full 42 minute video.

Better and safer ways to read text files in VBA



Posted in Uncategorized | Tagged , , , , , , | Leave a comment

Event Tracing is three times faster than OutputDebugString

I use the OutputDebugString API in Windows to send debugging messages without the line limitations of the VBA Immediate Window. This is also useful when tracing the causes of crashes which kill the Excel process.


I use TraceSpy to capture the debug output. It is faster than DbgView which tends to bog down when there are many traces.

Event Tracing for Windows (ETW) is faster than OutputDebugString. Calling ETW from VBA takes 26 microseconds, three times the speed of OutputDebugString 77us. And WpfTraceSpy correctly displays Unicode strings like Cyrillic.

Simon Mourier has provided 32 and 64 bit VBA modules for WpfTraceSpy on GitHub.
I have contributed a module ETWModule3264.bas suitable for Excel add-ins that need to run without modification in either 32 or 64 bit version of Office.
This has been tested in 32-bit Excel 365 and 64-bit Excel 365 on Windows 10.
I work only in Excel, but it should apply to other Office products too.

https://github.com/smourier/TraceSpy/tree/master/vba

I noticed that the speed of the ETW output depends on the folder the xlsm is in. It can take as little as 1 microsecond for an ETW call, or as much as 26 as shown above.

I realised that the fast performing xlsm files are in Excel Trusted Locations. I wonder is Windows Defender sniffing at the untrusted API calls & slowing them down? In Trusted Locations, OutputDebugString takes 23 us, ETW takes 1 us. It is so fast because all EventWriteString does is push the event on to the Trace event queue and return, so there is very little overhead.

This could be an idea for API performance enhancement, if this applies to other Win APIs called from VBA.

So… make that “Event Tracing is twenty times faster than OutputDebugString” !

Posted in Uncategorized | Tagged , , , | Leave a comment

VBA Performance Profiler : timing procedure execution

This is an update and replacement for the post on 9th March 2017 titled ‘Free add in to profile VBA speed and coverage’. Updated 8-Jul-2021 with an example profile report.

Download XLTEST_VBA_Profiler.xlam.

(Updated 22-Jul-2021: Tracing P_.O also sends to OutputDebugString )

I use it to find out which procedures are taking the most time, so that I can target performance improvements to the subs which need it.

INSTRUCTIONS

Open the file XLTEST_Vba_Profiler.xlam and enable macros, or install it as an add-in.

Open the workbook you want to profile. Save it under a different name to avoid overwriting the original.

On the Developer tab, in the VBA Profiler group, click the button “Profiling”.

This gives a form with two checkboxes and a Help button (which shows the same instructions as on this page). The options are:

  1. Add a call to every line of code to trace its execution for coverage analysis.
    A tab-delimited text file is output of all the code after trace lines have been added
    so that the sub VBA_Profile_Reports can show what lines of code were executed.
    The text file is named from your workbook name suffixed with _VBA_COVERAGE.TXT
  2. Export all the VBA to a text file before making the changes.
    This is just for your record, it is not used in this process.
    The tab-delimited text file is named from your workbook name suffixed with _VBA_BEFORE.TXT

When you click OK, it adds a line to every procedure in the project, unless its declaration line contains a comment ‘@notrace
The inserted line looks like this, where 12345 is the line number in the project:


Dim P_ As Object: Set P_ = VBA_Profiler.New_Profiler(): P_.Start 12345, "modulename", "procname"

It only adds one line of code to each of the procedures in the code, and that line calls subs in the addin to do the timing and calculations. That adds a few milliseconds of overhead to each procedure call. For very small, trivial, and frequently called procedures, consider adding a comment ‘@notrace to its Sub or Function declaration to suppress the insertion of a profiling line of code.

It adds a reference to VBA_Profiler to the VBProject so that those inserted lines can call the functions to do the timing calculations.

Finally, insert a procedure in any standard module like this:

Sub RunProfiler
   VBA_Profile_Initialize [delimiter]

   Call your_top_procedure_name

   VBA_Profile_Reports
End Sub

If the optional delimiter is specified after VBA_Profile_Initialize , such as vbTab or “,”
every P_.Start sends the values passed to the debug output, separated by that delimiter.
You can watch and capture that output using DebugView from SysInternals or TraceSpy from Simon Mourier. I use TraceSpy because it is faster than DbgView and did not miss any output in fast tests. To be sure that TraceSpy can keep up with the output, do not use Auto-Scroll.

VBA_Profile_Reports outputs a tab-delimited file named from the workbook name suffixed by _VBA_PROFILE.TXT;
and if you do the coverage analysis, a workbook is created with one sheet containing the contents of the _VBA_COVERAGE.TXT file with an extra column added to show the number of times a line of code was executed.
The workbook is unsaved, you can save it as a .xlsx file if you wish to keep it.

Finally, go into the VBE and click Compile to check that no syntax errors have been introduced

When a procedure exits and the P_ variable goes out of scope, the profiler counts up the number of calls and the length of time the procedure took. The resolution of the timer is about a microsecond.

The visible methods in this add-in are:

Function New_Profiler() returns an object that collects timing data for the reports.

Sub VBA_Profile_Initialize() Call to initialise the profiler data collection

Sub VBA_Profile_Reports(Optional sProfileTSV As String , Optional sCoverageTSV As String )
Produces reports. The default file names are
For the Profile output: workbook name plus “_VBA_PROFILE.TXT”
For the Coverage input: workbook name plus “_VBA_COVERAGE.TXT”

The report columns are:
Module.Proc: Module name and procedure name
Count: Number of times the procedure was called
Inclusive time: Execution time of this procedure and all those it calls
Exclusive time: Execution time within this procedure only
Average time: The average run time of the procedure, the exclusive time divided by the count of calls
Index: A sequence number

If you lose the original workbook, you can remove the profiling code by doing a Search with Wildcards and Replace All with nothing for these Find What strings:
Dim P_ *
P_O *&:

Feedback is welcome!

Limitations: It does not handle Unicode workbook and module names, so use the Windows-1252 Latin character set only.

Assurance: It only works locally, and does not ‘phone home’ with any telemetry. It is supplied free and with no guarantee, in the hope that it provides some utility.

EXAMPLE

Consider a module with three procedures. TopProc calls SubA once. SubA calls SubB four times. TopProc takes 1 second with its own code, SubA takes 3 seconds, and SubB takes 2 seconds. This is what the profile report will look like:

Module.ProcCountInclusive timeExclusive timeAverage timeIndex
TopProc112.00000001.00000001.00000001
SubA111.00000003.00000003.00000002
SubB48.00000008.00000002.00000003
Example profile report

From this you see that:

TopProc is called once. The total execution time including its called subs is 12 seconds. The code in TopProc alone takes one second.

SubA is called once. The code in SubA takes three seconds. The total execution time including the eight seconds for its called subs is 11 seconds.

SubB is called four times. The total execution time is 8 seconds, so the average time per execution is 2 seconds.

Sort the table by Exclusive time descending to see which subs takes the most time overall. Sort by Average time to see which are slowest per execution.

To learn more about profiling, see these Microsoft articles:

https://docs.microsoft.com/en-us/visualstudio/profiling/instrumentation-method-data-views?view=vs-2017

https://docs.microsoft.com/en-us/visualstudio/profiling/caller-callee-view-instrumentation-data

Caller/Callee view – instrumentation data

Posted in Uncategorized | Tagged , , | 3 Comments

Querying Text/CSV files in VBA, method #3: Power Query

The module PowerQueryText.bas contains the code to query delimited files and return tables.

The advantages of PowerQuery are:

  1. It can handle comma and tab separated values and does not need a schema.ini as for ADO. You do however need to specify the separator character, see below for the details.
  2. Tab-delimited files do not have to have the extension “.tab” as they do for ADO, but can be named with a “.tsv” extension if that’s what you have.
  3. It can handle line breaks within quotes, or not, depending on the option you set. I always include line breaks within quotes.
  4. Its default encoding is 65001 which can handle all text encoding I have tested it with.  The UTF-16 (UCS-2) encoding is 1200.

In the ADO module, I determined the field separator by reading the file contents as bytes, and use a function CountColumnsFromBytes to infer the number of columns and their separator. In Powerquery, there is a function Binary.InferContentType that will offer statistics on possible delimiters, and your code can choose the most likely one.

At the moment, I have to send the output to a worksheet. If anyone finds a way to return the table data as a memory structure in VBA such as a recordset, let me know.

The functions contained in PowerQueryText.bas are:

testPQcsv    – a simple test stub that prompts for a file name and queries it. It calls the following functions.

InferContentType – needed to infer the encoding and field delimiter (separator) character.

TableProfile – returns statistics about a file to help determine the number of rows and columns.

PQMCsvFormula – a generic function to return the M expression for a delimited file.

PQDelimitedText – runs a query and outputs the result.

WBQueryToSheet – sends the data from a query to a sheet table range.

WBQueryToListTable – sends the data from a query to an Excel ListTable.

That is the end of this series on reading delimited text files in Excel VBA. I hope you find it useful, and let me know of any bug fixes needed, or suggest any improvements.

Posted in Excel/VBA, Power Query | Tagged , , , | 3 Comments