UK Covid-19 Track & Trace Excel snafu: Uncontrolled spreadsheets lead to data loss

The story broke around October 5th that Public Health England temporarily lost 16,000 Covid test results because nobody noticed that the XLS file format they were using could not hold the number of results they were receiving. The risk is that lives could be lost because Covid cases were not being traced and followed up. The BBC report is probably the best summary:

“[The Covid testing labs] filed their results in the form of text-based lists – known as CSV files – without issue. PHE had set up an automatic process to pull this data together into Excel templates so that it could then be uploaded to a central system.  The problem is that PHE’s own developers picked an old file format to do this – known as XLS. As a consequence, each template could handle only about 65,000 rows of data rather than the one million-plus rows that Excel is actually capable of. And since each test result created several rows of data, in practice it meant that each template was limited to about 1,400 cases. When that total was reached, further cases were simply left off.”

Most commentary piled on to the use of Excel at all as “inappropriate”. Wired has an interview with spreadsheet warriors on the Eusprig mail list about how they face daily support tasks like this. Maths comedian Matt Parker did a video on spreadsheet errors and on EuSpRIG.

I want to examine this story from the point of view of data and spreadsheet control and wonder about some unstated angles.

TL;DR: if they had applied basic data controls, they could have detected the problem as soon as it happened rather than lose precious days. Whenever data is being transformed from one system into another, there are many opportunities for data to go missing, be duplicated, or be corrupted. Whatever the skill level of the developer, nobody can know in advance what data might actually arrive in use, nor every possible failure mode. Any such system, whatever the technology used to implement it, should have at a minimum a simple check that the number of records output reconciles to the number input; and ideally other checks on data quality and hash totals.
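
To make that concrete, here is a minimal sketch of such a reconciliation control in VBA. The file path, sheet name, and layout (one header line, one record per line) are my assumptions for illustration, not PHE’s actual setup:

    Sub CheckRowsReconcile()
       Dim iFile As Integer, sLine As String
       Dim lInputRecords As Long, lOutputRows As Long
       iFile = FreeFile
       Open "C:\data\lab_results.csv" For Input As #iFile ' hypothetical path
       Do While Not EOF(iFile)
          Line Input #iFile, sLine
          If Len(Trim$(sLine)) > 0 Then lInputRecords = lInputRecords + 1
       Loop
       Close #iFile
       lInputRecords = lInputRecords - 1 ' exclude the header line
       With ThisWorkbook.Worksheets("Import") ' hypothetical sheet name
          lOutputRows = .Cells(.Rows.Count, 1).End(xlUp).Row - 1 ' exclude header
       End With
       If lInputRecords <> lOutputRows Then
          MsgBox "Reconciliation failure: " & lInputRecords & " records in, " & _
                 lOutputRows & " rows out", vbCritical
       End If
    End Sub

A dozen lines of code like this, run at the end of every import, would have flagged the loss on the day it first happened.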

First of all, why would they choose XLS? Possibly because it was the easiest choice for upload to their central system. For example, 32-bit SQL Server can import both XLS (65,536 rows) and XLSX (1M rows) files. For XLSX, however, they would have had to register the OLE DB provider Microsoft.ACE.OLEDB.12.0; that may have looked like extra work, and they may have thought “we’ll never have more than 65K cases per day”. But although that choice of XLS might well be adequate (at the moment!) for upload, the fatal flaw was that the import into XLS needed many more rows than the upload. [Edit: I now think they really were using Excel 2003 or earlier.]
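
For illustration only (we don’t know PHE’s actual code), the difference shows up in ADO connection strings: the legacy Jet provider reads XLS out of the box on 32-bit Windows, while XLSX needs the separately installed ACE provider. The file paths here are hypothetical:

    Sub ShowProviderDifference()
       Dim cn As Object
       Set cn = CreateObject("ADODB.Connection")
       ' XLS: the Jet 4.0 provider ships with 32-bit Windows, nothing to install
       cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\results.xls;" & _
               "Extended Properties=""Excel 8.0;HDR=Yes"""
       cn.Close
       ' XLSX: needs the Access Database Engine (ACE) redistributable registered
       cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\results.xlsx;" & _
               "Extended Properties=""Excel 12.0 Xml;HDR=Yes"""
       cn.Close
    End Sub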

I’ve done work for labs and written apps in Excel and VBA to process the data logs from instruments. The article refers to them as CSV files, and one might think that implies a text file with a header line of field names, one row per record, and fields separated by commas; see RFC 4180.

However, the fact that only about 1,400 cases fitted into 65,536 rows makes it look more likely that these are simple text files with one field and value per line (65536 ÷ 1400 ≈ 46.8 lines per case). Or maybe it’s a combination of irregular numbers of fields per line; we don’t know. Depending on the exact number of lines per record, 46 or 47, I would expect the LAST record in the file to be cut off mid-record, with some fields missing. So there are probably incomplete records in the database too.

Neither do we know what process they used to import the data in the first place – text file open, Get Data from Text, VBA Querytable, copy & paste, whatever. [Edit: I now think they used Workbooks.Open in a macro].

We don’t know what they did to transform the data from the lab output format into the regular table-like structure that an upload to a database would require. Maybe a mess of INDEX or VLOOKUP formulas; maybe a simple VBA macro that ran down the column of imported text and wrote out the samples in columns. Maybe they had to transform field values from codes into numbers and handle special values such as “n/a” and “<0.001”.

They could have processed it in an XLSM with 1M rows but saved the normalised table as an XLS file, thus satisfying both requirements: processing a large number of lines and saving in the most convenient format for upload; but they didn’t. Even sticking with an XLS file with macros, they could have read the lab file line by line, assembled each record in memory, and written it out to the target sheet, without needing to populate a 65K sheet with imported data; but it looks like they didn’t.
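
Here is a minimal sketch of that line-by-line approach. The input layout (one “Field: value” pair per line, a blank line between test results) and the file and sheet names are assumptions, since we don’t know the real lab format:

    Sub ImportLabFile()
       Dim iFile As Integer, sLine As String, lRow As Long
       Dim dictRecord As Object ' Scripting.Dictionary: field name -> value
       Set dictRecord = CreateObject("Scripting.Dictionary")
       iFile = FreeFile
       Open "C:\data\lab_results.txt" For Input As #iFile ' hypothetical path
       lRow = 1 ' row 1 is the header
       Do While Not EOF(iFile)
          Line Input #iFile, sLine
          If Len(Trim$(sLine)) = 0 Then ' blank line = end of one test result
             If dictRecord.Count > 0 Then WriteRecord dictRecord, lRow
             dictRecord.RemoveAll
          Else
             dictRecord(Split(sLine, ":")(0)) = Trim$(Mid$(sLine, InStr(sLine, ":") + 1))
          End If
       Loop
       If dictRecord.Count > 0 Then WriteRecord dictRecord, lRow ' the last record
       Close #iFile
    End Sub

    Sub WriteRecord(ByVal dict As Object, ByRef lRow As Long)
       ' One normalised row per record: needs only as many rows as records
       Dim vKey As Variant, iCol As Long
       lRow = lRow + 1
       For Each vKey In dict.Keys
          iCol = iCol + 1
          ThisWorkbook.Worksheets("Output").Cells(lRow, iCol).Value = dict(vKey)
       Next
    End Sub

Written this way, the sheet only ever holds one row per test result, so the 65,536-row limit is never the binding constraint.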

But what puzzled me is this: I had thought that whatever method is used to import data in Excel, even in Excel 2003 with XLS files, the application would raise an error if you tried to import more lines than the sheet can hold; yet there was no report of the problem being flagged by an error message, only of people checking the numbers.

If you use the menu to File > Open a text file in Excel 2003 with more than 65,536 lines, or try to copy and paste from another application, it raises the error message “File not loaded completely”. If you import in any version of Excel into a worksheet in compatibility mode (i.e. 65,536 rows) it reports “The text file contains more data than will fit on a single worksheet”. So why did this not happen? (Spoiler: I discovered why later, see the postscript.) This was, in Sherlock Holmes terms, the dog that did not bark in the night.

I tried to imagine scenarios involving XLSWriter or Apache POI creating XLS files with more than 65K rows; I worked out an unlikely scenario involving copying to the clipboard from another application and in VBA using Worksheets(1).Paste, which does not raise an error but silently truncates the data. Other Excel experts suggested that the PHE developers simply suppressed alerts with Application.DisplayAlerts = False or On Error Resume Next, or even deliberately imported line by line with a cut-off at 65,535. They point to a variation of Hanlon’s razor: “Don’t ascribe to obscure complexity what can be explained by incompetence”, and to the collections of reports on thedailywtf.com and theregister.com. And Arthur Conan Doyle wrote “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth”. [Edit: Another conclusion, which I thought improbable at first, is that this public health service is really so starved of IT resources that it has to use unsupported versions of Excel more than 13 years old. Update: this can happen even in modern Excel; see below for 25-Nov-2020.]

However, it was much simpler than that, and only after trying the most obvious thing, which I missed the first time I posted this, did I find it: VBA Workbooks.Open “file.txt” will load as much data as it can WITHOUT raising any error. No On Error Resume Next needed. So this is in my view an Excel failure, a defect, because the VBA behaviour is not consistent with the UI behaviour. Still, you should not assume your technology will always work perfectly and fail to put controls around your processes.
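
A minimal repro sketch (the file name is hypothetical; in Excel 2003, or any workbook limited to 65,536-row sheets, the extra lines simply vanish):

    Sub ReproSilentTruncation()
       Dim wb As Workbook
       ' big_file.txt is a hypothetical file with, say, 89,000 lines
       Set wb = Workbooks.Open("C:\data\big_file.txt") ' no error is raised
       ' On a 65,536-row sheet this prints 65536: the rest was silently dropped
       Debug.Print wb.Worksheets(1).UsedRange.Rows.Count
       wb.Close SaveChanges:=False
    End Sub

Comparing UsedRange.Rows.Count with a line count of the source file is exactly the kind of control that would have caught this on day one.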

More like Occam’s razor really: pick the simplest explanation over the more complex one.

[Update 14-Oct-2020: what I now think really happened]

Update 22-Oct-2020: I found a Sky News article which gave a bit more information:

“The problem is while the Second Generation Surveillance System (SGSS), as it’s called, is plugged into the existing pathology labs that pre-date COVID-19, for some reason no-one has yet managed to connect it to the new labs which have been created during the course of this pandemic. […] Rather than feeding its results securely straight into SGSS, as Pillar 1 pathology labs do, the test results from Pillar 2 labs arrive at PHE in the form of a CSV file.” So the Pillar 1 labs use secure transfer, maybe in HL7 format, but the new system masterminded by consultants at £7,000/day uses CSV, like the USA CDC?

Update 25-Nov-2020: I found two more scenarios where an XLS macro running in Excel 2016 silently drops data above the sheet capacity of 65,536 rows. If, in an .xls file in Excel 2016, I use VBA QueryTables.Add “TEXT;file.csv” or an ADO Jet.OLEDB.4.0 query “SELECT * from file.csv” to populate a worksheet from a CSV file with 89,000 rows, it stops at 65,536 silently, ignoring the extra data, with no error. It’s quite likely that xls macro files persist in use even after the version of MS Office Excel is updated. So there’s another couple of ways in which it could have happened. There may be more!
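
For reference, this is the shape of the QueryTables repro I used; the file name is hypothetical:

    Sub ReproQueryTableTruncation()
       ' Run from an .xls workbook in Excel 2016: stops at 65,536 rows, no error
       With ThisWorkbook.Worksheets(1).QueryTables.Add( _
             Connection:="TEXT;C:\data\big_file.csv", _
             Destination:=ThisWorkbook.Worksheets(1).Range("A1"))
          .TextFileParseType = xlDelimited
          .TextFileCommaDelimiter = True
          .Refresh BackgroundQuery:=False
       End With
    End Sub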


Nested JSON to flat data tables

Have you ever had to convert a file of JSON text to Excel tables? Not simple JSON, but one with a collection of dictionaries inside a dictionary inside a collection of dictionary records inside another collection of records inside a dictionary? And you want all this as old-fashioned normalised database tables? And in VBA? Here’s how I did it:

Download: Flatten JSON 20200915.xlsm

I chose as an example a Surveymonkey survey_details JSON file. I use VBA Collections, ADO Recordsets, Scripting.Dictionary, a free class to parse JSON in VBA, and some code to handle the inevitable exceptions. It’s a two-pass process: one pass to identify all the fields in all the dictionaries and create the linking primary and foreign keys, and a second to populate recordsets with the data for those fields and write them out to Excel sheets. Or of course you could write them to a database.
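
As a simplified illustration of the first pass (not the exact code in the workbook, which also assigns the keys), this walks the parsed JSON, treating nested objects as child tables and collecting the field names for each table:

    Sub CollectFields(ByVal vNode As Variant, ByVal sTable As String, ByVal dictTables As Object)
       Dim vKey As Variant, vItem As Variant
       If TypeName(vNode) = "Dictionary" Then
          If Not dictTables.Exists(sTable) Then
             Set dictTables(sTable) = CreateObject("Scripting.Dictionary")
          End If
          For Each vKey In vNode.Keys
             If IsObject(vNode(vKey)) Then
                ' a nested dictionary or collection becomes a child table
                CollectFields vNode(vKey), sTable & "." & vKey, dictTables
             Else
                dictTables(sTable)(vKey) = True ' record the field name for this table
             End If
          Next
       ElseIf TypeName(vNode) = "Collection" Then
          For Each vItem In vNode
             CollectFields vItem, sTable, dictTables
          Next
       End If
    End Sub

Calling it with the parsed survey JSON and an empty dictionary yields one entry per table with its field names, which is what the second pass needs in order to build the recordsets.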

If you like YouTube tutorials, here’s a recording where I walk through it:

What we start with: the nested JSON text (screenshot in the original post).

What we end with: flat, normalised tables on separate sheets (screenshot in the original post).

(Update 23-Nov-2020: fix misspelled download link)


VBA Read Unicode file contents in various encodings

To read a text file, you need to be able to handle more character sets than just ANSI. Not just in the contents but also in the file and folder names.
The classic article on this is by Joel Spolsky:
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Here are folder and file names in various languages:

F:\DOCS\Arabic\التاريخ المالي والنسب.xls
F:\DOCS\Arabic\اسم ملف الاختبار\اسم ملف الاختبار.xlsm
F:\DOCS\Chinese\2016 全 日 制 ⑴ 班 2016 ~ 2017 学 年 度 第 一 学 期 MBA 教 学 计 划(教室:2-111).xls
F:\DOCS\Chinese\測試文件名\測試文件名.xlsx
F:\DOCS\Greek\όνομα αρχείου δοκιμής\όνομα αρχείου δοκιμής.xlsx
F:\DOCS\Hebrew\שם קובץ הבדיקה\שם קובץ הבדיקה.xlsx
F:\DOCS\Hindi\परीक्षण फ़ाइल नाम\परीक्षण फ़ाइल नाम.xlsx
F:\DOCS\Hungarian\kalóriaguru Munkafüzet.xlsm
F:\DOCS\Hungarian\kalóriaguru\kalóriaguru.cls
F:\DOCS\Japanese\テストファイル名\テストファイル名.xlsx
F:\DOCS\Korean\테스트 파일 이름\테스트 파일 이름.xlsx
F:\DOCS\Russian\Лист1.xlsx
F:\DOCS\Russian\имя тестового файла\имя тестового файла.xlsx

Misinterpreting the encoding results in accented characters appearing as two garbage characters; or you see a white question mark in a black diamond (the Unicode replacement character U+FFFD), which indicates that the UTF-8 decoder could not figure out what a byte sequence was.
https://en.wikipedia.org/wiki/Mojibake

In the case of VBA, the classic Open statement and the functions Dir(), FileLen(), and FileDateTime() cannot handle Unicode file and folder names. The Scripting.FileSystemObject TextStream can read Unicode names but cannot handle UTF-8 encoding. So I use the ADODB.Stream object in the module ReadFileContents.bas to handle a number of different cases. The variations are not complete, but it handles the file types I get.

Let me know of any improvements you can suggest.

Here is the function:

' Reads the entire contents and tries to GUESS the character encoding
Function ReadFileContentsAndCharset(ByVal FileName As String, ByRef Text As String, _
         ByRef charset As String) As Long
   charset = "UTF-8" ' first guess. If it's not UTF-8 it may still be readable as UTF-8
   ReadFileContentsAndCharset = ReadADOStreamText(FileName, Text, -1, charset)
   If ReadFileContentsAndCharset = 0 Then ' it succeeded without error
      Debug.Print FileName   ' comment these out in production
      Dim lUTF8Len As Long, sBOM As String
      ' UTF-8 files do not have to have a BOM so no point in testing for that
      sBOM = Left$(Text, 2)
      lUTF8Len = Len(Text)
      If sBOM = Chr$(254) & Chr$(255) Or sBOM = Chr$(255) & Chr$(254) _
         Or InStr(1, Text, Chr(0)) > 0 _
         Then ' UTF-16 or nulls? Might be double-byte character set
         charset = "UTF-16"
         ReadFileContentsAndCharset = ReadADOStreamText(FileName, Text, -1, charset)
         Debug.Print charset & " len=" & Len(Text) & ", utf-8 len=" & lUTF8Len
         If lUTF8Len = Len(Text) * 2 Then
            'ReadFSOStreamText(FileName, Text) would work here too for UCS2
            Debug.Print "As expected, double byte character set"
         End If
      ElseIf InStr(1, Text, ChrW$(65533), vbBinaryCompare) > 0 Then ' &HFFFD
         ' Check for U+FFFD <question mark in black diamond> replacement character
         ' used to replace an unknown, unrecognized or unrepresentable character
         'https://en.wikipedia.org/wiki/Specials_(Unicode_block)
         charset = "Windows-1252" ' second guess; superset of ISO-8859-1
         ReadFileContentsAndCharset = ReadADOStreamText(FileName, Text, -1, charset)
         Debug.Print charset & " len=" & Len(Text) & ", utf-8 len=" & lUTF8Len
         If Len(Text) > lUTF8Len Then
            Debug.Print charset & " converted some characters into multiple characters"
            ' so it must be some other encoding. No idea.
         End If
      End If
      Debug.Print Len(Text) & " chars, charset inferred=" & charset & " " & FileName
   Else
      ' NB: Err is cleared when ReadADOStreamText exits, so report the returned code
      MsgBox FileName & vbLf & "Error " & ReadFileContentsAndCharset
   End If
End Function

Function ReadADOStreamText(ByVal FileName As String, ByRef Text As String, _
   Optional ByRef numchars As Long, Optional ByRef charset As String) As Long
   ' initialise default parameters
   If numchars = 0 Then numchars = -1
   'numchars Optional. The number of characters to read from the file,
   '  or a StreamReadEnum value. Default is adReadAll=-1
   If Len(charset) = 0 Then charset = "utf-8"
   With CreateObject("ADODB.Stream")
      .Type = 2 'adTypeText = 2 Specify stream type -  text/string data.
      .charset = charset
      .Open
      On Error Resume Next
      .LoadFromFile FileName
      'Err  3002         File could not be opened.
      Text = vbNullString ' in case of err
      Text = .ReadText(numchars)
      ReadADOStreamText = Err
   End With
End Function


A look back to Eusprig 2019

The 20th conference of the European Spreadsheet Risk Interest Group was held at Browns of Covent Garden, London, on 11-12 July 2019.

Thursday 11 July 2019

Developing Excel Thought Leadership

David Lyford-Smith, ICAEW, Chartered Accountants’ Hall

This described the development of their published booklets on Twenty Principles, the Spreadsheet Competency Framework, and Financial Modelling Good Practices.  David summarised their approach as:

  • We have a model for developing spreadsheet thought leadership:
  1. Identify an area where a high-level guidance publication could be of use by consulting with experts and stakeholders
  2. Convene a suitable and diverse group of experts and synthesise a consensus view from their disparate opinions
  3. Invite commentary and critique via an exposure draft process
  4. Complete the project based on the feedback received
  • With such variation in opinions and practice, building in room for individual preference and differences in approach must be an explicit consideration
  • Many people are willing to follow good practice when exposed to it, but few will seek it out for themselves

Practical aspects of applying an End User Computing policy

Roger Turner, Wesleyan Assurance Society, Birmingham

This was an update on his 2018 paper where he described the work Wesleyan did to set up its EUC policy. There was a continual focus on meeting and working with managers to be sure that actions were completed. His slides give a very frank account of their journey which will be very valuable to those present working on similar projects. Their fine-grained approach to classifying risks made it easier to score some quick wins, on things like Controls, Validation, and Documentation. Some slides showed their KPIs to measure progress on moving spreadsheets from red to green status. His learning points are:

  1. Plan what you’re going to do
    1. If you fail to plan you plan to fail
    2. Work out risk metrics, use best practice to make an EUC policy, do pilot runs to iron out problems.
  2. Get buy-in right from the top
    1. Identify the risks and benefits – suggest a risk based approach tackling where the biggest risks are first.
    2. Success with these will give your project momentum which you will need for harder areas
  3. Use of expensive software can usually be avoided
    1. Look to re-use existing infrastructure and systems
    2. A top down approach can get some results and is cheaper, less intrusive than a bottom-up approach
  4. Publicise the policy with its benefits (intranet, meetings etc to make the approach clear)
  5. Be sympathetic to people’s other business priorities
    1. Most will try to help and this will give you momentum
  6. Use KPI reporting – this will show how your EUC project is reaping benefits over time.

A Case Study of Spreadsheet Use within the Finance and Academic Registry units within a Higher Education Institution

Simon Thorne and Jamie Hancock

This was a survey of the Academic Registry and Finance departments of a university. The paper has many tables giving an analysis of the responses to the survey, although bear in mind that this is self-reporting. They identified the common risks:

  1. Lack of training, standards, development methods, testing and documentation in spreadsheet development
  2. Mistakes in communicating with third parties
  3. Potential for fraud and falsification
  4. GDPR risks

The mitigating actions that follow are:

  1. Generate an inventory of active critical spreadsheet applications and take appropriate mitigating actions with the riskiest spreadsheets.
  2. Development of a robust spreadsheet development policy governing approaches to development, deployment, documentation, testing and accountability
  3. As part of the organisational policy, decide on development methodologies, documentation standards and testing policies for spreadsheets
  4. Identify and deliver appropriate training in planning, development and testing of spreadsheet models
  5. Establish peer working, user groups and risk awareness sessions

Abstracting spreadsheet data flow through hypergraph redrawing

David Birch, Nicolai Stawinoga, Jack Binks, Bruno Nicoletti, Paul Kelly

Imperial College London, Filigree Technologies, London

This represents the spreadsheet as a network diagram of its dependencies. They make the link structure explicit by transforming the spreadsheet into a graph structure. They find groups of cells performing near-identical operations by detecting vector operations using isomorphic subtree detection. They illustrate the effectiveness of raising the abstraction level by translating a series of spreadsheets to a high-level data-flow language called ModL. By applying a force-directed algorithm they can rotate the graph in a virtual 3D space to better see the relationships.

More established technologies such as Quantrix are specifically aimed at the financial modelling market and start with such structures as their design surface.

Implementation Strategies for Multidimensional Spreadsheets

Paul Mireault, Founder, SSMI International, Honorary Professor, HEC Montréal

Last year, the author issued a challenge to spreadsheet users to create a spreadsheet model with some initial data and simple formulas to allocate quantities and money over four dimensions. The formulas posed no challenge; participants were given all of them. It was simply a request to implement a model. Even so, of the 109 who responded, only 17 completed the challenge. Some implemented it as tables which spread the four dimensions over multiple blocks of formulas in multiple worksheets. Others collated the data in a flat database structure and created the reports as simple summaries of the data. His conclusions are:

  1. Participants using the Database approach usually have the simplest formulas for aggregated variables. We expect that they will be the easiest to maintain.
  2. The analysis of the Multi-dimensional Spreadsheet Challenge submissions has given us insight into the different implementation strategies used by seasoned spreadsheet developers. Given the same problem and its solution, developers produced very different results. This lack of standardization may explain why it is hard to understand and maintain somebody else’s spreadsheet.

EQUS – helping to see formulae

Chris R. Roast, Sheffield Hallam University

EQUS is a C# Excel add-in that presents a graphical breakdown of cell formulas annotated with an evaluation of each subexpression and the calculation steps. It is like a more colourful, simultaneous presentation of what the Evaluate Formula tool in Excel does, or of other formula reference tree viewers in the Excel add-in market. It is intended for educational use and is very well suited to explaining to students how formulas work.

From webtables to datatables

Mária Csernoch, University of Debrecen, Hungary

The motivation for this work is to create realistic and interesting tables of data for school students to analyse. Web sites are an obvious source, but unfortunately many frustrate attempts to conveniently collect data from their pages. Standard web queries depend on data in HTML tables, but many sites use <div> or other tags which cannot be read by Get Data from Web or Power Query. She presented the steps they must use for copy/paste or saving pages as HTML or plain text and opening them in Excel or Word; then reformatting, editing, cleansing, parsing, and reshaping to get a usable table out of the result. It can be a lot of tedious work. While professional developers may use VBA and APIs, this aims to create reproducible manual steps which schoolchildren can use. Be aware that these are Hungarian schoolchildren who have already been taught Excel array formulas.

Will Dynamic Arrays finally change the way Models are built?

Peter Bartholomew, MDAO Technologies Ltd

The author is an engineer, not a financial modeller. A typical task of his for the MoD was a low-fidelity Excel model of the likely interaction of warship and helicopter in the North Atlantic, with a graphical readout of the behaviour. His models are largely constructed with names rather than cell references, so have always been innately array-oriented. He points out that Excel 365 dynamic arrays now provide the most direct route to a solution: instead of requiring effort to set up Ctrl+Shift+Enter (CSE) array objects, any formula can be an array calculation and any cell can hold an entire result array.

The surprise for anyone versed in traditional methods may well be how few cells contain formulas; consistency errors are simply not possible, and having fewer formulas reduces the opportunity for error.

Cubes – an alternative to spreadsheets to mitigate risk and improve performance?

Marc BRAUN, Xcubes

This was a demonstration of XCubes, a multidimensional spreadsheet system. He used the model specified in Paul Mireault’s talk as an illustration: an Excel solution required 1,427 formulas; an XCubes solution just 13. Cubes are the most suitable solution when the model can be broken down into blocks that store well-structured data. They make analysis, calculation, and visualization much easier, even on large data sets. The software can be downloaded from xcubes.net.

Does the teaching of spreadsheeting skills have the potential to be a significant global education policy intervention?

Morten Siersted, Visiting Research Fellow, Centre for Development Studies, University of Bath

Spreadsheeting skills do not currently feature within global education policy in any significant way. However, within the UN Sustainable Development Goals (SDGs), the proxy of ‘ICT skills’ is of direct relevance to Target 4.4 ‘Skills for work’ and has led UNESCO to propose ‘using basic arithmetic formulas in a spreadsheet’ as a component of an ‘ICT skills acquisition’ element of measuring progress on SDG 4.4.

Although there is a shift from superficial surface level ICT skills to deeper cognitive skills and computational thinking, he sees a risk of policy makers jumping straight from ICT skills to traditional computer programming education, thereby missing out the ‘spreadsheeting as entry-level programming’ stepping stone. ICT for Development is now mainstreamed within key UN agencies (ITU, UNCTAD, UNDP, and UNESCO). He concludes that an initiative should be undertaken to disseminate existing spreadsheeting research to education policy makers working with low-income countries.


Friday 12 July 2019: Training Sessions

Working through Modeloff example problems

David Lyford-Smith, ICAEW

David worked through two games from the ModelOff question resources as examples of problems that require no domain knowledge but are capable of solution in Excel. Indeed, the drive for more efficient solutions often captures the attention of Excel nerds beyond the point of return on time spent.

Dynamic Arrays in Excel

Charles Williams, Decision Models Ltd

Charles worked through a presentation created by Joe McDaid of Microsoft which explained how the new dynamic arrays work and also, illuminatingly, how the legacy array formulas and their implicit intersection rules really worked! This is the video from Microsoft:

https://community.powerbi.com/t5/MBAS-Gallery/Microsoft-Excel-Advanced-spreadsheet-modeling-using-Dynamic/td-p/726918

Powerpoint slides:

https://medius.studios.ms/video/asset/PPT/MBAS19-BRK4001

Slide 32 (Array Eval) has a typo: the word “didn’t” should not be there. The Slide 42 (Calculation Modes) table is blank in edit mode but fills in by animation in presentation mode. The content can be seen in this tweet from Charles’ presentation: https://twitter.com/ExcelAnalytics/status/1149633726592114688

Patrick O’Beirne  16 July 2019


A look ahead to Eusprig 2019

The 20th annual conference of the European Spreadsheet Risk Special Interest Group (http://www.eusprig.org) will be held on July 11 & 12 at The Barrister’s Court, Browns, 82-84 St Martin’s Ln, Covent Garden, London WC2N 4AG, UK.

David Lyford-Smith of the Institute of Chartered Accountants in England and Wales (ICAEW) will review the history of their three “thought leadership” papers on good practice in spreadsheet use and spreadsheet work environments.  Anyone involved in spreadsheet training, teaching, and syllabus development will be interested to hear how the process of making them has helped ICAEW to develop its position. David also features in the Friday morning tutorials.

Last year Roger Turner of Wesleyan Assurance Society presented a high-quality paper which described their plan to address EUC risks, which the audience found very relevant. This year he describes the challenges which were faced and how these were overcome. I am particularly looking forward to this practical presentation.

During the year, Paul Mireault of SSMI International invited seasoned Excel developers to participate in a challenge to implement a forecast modelling spreadsheet with four variables – product, region, sector, time. He will report on the different implementation strategies employed. Having tried it myself, I can say you will see some radically different approaches to the challenge! Expect arguments from the attendees that “my way is the best because…”!

Simon Thorne and Jamie Hancock of Cardiff Metropolitan University will present the findings of a case study of spreadsheet use in a higher education institution in the UK. The implications of the results are then considered in terms of accurate reporting to external funding bodies such as the funding councils, internal data integrity, and internal data efficiencies.

Mária Csernoch of the University of Debrecen, Hungary, will talk about how school pupils can scrape data from web pages rather than type it in for their Excel exercises. Tables and table-like structures on webpages are excellent sources for teaching spreadsheeting by utilizing and developing knowledge-transfer items, presenting various real world problems and solutions, discussing and debugging, and in general, developing computational thinking skills.

Chris R. Roast of Sheffield Hallam University will describe the design, development, and evaluation of an Excel add-in for interactive visualisation of spreadsheet formulas (EQUS). The development process was one of iterative refinement engaging an initial target audience of mid-teen learners, involving re-design and formative evaluation. The resulting visualisation techniques have been found to be broadly relevant to spreadsheet users beyond the initial target audience.

David Birch and others from Imperial College and  Filigree Technologies, London, expose hidden linkage structure by transforming spreadsheets into fine-grained graphs with operators and values as nodes. They propose that researchers should seek techniques to redraw these boundaries to create higher level structures which will more faithfully represent the end-user’s real world/mental model.

Peter Bartholomew of MDAO Technologies Ltd will discuss the thesis that the adoption of the new Dynamic Arrays in Excel within a more professional development environment could replace traditional techniques where solution integrity is important. This will be a useful taster to Charles Williams’ tutorial on Friday morning.

Marc BRAUN of Xcubes argues that a cube-based system offers a structured and scalable approach to modelling, forecasting, reporting and data analysis with little manual intervention once the model is set up. A practical demonstration will be made with XCubes, a Multidimensional Spreadsheet System software package.

Morten Siersted, visiting research fellow at Bath, presents his dissertation on “Does the teaching of spreadsheeting skills have the potential to be a significant global education policy intervention?” Anyone involved in promoting EUC skills will be interested in hearing his views on the relevant ‘ICT for development’ frameworks.

The innovation this year is a pair of tutorials on Friday morning. David Lyford-Smith will be illustrating good spreadsheet structuring by working through Modeloff example problems. Charles Williams of Decision Models Ltd, the Excel MVP, will be explaining Dynamic Arrays in Excel, their advantages and their wrinkles.

I’ll be there!

Register at http://www.eusprig.org




UK HMRC MTD OAuth2 Excel/VBA demo

I’m making available as open source my VBA routines for the “Hello World” tests on the UK HMRC “Making Tax Digital” (MTD) web service, which uses OAuth2 and JSON. Thanks to posts from Bruce McPherson (Excel Ramblings), Tim Hall (Excel-REST), and Kyle Beachill.

Link to Excel workbook

HMRC publishes more information on Making Tax Digital at
https://developer.service.hmrc.gov.uk/api-documentation
(the navigation menu at the top offers Documentation, Applications, and Support).

Checklist of things to do on the HMRC website in order to complete these exercises:

1. Read Getting Started.
2. Register for an account and sign in.
3. Read the Testing instructions.
4. Add an application.
5. Subscribe to the APIs you want. The workbook links to test subs in VBA for the
   Hello World API (open endpoint) and the Create Test User API (application-restricted endpoint).
6. Get Sandbox credentials from the Application page and paste them into the
   Application Parameters fields on the Parameters sheet.
   The parameters ending in _uri are those currently defined by the HMRC API.
7. To test user-restricted endpoints, you need to create one or more test users.
   The Create Test User service (see picture on the sheet) does it manually.
   Copy the generated data into the Parameters sheet at “Generated Individual”.
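
To give a flavour, here is a minimal sketch of the open “Hello World” call, which needs no OAuth2 token. The sandbox URL and versioned Accept header follow HMRC’s developer documentation, but this is a sketch, not the full workbook code:

    Sub HelloWorldSketch()
       Dim http As Object
       Set http = CreateObject("MSXML2.XMLHTTP")
       ' The open "Hello World" endpoint requires only the versioned Accept header
       http.Open "GET", "https://test-api.service.hmrc.gov.uk/hello/world", False
       http.setRequestHeader "Accept", "application/vnd.hmrc.1.0+json"
       http.send
       Debug.Print http.Status & " " & http.responseText ' expect {"message":"Hello World"}
    End Sub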

If you’d like to do real work with this, such as submitting VAT or PAYE returns, contact me to discuss a specification and contract.



MS Azure Cognitive Services V3 API Translator VBA

I have now updated my free open-source VBA code for the MS Translator API V3, which uses JSON data formats rather than the previous XML. There is no change in your subscription method; you can continue to use the same Azure account key.
To use this code:
To use this code:

1. Download the ZIP file (13K) and unzip it into a folder.
(Updated 9-May-19 ClsJSONLib.cls)

2. Read the README.TXT file.

3. Run the Sub testMicrosoftTranslate() to verify that the code works
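
For orientation, this is the shape of the call the module makes to the V3 REST endpoint: a minimal sketch with a placeholder key, not the full code, which adds error handling and JSON parsing:

    Sub TranslateSketch()
       Dim http As Object, sBody As String
       Set http = CreateObject("MSXML2.XMLHTTP")
       sBody = "[{""Text"":""Hello, world""}]" ' the V3 API takes a JSON array of texts
       http.Open "POST", "https://api.cognitive.microsofttranslator.com/translate" & _
                 "?api-version=3.0&from=en&to=fr", False
       http.setRequestHeader "Content-Type", "application/json"
       http.setRequestHeader "Ocp-Apim-Subscription-Key", "YOUR_AZURE_KEY" ' placeholder
       http.send sBody
       Debug.Print http.responseText ' a JSON array containing the translations
    End Sub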

Comments and questions welcome. If you would like some tailoring to your specific requirements, or indeed any Excel VBA development, contact me to discuss a specification and quotation.


Extend Excel: London 18 Oct

Charles Williams is organising a conference in London on October 18th, 2018. I’ll be there! This is a rare chance to meet leading developers from all over the world and hear directly from MS on Excel extensibility. Expect to be seriously stretched. It’s a community event, not for profit, so the price for corporate-funded attendees and vendors is only £120 including VAT.

More information & booking: https://developexcel.wordpress.com

  • Charles Williams (UK) – Excel MVP:  Overview of extensions, APIs and Tools
  • Juan Balmori (USA) – Senior Program Manager, Microsoft Excel Extensibility:  Javascript APIs, UDFs and UI elements
  • Govert van Drimmelen (SA): Excel-DNA: .Net for Excel
  • Tony Roberts (UK): PyXLL: Python for Excel
  • Bruce Belson (AUS): Planatech XLL Plus: C++ for Excel
  • Gašper Kamenšek (SL): Power Query + M in VBA
  • Simon Murphy (CH): VSTO, Addin Express and other Dev Tools: real world experience

Network with, socialise and meet other Excel developers of addins and solutions



Excel and PowerPoint 2016 for Mac gripes

I’ve been trying to maintain compatibility between Mac and Windows versions of Excel/VBA apps for a while now, and it’s a pain.
These notes refer to version 16.14 of both Excel and PowerPoint.

Here are some of my recent threads on answers.microsoft.com

Passing a user-defined object to a sub As Object passes an object with all Null properties; passing it As MyClass works. Methods work, just not properties.
Interoperability between Excel 2016 for Mac and PowerPoint 2016 for Mac is very frustrating.
To get VBA to copy a picture of a chart to a slide, I’ve tried several different ways.
Only in the latest Insider Build 16.14 does the obvious Copy / Paste method work, but even that requires copying and pasting as an image in Excel first and then copying and pasting the static picture.

1) Manual copy and paste works, as it does in Windows. I want to automate this for what could be 100 slides.

2) The CopyPicture method fails for all objects. Range gives error 1004 “CopyPicture method of Range class failed”. ChartArea gives error 438 “Object doesn’t support this property or method”. With Chart.CopyPicture, the result when pasting is a shape with the text “The picture can’t be displayed”. (Update 8-May-18: in Excel 16.14 (180507) Chart.CopyPicture now works and includes a TIFF format in the copied data. And to save a picture of a range as a file, we can now do Range.CopyPicture followed by adding a temporary chart with ChartObjects.Add, then ChartObject.Chart.Paste of the picture, then Chart.Export as PNG.)

See my post on answers.microsoft.com.
To work around this, create the picture in Excel:

 Dim ws As Worksheet, oShp As Shape
 Set ws = ActiveSheet ' ws is the target sheet for the picture
 oChart.ChartArea.Copy
 ws.Cells(30, 9).Select 'below the chart; to anchor the Paste
 ws.PasteSpecial 'default Link:=False, DisplayAsIcon:=False
 'Format: 0=PNG,1=JPEG,2=GIF,3=Enhanced metafile,4=Bitmap,5=DrawingObject
 ' but the UI offers only PDF/TIFF/DrawingObject
 Set oShp = ws.Shapes(ws.Shapes.Count)
 oShp.Copy


3) There is no Slide.PasteSpecial method in Mac Powerpoint VBA. Slide.Paste takes NO data type argument.

4) ApplescriptTask can be used to execute the Paste in Powerpoint. Thanks to Ron de Bruin for help here.

This code is in the VBA:

    myChartObject.Copy
    AppleScriptTask "CopyPasteScript.applescript", "PastePowerpoint", ""

Save this code in a file eg “CopyPasteScript.applescript” in the special folder
/Users/myusername/Library/Application Scripts/com.microsoft.Excel

    on PastePowerpoint()
    -- Paste into Powerpoint
    tell application "Microsoft PowerPoint" to activate
    delay 0.1
    tell application "System Events"
    tell process "microsoft powerpoint" to keystroke "v" using command down
    end tell
    end PastePowerpoint

5) Another method is to export the Chart as a picture and then import it in PowerPoint

    Dim oChart As Chart, sFile As String
    Set oChart = ThisWorkbook.Worksheets(1).ChartObjects(1).Chart
    oChart.Parent.Activate ' the ChartObject
    sFile = ThisWorkbook.Path & Application.PathSeparator & "test.png"
    oChart.Export sFile, "png"
    ' Only works on Excel for Mac 16.14 or later; 16.9 to 16.13 produces a zero-byte file
    ' oSlide is a PowerPoint Slide set up earlier; no parentheses when discarding the return value
    oSlide.Shapes.AddPicture sFile, False, True, 185, 110, -1, -1


6) In practice, the above method is agonisingly slow, so what I really do is write a VBA script with lines that create a presentation, add slides, set the title text, and call AddPicture using the file names created.
Then I create a temporary presentation in PowerPoint, import that VBA code, execute it, and close the temporary presentation without saving it.

7) Before Chart.Export started to work in Excel 16.14 or later, I did try the ugliest workaround, and it works but it’s really a kludge.

7.1 Create a temporary workbook with a sheet for each slide and copy/paste the title and chart image into it. Add needed properties to the .AlternativeText of the pictures.

7.2 Save the workbook as html.

7.3 Open the tabstrip.htm file, read the text, split it at “<a href=” and extract the tab names.

7.4 for each tab file name, open its htm file, read the text, split it at “<v:shape id=” and parse out the picture file names and properties.

7.5 Now I have the picture file name, add a line to the PowerPoint VBA file I create in (6) above to add a slide using the right title and picture filename.
Horrible, isn’t it?



Dublin Tech Summit 18-19 Apr 2018

#DTS18 @dubtechsummit

Dublin Tech Summit is a two day (Wed 18 and Thu 19 April 2018) meetup of startups and business people intended to connect investors to tech companies, and to highlight emerging tech trends.
I attended as a guest of Failte Ireland, and brought two UK attendees to it, to help promote Ireland as a conference destination.

Several side tracks dealt with start-ups and mentoring, and Eir offered a prize of €5,000 for the best startup pitch. This looks like a good event for any tech company at launch or in early development. If you can’t afford it, maybe (in Ireland) talk to your LEO (Local Enterprise Office) for support, or the equivalent in other countries. For example, La French Tech, an initiative of Business France, featured Kuzzle, Axione, Qare, and Keymetrics. I saw a number of other stands from French companies; one that caught my eye was Myoeno.com, a wine-scanner device to measure oenological characteristics.

My opinions (and bear in mind I’m probably not its target market):
It is not a training event: there were no real techie or “how to” presentations. It was an educational / product-promotional event in a general sense, loaded with “fantastic / incredible / amazing” buzzwords and hype. I understand that slides will be available later.
The schedule only gave speaker names and very few descriptions of what the talks were about. This did not improve as the event grew closer, so it became clear that speakers were chosen for being able to give engaging and fluent presentations, not for promising any useful content. The general sense is of watching an animated magazine article. The panel sessions featured experts, including local ones such as Brian Honan, but the limited time meant that topics were necessarily high-level discussions rather than informative.

Quantum Computing, Bo Ewald, D-Wave Systems: because many terms were used without definitions or examples, like “annealing”, this would have made more sense to me if I had known more about the topic, but the 2017 story of China communicating with a satellite using “entangled” photons was interesting.

Patricia Scanlon of Soapbox Labs described the particular challenges of recognising children’s voices.

The Publishing Revolution session spoke of developments in eBooks and interactive learning and self-publishing.

Fred Laluyaux of Aera spoke of the “self-driving” enterprise. Their solution to integration and interoperability of 140 separate IT systems is to introduce a 141st and link every other one to it to create a “digital twin” to capture corporate memory. Maintenance, anyone?

The MusicXTech events were in large, nearly empty rooms, whereas other tracks were packed out.

One MusicXTech session interested me, on “The Visualisation of Music”, but the schedule had no information on the speaker and no abstract. I tweeted the organisers asking for more info but got no reply. I went anyway, but it was nothing to do with the visualisation of music or musical notation; rather, it was a look back at how the early augmented-reality scenes in the Gorillaz ‘Humanz’ app were created by B-Reel, and how it’s much easier now. A bit of pop history, maybe.

The Casey Neistat session was entitled “How to be heard” but he gave no “how to” information at all – in fact, disclaimed it. It was simply the story of his YouTuber career and probably interested people in media.

Jason Zander of Microsoft described the Azure eco system – he began by saying that at a dev conference the first thing he’d do is launch into Visual Studio so that sounds more like my kind of event.

At the panel on Bitcoin, an audience show of hands revealed that more than 50% claimed to own Bitcoin. They expect the performance problems to be addressed someday: currently BTC handles 7 transactions per second, whereas Visa handles 24,000.

A session entitled “The Future is Dominated by Customer Experience” sounded promising but again came across as a rehash of fairly obvious “customer first” assertions. It might have been more interesting to discuss why this does not actually happen in practice and what the motivations are that result in CX failure. Like the app for this event.

Jordan Bitterman of IBM described current applications of AI. He gave examples like Shazam for song recognition, “Do not pay” to challenge parking tickets, and voice recognition identifying which adolescents showed mental health risks. He also reported on using location data to infer possible forced migration of refugees before they are washed up on shore.

The Grip App for the event was my Gripe App, because it was useless for planning my attendance. It had networking features, but nobody I clicked as “interested” responded; perhaps because, of the thousands of attendees, the few they chose to show me were not particularly relevant. They trumpet the “AI” aspect without saying what that means – probably just dumb text matching between stated interests. There is no search option in the Grip app. Well, there is, but when I typed in the name of a speaker I was interested in hearing I got one result dated 1st April 2016 and no further info.
The only way to view the schedule is to select one track, then one day, then scroll down, tap the down arrow beside each session title in turn to see who the speakers are and any description, read it to see if it’s what you want, and if not then scroll down more and repeat. And repeat all that for the next day. And repeat all that for each of the four tracks. What a crock.
If anyone designing the app had much experience of attending a conference, they would know the top task is to select which sessions to attend in which tracks.
I need an overview of the whole schedule with detail of all the talks. If I tap on a speaker, I want to see what sessions they are in, and when and where.
There is a grid overview offered, but most of the time it did not download. (I subsequently learned that was due to demand throttling.) And when it did, it was view-only in the reader; it could not be saved on my device so that I would not have to download it again. The smart solution is to print the schedule yourself before coming.
Or, as some do, take a picture of the session displays on the screens outside the rooms. And do it while it’s on show – part of the time the display wipes away and comes back, for no discernible reason except to frustrate the reader.
A simple backup for the organisers would have been to give everyone an A4 sheet with the parallel sessions listed, to enable planning.
On the second day I took the time to scroll the schedule early in the morning and write down my choices on paper.

