Querying Text/CSV files. Method #1: Querytable

You will know that simply opening a CSV file in Excel is commonly a poor way to do it. Excel has a habit of eagerly trying to interpret as a date anything that looks anything like a date. A previous blog post on this describes the frequency of such corrupted Excel data in genome research papers until they finally changed the names of the genes to avoid the problem.


The legacy Get Data from Text/CSV command in Excel creates a Querytable. This is pretty good for most purposes as long as you use the undocumented TextFilePlatform value of 65001. There is one glaring limitation: it cannot handle newlines in quoted strings. I get this frequently in Surveymonkey downloads where people have entered free text comments with multiple lines. So, the workaround is to pre-process the file to change any quoted linefeed (LF) or carriage return (CR) characters to spaces.

The module QueryTables.bas has the code to do that.

There is another way, using an ADO Query, which will be the subject of the next blog post. However, that has its own complications. For example, if the file is tab delimited you have to write a schema.ini file. The final way, in the last blog post, will be to use Power Query.

Advertisement

About Patrick O'Beirne, Excel/VBA developer

Patrick provides consultancy and training in spreadsheet automation, data analysis, testing and model review; and the Excel spreadsheet auditing addin XLtest
This entry was posted in Excel/VBA and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s