Excel as a Database: What Else Can I Do With It?

When you are tasked with building bots which will process really large spreadsheets, what would you do?

Would you use the Excel Action Package and wait for the bot to loop through each record?

Or would you dump that task onto the junior developers and guilt trip them for not taking this as a “learning experience”, in case they rebel?

This has happened to me on several occasions and yes, it was quite the learning experience.

I learnt to never trust my seniors.

No but seriously, how would you go about developing this bot to process copious quantities of data?

You interact with Excel like it were a Database!

If that didn’t ring any bells, then you need to go through the introduction before reading any of this.

I’ve used them in few projects, and would have been lost without them. Not only was I able to complete my project ahead of time, my manager even scolded me for it.

Apparently my work was not limited to whatever was specified in the Statement of Work, but also whatever the client decides to squeeze in after the SOW sign off.

I’m just as confused as you are.

I have covered basic operations like connecting to Excel and CSV, filtering data and referencing Columns by their names and/or positions in that article.

Today we will learn advanced operations like manipulating data from various sheets, or CSV files and combining them into a single consolidated unit ready for export.

So without further ado, lets join hands and welcome JOINS!

Get In Touch With Your Inner Join

Being selective about the columns you want from a given table is all fine and dandy, but how about being selective about columns from tables spread across different sheets and files?

How will you make the connection between various sheets or files?

Is there something we can use to “join” them together?

Why yes, yes there is.

Do keep in mind that the Database Operations performed on Excel use Access Engine Syntax, so do keep in mind that you ought to reference the right functions or else it won’t work.

SELECT LOWER([Title]) FROM [Movies$$] //Won’t work
SELECT LCASE([Title]) FROM [Movies$$] //Will work

Similarly, your Joins have to follow a slightly different syntax for it to work as expected.

In the excel file located at ExcelDatabaseOperations/Excel Files given above, you will find two sheets Movies and Budget.

Growing up, I used to love watching Tom and Jerry.
Scratch that, I STILL love watching it.

We are interested in retrieving the Lead Actor, Director from the Movies Sheet, and Gross Revenue from the Budget Sheet.

Let’s see how that’s done.

SELECT [Lead Actor], [Director], [Gross Revenue]

What comes next?

Where are we going to pull that data FROM?

SELECT [Lead Actor], [Director], [Gross Revenue]
FROM [Movies$$] AND [Budget$$]

If this is what you were thinking, then you are mistaken.

Not to worry, mistakes are natural when learning something new. This is what you should do instead.

SELECT t1.[Lead Actor], t1.[Director], t2.[Gross Revenue]
FROM [Movies$$] AS t1

The aliases are important since we will be referencing two tables, and the complier needs some sort of a reference to perform the desired operation…but where is our Budget?

That is achieved through JOIN, and more specifically an INNER JOIN.

SELECT t1.[Lead Actor], t1.[Director], t2.[Gross Revenue]
FROM [Movies$$]
INNER JOIN [Budget$$] AS t2

Joins are incomplete without a condition. They just are, don’t ask me why!

SELECT t1.[Lead Actor], t1.[Director], t2.[Gross Revenue]
FROM [Movies$$]
INNER JOIN [Budget$$] AS t2
ON t1.[Title] = t2.[Title]

And that is how you perform an INNER JOIN.

But do keep in mind, the INNER JOIN retrieves only those values which are present in both Tables, i.e., it retrieves an intersection of data which meet a given condition.

What if you wanted to keep values from one or both tables?

You’re Making Me Feel Left Out(er)

You can either specify the JOIN as LEFT/RIGHT OUTER JOIN.

LEFT/RIGHT captures all elements from either table, based on how you structure the query.

SELECT t1.[Lead Actor], t1.[Director], t2.[Gross Revenue]
FROM [Movies$$]
LEFT OUTER JOIN [Budget$$] AS t2 --Captures all records from 'Movies'
ON t1.[Title] = t2.[Title]
SELECT t1.[Lead Actor], t1.[Director], t2.[Gross Revenue]
FROM [Movies$$]
RIGHT OUTER JOIN [Budget$$] AS t2 --Captures all records from 'Budget'
ON t1.[Title] = t2.[Title]
SELECT t1.[Lead Actor], t1.[Director], t2.[Gross Revenue]
FROM [Movies$$]
RIGHT JOIN [Budget$$] AS t2 --It isn't necessary to specify the keyword 'OUTER'
ON t1.[Title] = t2.[Title]
WHERE INSTR(t1.[Lead Actor],'Tom') = 1 --You can chain additional conditions either using ON or WHERE
ORDER BY t1.[Title] --And order by whichever column you want

I tried to make some FULL OUTER JOINS, but couldn’t.

I spend way too much time experimenting with this and the errors I encountered drove me absolutely nuts. I thought it had something to do with the syntax, but that wasn’t it.

I wasted too much time on this.

Microsoft Access doesn’t support FULL OUTER JOINs, but that doesn’t mean you can’t achieve the result.

It gets a little tricky, and you can read more here.

What If I Have Multiple Excel Files?

The connection string only has room for a single file path, and I even tried modifying the CSV connection string to make it work with multiple excel files.

It didn’t.

I wasted too much time on this as well.

You have to merge them into a single spreadsheet using the Excel Advanced: Merge Action before performing JOINS.

Can We Create Sheets with Database Operations?(Spoiler Alert: Yes)

Yep, and I discovered this pearl while surfing across the Apeople forum.

You often come across some pretty interesting problems there, which is why I encourage you to visit it regularly and like all my comments.

I always offer detailed descriptions.

The syntax for this is pretty straightforward, and there is only one prerequisite.

Database operations work by establishing a connection with the data source before doing its magic. This means, you can create Sheets, but you can’t create the Spreadsheet itself.

The spreadsheet has to exist before we can add sheets into it.

To add sheets, you have to use a Database: Insert/Update/Delete Action.

CREATE TABLE [Let There Be Sheet] 
(FirstName VARCHAR, 
LastName VARCHAR, 
Department VARCHAR, 
Location VARCHAR); --Its better leaving the type as VARCHAR, i.e., make it generic

If you want to create a new spreadsheet and perform operations onto it, I won’t recommend using the File: Create Action to create the Excel Spreadsheet, instead use the Excel Advanced: Create Workbook.

The File: Create Action has a bug which will be addressed and resolved in next update.

Also, when you create a new Spreadsheet, it will always has a sheet named Sheet1, which is why I think it only makes sense to use this with pre-existing spreadsheets.

If you are still curious, you can simply try creating an Excel Spreadsheet using the File Action Package and try opening it. Its better if you get in the habit of playing around with different actions just to get a feel for it, as it helps you make better decisions during development and even prevent major production issues.

You wouldn’t want this to happen now would you?

Update Sheets with…JOINS?

Its better if you find other ways of achieving this because there is a good chance that this operation will timeout.

UPDATE [Movies$$] AS t1
INNER JOIN [Budget$$] AS t2
ON t1.[Title]=t2.[Title]
SET t1.[Title]= t2.[Profit] --The order is important

Of course, you can lengthen the SQL Timeout, but I would recommend using the Python Scripting Package instead, particularly the Pandas library.

Joining Multiple CSV Files

CSV are static files which means you can only query data from it and create joins.

Whether you have multiple sheets in a spreadsheet or multiple CSV files in a given Directory, you can chain as many JOINs as you want.

That will be illustrated here, as we will try to join three CSV files.

Now remember, the connection string for CSV files has to reference its Folder Path, and not the individual files. The table names are not the sheet names, but the file name along with its extension.

SELECT t2.[Lead Actor], t1.[Genre], t3.[Profit]
FROM [TGL.csv] AS t1
INNER JOIN [TAD.csv] AS t2
ON t1.[Title] = t2.[Title]

When we chain JOINs, the previous INNER JOIN has to be converted into a sub-query i.e., just sandwich the meat between two moon brackets before adding the last slice of INNER JOIN.

SELECT t2.[Lead Actor], t1.[Genre], t3.[Profit]
FROM ([TGL.csv] AS t1
INNER JOIN [TAD.csv] AS t2
ON t1.[Title] = t2.[Title])
INNER JOIN [DGB.csv] AS t3
ON t2.[Lead Actor] = t3.[Lead Actor]

If there are more files, then simply convert the entire block into a sub-query like so:

SELECT t2.[Lead Actor], t1.[Genre], t3.[Profit], tct.[Adsense Revenue]
FROM (([TGL.csv] AS t1
INNER JOIN [TAD.csv] AS t2
ON t1.[Title] = t2.[Title])
INNER JOIN [DGB.csv] AS t3
ON t2.[Lead Actor] = t3.[Lead Actor])
INNER JOIN [TCT.csv] As tct
ON t3.[Profit] = tct.[Profit]

And with that, we have covered most of the Database operations you will ever require when interacting with spreadsheets and CSV files.

I still haven’t found anything about CSV files, and whether we can actually manipulate data within it, but it doesn’t matter. We can always use a WHERE to filter data from it, write it to Excel, then update it.

All in all, there is nothing complicated about Database Operations. Exposure is the key, and the more familiar you become with it, the better equipped you are.

Now if you will excuse me, I have to get back to guilt tripping my juniors for not recognizing how generous I am for sharing all my work experience with them.

It isn’t very long before the abused becomes the abuser.

8 Comments

  1. A great article on Excel as DB.. Thanks for your efforts!

    Excel – Merge Action – Do you mean Append worksheet? I don’t see this action in .21 version.

    1. Hi Ishwarya,

      Appreciate the feedback, and yes I meant to say Append Worksheet.
      Let me know if there are any other topics you’d like me to cover.

      Kind Regards,
      Ashwin A.K

  2. Hi,

    I am trying to connect database but i am getting this error “UCAExc:::4.x.x user lacks privilege or object not found: TableName”
    Pls provide some suggestion. It’s urgent

    1. Hi Amerendra,

      Could you check if the table name is accurate?
      Also, ensure that you append two dollar signs to the end of the table name like so:
      SELECT * FROM [Sheet1$$]

      If that doesn’t solve the issue, then please create a post on apeople with details and screenshot and tag me.
      I can help you out there.

      Kind Regards,
      Ashwin A.K

  3. Hi Ashwin,

    Thanks so much for the article – just gotta say thanks for all tips you are putting out here and on the Apeople forum.

    I am having an issue with CSV joins where my query is returning the joined columns as separate columns instead of merging them together like when I perform regular database queries.

    For example:

    SELECT t1.Example
    FROM myTable.csv
    LEFT JOIN myOtherTable.csv ON Example t1 = Example t2

    Output (with the issue):

    t1.Example,12.Example
    val1,val1
    val2,val2

    As you can see in my example output above, the query is returning two identical columns instead of just the one merged column.

    Thanks for the article!

    1. I figured out my issue. I was using the “*” wildcard to select all columns instead of selecting only aliased columns.

  4. Hi Ashwin,

    Love the article and blog!

    Did you ever find a way to update CSV files with SQL as you did with the joining of multiple excel workbooks in the example above?

    I have some use cases working within A360 where I would much prefer to update a column with a value from a column in another CSV file/table than the other solutions I have come up with.

    Thanks,
    Ray

    1. Hi Ray,

      Appreciate the feedback – unfortunately CSV are static files and you can’t update them.
      You’ll have to either write the data to Excel and update it there or write it to another CSV file before deleting the original.

      Kind Regards,
      TCT

Leave a Comment

JOIN OUR NEWSLETTER
And get notified everytime we publish a new blog post.