Mike Jelen

August 30, 2022
2:00 pm

Snowflake Case Sensitivity – Does it Matter?

Snowflake provides immense flexibility of naming and querying objects; however, there are some ground rules around how to use upper and lower case characters with objects. While there are small nuances they are fundamental concepts within Snowflake.

Check out our Meetup video below to watch an overview of this event:

Other Meetups and Events to check out:

Transcript from the Meetup video:

Hello, everyone. Welcome to another Carolina Snowflake meetup group event. Today, we will be going into Snowflake case sensitivity. These are our Meetup group rules. We keep our previous meet ups on the AICG YouTube channel, which you can find in the discussion board, the link to it in our group.

Today, we’re going to have a brief introduction, Snowflake refresher, and then we’ll talk a little bit about case sensitivity and do a live demo. And then we’ll get into a little bit of information about our health checks and have an open discussion and bring up some of our upcoming events.

All right, so, Mike, I think it’s going to give our refresher on Snowflake. For those of you that are new to Snowflake, whether you’re evaluating Snowflake, your company recently subscribed to Snowflake.

Of course, everyone is always welcome, or if you’re just not quite sure you’re kicking the tires. And what is the Snowflake thing all about here? So at the end of the day, we say, Snowflake, think of the data cloud, that single repository to ingest your data, govern the data, you transform your data, you access it, and really, you’re sharing and scaling that data.

To many other types of users and groups within your organization could be outside your organization and really tailoring that end user experience, whether those are technical folks, non technical folks, and applying inappropriate user experience, that when I say appropriate user experience that has the right horsepower behind the scenes as to.

The amount of data, the type of data that you are analyzing. So everyone, regardless of where you are in the organization, you’re paying for the appropriate level of performance. And relative to Snowflake, Snowflake has a very flexible subscription based model where you’re only paying for what you use.

At the end of the day, it doesn’t matter where this data comes from. It could be a transactional on premise source database that’s running in a closet or even a SaaS-based application that your organization subscribes to.

Landing all of that data into Snowflake, transforming it, and then building the appropriate for some use cases, the appropriate models for a business intelligence and analytics tool. Other groups in your organization, such as data scientists, they’re going to want to look at data and it’s more raw or close to raw formats as they run various ML and AI against there.

And really, Snowflake is at the core of that for all of those different user groups and different types of data, whether that’s structured, unstructured, and applying that appropriate level of security and control.

Drew is going to COVID case sensitivity for us today. Hey, everybody. So when it comes to case sensitivity in Snowflake, a lot is going to depend on whether your Object Identifiers are double quoted or unquoted.

And when I talk about Object Identifiers, I’m referring to column names, schema names, table names, any sort of object within Snowflake. So in general, your unquoted object Identifiers are going to be resolved as uppercase characters, no matter how you specify them in your queries.

So if you can see the example on the slides right now, just select Case foo being your column name from table under lowercase table. Being your table name, that’s all. Snowflakes going to comprehend that as being uppercase foo and uppercase table.

So anything that you leave unquoted Snowflake is just going to assume that it’s uppercase. So if you were to another example, if you were to create a table where the table is all lower cases with no quotes around it, it’s going to resolve it to be to create a table with entirely uppercase characters.

However, if you decide that you want your objects to be case sensitive, so if you want kind of mixed case, you would want to quote your object identifiers. So in that previous example, we had create table where table is all lower case but no quotes around it, it’s going to resolve to create table with all uppercase letters.

But if you were to double quote your table name as such in the example, you’re going to end up with a table, a table name that is actually case sensitive. And the big kicker there is that in Snowflake, the only way to really access case sensitive objects is using your double quotes.

So if I were to create that table, that table with mixed case in quotes, I can only access it if I’m putting my double quotes around it in the future when I’m querying it. So some other big requirements for case sensitivity with regards to your quotes and unquoted objects.

If you are, if you’ve got unquoted object identifiers, you can only use. Letters, numbers, underscores and dollar signs. And your objects have to begin with letters or underscores. So you can’t create table names with your special characters or dollar signs or even numbers beginning with them if you’re going to leave them unquoted in your in your DDLs or in your table declarations.

When it comes to quoted Object Identifiers, you can include anything that unquoted object Identifiers can have. So letters, numbers, underscores, dollar signs, plus you can have special characters, white space or extended ASCII or nonASCII characters.

Now there is a way to kind of, I guess override case sensitivity within Snowflake and that’s using the Snowflake parameter. Quoted identifiers. Ignore case. And the idea being if you are to use that parameter and if you were to set that parameter to be true, whereas the default is false, ultimately what you’re doing is you’re saying that anything inside of double quotes is automatically going to be resolved to be uppercase.

So all of your Object Identifiers, everything is going to be uppercase and it’s almost going to make everything within Snowflake case insensitive. The big recommendation though is if you’re going to use this, it’s really recommended to do it early within your Snowflake implementation, kind of get into why throwing that parameter around in the middle of your implementation or late into your implementation might cause problems and the issues that doing that can provides.

So ultimately this can be set at the session or the object level. And the way that you want to be able to set this parameter, modify this parameter is just using your alter account or Alter session statements and then set the parameter equal to true or false.

And I mentioned it defaults to false. So by default cases Snowflake is going to be case sensitive. Now, one other thing just to kind of keep in mind with Snowflake and case sensitivity is that string comparisons are always going to be case sensitive.

So whether if you are trying to look up a value within a column, you’re going to need to take that case into account so you can use the like, operator, and that’s going to be case sensitive, where it’s if you are wanting to find the you know, in the example on the slide, if it’s like test, all of the characters within the column need to match test one to one.

So you need to have that uppercase t and then those lowercase letters to follow. Whereas if you were to use I like, that’s going to be in case insensitive. That’s going to throw case sensitivity out the window and it’s just going to look for any matches to test.

So I think yeah, this diagram just shows a little bit more about what Snowflake is going to change your change your object Identifiers with respect to quotes being around them and with respect to the quoted Identifiers parameter being set to true and false.

So in that first example, all lowercase column name is going to resolve to all lowercase column name when. When that parameter is set to be false. But when it’s set to be true, it’s going to resolve to be all uppercase.

So again, trying to throw case sensitivity out the window as much as possible. And I think right now I’m going to go ahead and share my screen and we can kind of walk through some examples that I put together that kind of demonstrate the kind of the ins and outs of case sensitivity as I’ve gone through them so far.

So we’re just going to start off by creating a table. And all I’m doing here when I’m creating the table is I’m creating the table as the top 100, top 100 records actually in one of DataLakeHouse.io, pre built DoorDash for Work models.

So if you were to connect your DoorDash for Work with DataLakeHouse.io, you would be privy to a lot of these prebuilt models that prevail analytical models that come along with it. So all I’m going to do right here is just create this table as the top 100 records of this mock data that we’ve got set up.

And if you’ll notice that all I’ve done is in this query, I’ve left everything unquoted. So by default, Snowflake should capitalize every single column in there. So even if I run this query right here, which kind of has a mixture of quoted and unquoted object identifiers or column identifiers, this column or this query should run just fine.

Because as I mentioned earlier, our unquoted identifiers right here are always going to present to be uppercase or Snowflakes going to process them as being uppercase, which it does. Since I’ve got the Company Paid column in quotes, I need to match the case that’s within the actual table declaration itself.

And since it’s resolved to be uppercase, I’ve got it within all uppercase. And we don’t have any issues with the query. So we can now see these five different columns all spread out there. Now, if I were to adjust the case within the column name for Company paid, and if I threw in a bunch of lower case letters but all within quotes, my query should fail.

More technical difficulties? Let me make sure I was messing around with the quoted Identifiers parameter, like I mentioned earlier. So now that’s set to false, this is the default. So if I run that again, now I get my error.

So Snowflake is expecting if it sees quotes right here, it’s expecting that my case needs to match that of what is in the table or what’s in the table declaration. So since it’s not all uppercase inside of the quotes, it’s looking for a different column.

And the column M Company Paid, with Company Paid being all lower case technically doesn’t exist. So that’s why our query fails. So I’m now going to try creating a table with mixed case column Identifiers.

And this actually is coming from one of the Nomad data sets, which is a data set. Noma has a ton of different data sets that are available in the data marketplace. Great place to do a lot of financial data analysis.

If you are in the market for data. But basically what I’m doing right here is I’m trying to create another table. In my test schema, I’m actually creating the table name with mixed case with mixed case characters and numbers in there.

And if you remember that if you were trying to create a table name that was unquoted with numbers at the beginning, you are going to run into an error because you would only begin a table name with underscores or characters if you were going to leave everything unquoted.

So that’s one quirk of the table that we’re going to be working with. Another is that the finance data atlas table that we’re going to be looking at right here has mixed case character column names. And by default, since I’m trying to query from this data set right here, I need to have my column names within quotes and they need to match the case of the table that I’m trying to pull from right here.

So this is the day that we’re trying to create a table from and let me just actually that’s not too much data, so I’m not going to worry about a top 100 or anything like that. So I’m just going to run my creator replace table and that works just fine.

And you can see right here that when I just try to query indicator from my new table, I don’t have any issues because I am using quotes to surround this mixed case column name and I’m also also using quotes to surround the table name, which also has kind of that mixed case piece as well.

When I try to query indicator from that same table without quotes though, you’ll see that Snowflake is actually trying to look for all uppercase indicator because I don’t have quotes around it. And that just goes back to kind of that very first point I talked about with Snowflakes going to resolve anything that’s not in quotes to the upper case.

So going on to this next query, we actually have the same issue as well. Even though our column name is in quotes, that all looks good. We don’t have quotes around our table name right here which needs quotes because not only is it mixed case but it is beginning with those numerals like I referred to earlier.

So if you wanted that to be able to run, it’s going to need to be formatted in this manner right here that I got highlighted on line 51. Now kind of going into the corded identifiers ignore case parameter which I accidentally had said to be true at the beginning of this session.

We’re going to try and set that to true right here and see what kind of effect that has. And at the beginning of kind of my discussion about the parameter I mentioned that you’re not going to want to, especially if you’re working in an organization that has kind of established use of Snowflake and has a lot of pipelines working within Snowflake.

You’re not going to want to set this to be true if you’ve been using it as false or vice versa. And for example, this query right here with. Which you know, I just showed to you, it’s the same query from line 51.

The indicator column is in quotes like it should be and the table name is in quotes like it should be. In order to be able to query mix case objects is actually not going to work. And as you’ll see, what Snowflake is really doing right here with this table name is it and it goes back to this parameter being set to true.

Is it’s trying to make nomad data be capitalized? Right here, as you can see in the compilation error, Nomadata is all capitalized. It’s looking for an all capital table name because we’ve got our parameters set to be true.

It’s basically trying to make everything case insensitive it’s trying to make everything capital letters so that there’s no mixed case whatsoever. Now, this query right here on line 61 actually will work.

And it was one that I previously showed you that hadn’t worked based on the fact that whatever was previously in quotes was all lower case with the exception of one letter, but needed to match the case of the column name that was in the table.

So this one here in the DoorDash orders table company Paid is declared as an all uppercase column. And what Snowflake is doing when it sees the quotes here now, based on our parameter being set to true, is it’s taking.

And company paid and it’s making it all upper case. So that’s why this query fails or this query succeeds right here, but our indicator query fails. So again, just kind of going back to that point about not messing with the implementation of the parameters, the implementation of the quoted Identifiers parameter within the middle of your Snowflake implementation, it’s not a good idea.

Because if you’ve got, for example, in our case, if you’ve got if you previously created 2019 Nomadata with our mixed case right here, that table, if you tried to previously create that table and use it with the quoted Identifiers parameters set to be false, it’s not going to be accessible anymore.

If your parameter then gets set to be true. So there’s literally no way that I can access this with the parameter set to be true because anything that is it’s going to resolve anything that is in quotes to be uppercase.

And in the same way, it does the same thing with unquoted data as well. So, you know, the same query or the same error that we ran into previously when we tried to run, we tried to run the previous query without quotes being quote surrounding our table name.

You’re just going to have a lot of trouble with those those mixed cases per se. If you’re messing with the quoted Identifiers parameter. Yeah. So one of the things I wanted to touch on was the string comparison piece.

I brushed over it briefly if I want to just kind of I’m actually going to set my parameter back to be false just so I don’t mess anything else up. If I just select star from our DoorDash orders table that we created earlier.

And I take a look at the pickup delivery column and I say that I want to find any delivery orders. And as such, if I run this query right here, I don’t get any results. Now, if you notice in our column, delivery was all lower case the way that it was being specified in the data.

So instead, if I want to do a case insensitive comparison, string comparison like was mentioned on the slides, I would do I like or use the I like operator and that would give me the results that I was looking for.

If I wanted to find delivery all lower case, I would need to match and I wanted to use the like operator, I would need to match my case one to one. So it’s all lower case within the data. I need to make sure that it’s all lowercase in my presentation, in my query.

So kind of like that. And yeah, that’s kind of the Gist of case sensitivity with Snowflakes in these examples on GitHub. And we hope that you’ll give us a star. And just let us know if you need any assistance.

You can reach out to us either in the comments of the video or you can reach out to us through the meet up group discussion board. And in case you don’t know, we are AICG. We are the developers of the Data lake house platform, and our Data Lake House IO comes with pre built source connectors.

We have pre built analytical models and pre built dashboards for some really popular sources, and we can actually tailor analytics to go across sources. So if you wanted to measure multiple different sources in a single dashboard, we can do that for you.

Right now we’re running free Snowflake health checks for anyone interested, and I will put that site in the chat for you guys in case anyone’s interested in going and booking a health check and learning more information about that.

You can see it in the site when you go to the link here. All right? So if anybody has any questions, go ahead and post them in the chat or the Q and A. I have a question for you, Drew, which we kind of talked about a little bit the other day.

But if I was new to my Snowflake implementation and I was trying to decide, hey, do I want to use this parameter and override, or do I not? What would you recommend? I’d say it just depends on your use case.

If you know that you’re going to want all of your column names and all of your table names, all of your schema names to be uppercase, and you’re not going to have to worry about lower case at all. I think you might want to go ahead and set that parameter, the quoted Identifiers parameter, to be true.

Personally, though, I think there’s a lot of use case to have that flexibility to go between case sensitivity and case insensitivity with using quotes. The Nomadata was a huge example. A lot of the data that you’re going to find in the data marketplace is going to have a lot of mixed case column names, mixed case table names that you’re really going to want to be able to reference within quotes.

And if you’ve got that I showed that in the presentation that if you’ve got the parameters set to be true, then you literally have no way to access those tables. So maybe there’s a use case to quickly turn the parameter on and off again or vice versa.

But if you want to have that flexibility with case sensitivity, I think it’s recommended just to keep it as the default of false, and I think it’s a default for a reason. Yeah, I think Snowflake probably knew what they were doing a little bit Tanya in the Q and A as when migrating data warehouses from SQL Server to Snowflake.

What’s the recommended value for this parameter? That’s a great question. I think you’d probably keep it as false as well, although I think the one thing you’d want to be asking yourself is, what do I want my column names to resolve to?

So if you’ve got mixed case within SQL Server and you’re going to be migrating your data with that mixed case in mind, you’re probably going to need to keep it. False and then be able to access those column names tables with those quotes.

However, if you want to migrate those mixed case tables columns to Snowflake and have them all be uppercase, then yeah, sure, it might be just fine to have that be set to set true, but again, it’s going to depend on your use case and what you’re hoping to be able to leverage within Snowflake in the future.

Thanks to all right, so we have some more great events coming up. They are not posted yet because we’re working on collaborating with a few other people on these to kind of mix it up and make things extra interesting.

So we’ll have some guest presenters coming on and pretty exciting stuff, so look out for that in your inbox and be sure to sign up for those as you see them. And if anybody’s going to dbt’s Coalesce, we will be there and we’d love to meet up for coffee.

Alright, thanks everyone, and we’ll get this uploaded onto YouTube soon as possible.

Guys, have a great night. Thanks, everybody.

More to explorer

Snowflake Loading Data with Special Characters

July 24, 2024 No Comments

Special characters in your column names can cause chaos for downstream users, tools and processes.

Building a Generative AI Competency (or the First Gen AI Project)

July 21, 2024 No Comments

When Building a Generative AI Competency one must identify the necessary infrastructure, architecture, platform, and other resources and partners that can help an AI initiative be successful. We have just like many data warehouse and digital transformation initiatives over the last 20 years fail because of poor leadership, or companies only going half in on the objective.

clock, time management, time-3222267.jpg

Snowflake Time Travel Not the First Time Traveller but Let’s Review

June 12, 2024 No Comments

IBM DB2 long before Snowflake had this concept as did a few other select databases using a Temporal Database concept. I think Snowflake was able to make it more popular and mainstream due to the association of Data Warehousing and analytics specifically