Mike Jelen

December 21, 2021
10:00 am

What are Java & Python UDFs ?

Released as a new feature in Snowflake around June, 2021, the idea that you can bring your own Java classes into Snowflake to run functions with rapid speed based on the Java language is amazing UDF and UDTF are powerful but User-Defined External Functions are even more powerful to call out to GCP, AWS, et.

This opens up almost limitless possibilities when interacting with your data and leveraging existing code. Think translations, lookups, out of the box computations from well-known Java libraries, oh my.

You can read more about external functions in Snowflake here.

Check out our Meetup video below to watch an overview of this event:

Other Meetups and Events to check out:

Transcript from the Meetup event:
Alright guys, so as you can tell in these Meetups, we like to have fun and we like to hang out and talk things, all things Snowflake. And so this one tonight we put together, we actually had quite a few people from the last several Meetup’s talk about UDFs and some of the new features that kind of came out for them.

And so we’re just going to kind of, like, really touch the service and we could dive into this probably a little bit deeper. But if we have time at the end, maybe we can. And then, of course, we can always set up another one to really get into UDFs and how Job and Python really can be helpful to integrate and act as good clients for Snowflakes.

So if you guys have been here before, we just kind of go over some super quick fun rules. Everybody’s respectful, everyone’s trying to have some fun. If you have a question or want to chat, typically post it to everyone so everyone can see it.

You don’t have to turn your video on. We recommend next time you come, you invite a friend or recommend us and then of course to us on Twitter and on social media. We’re over at DataLakeHouse, but for tonight we’re just going to go over a few things.

And so we’re just going to hit a lot of the high level topics or high level coverage on UDF user defined functions, UDFs talk about some of the use cases and how some customers are actually using it and why we like them.

And then originally we talked about this to be focused specifically on Java, but obviously not a resident expert. So talk about regular UDLs, but a decent amount is going to focus on Java and we’ll walk through some stuff.

I think we’re going to hit the Jupyter notebook tonight, but we’ll see what happens and then we’ll do some walk through some real step here of code and stuff like that. So if we don’t get to the python visualizations, don’t hate on us too much, we’ll definitely bring it back in a follow up, follow up meetup session.

All of our Meetups, they get recorded, they get put out there for everybody to see. Again, hopefully you guys enjoy it, but definitely subscribe to our YouTube channel so we can keep getting feedback on these events and how we’re doing and so forth.

So well, just real quick, again, I think everyone here is probably a little bit familiar with Snowflake, but your mileage may vary on your experience. So what is it again? Super high level? It is the data cloud, right?

Data warehousing in the cloud. So it’s a perfect tool for ingesting data continuous integration. Definitely some key ways to govern the data that’s flowing into the system. If you guys haven’t checked out tagging, that’s really cool.

I like to throw that out these days just for people who aren’t aware. And then of course, it’s a great way to access data across all of your customers, all your internal systems for any apps you’re building.

And then it’s got near infinite scale. So pretty much all of your workloads that you have, you’ve been doing with data integration and so forth can be put on a Snowflake. So it’s really awesome platform.

And at the end of the day, it’s really out there solving the problem of getting data out to consumers in many different ways, but then also in doing what it can to ingest the data, transform the data normalize.

The data provide all sorts of analytical functions and capabilities, and getting that into one single platform so that your data consumers can do their most effective jobs with that data, turn it into information and such.

So, super fast speed around the what it is and what problems are solved. But we’re here to talk about UDFs inside of Snowflake, so let’s get to that. And Mike, am I turning this one over to you? I can’t remember, but we’ll start with the first bullet point anyway.

Yeah, absolutely. I can have an idea jump in where we’re needed. But just in general, I just want to set the expectation of, well, what is UDF? Why is it important and why is it interesting? Or why is it even a big deal in the context of running these from within Snowflake?

And you think of. UDF and we also talk about UDTF as well. It’s this thing, a user defined function. Think of these as reusable code. It’s more than just, oh, I want to do some processing within Snowflake, I have some SQL.

It’s far and above that from the context of be able to do a lot more than just SQL, for example. But you’re really building out and then user defined functions as user defined table functions similar to the UDF, but really you’re looking at returning a set of values or think of more of a temporary table and then you’re also calling external functions as well.

So calling out to whether it’s GCP, Azure, AWS, and really, you already have code that’s written, or somebody in your organization may have already written code, and instead of going through a very long, tedious process that may not even be possible to try to recompile that and get it to run within Snowflake, you’re just calling out to these external functions.

So really unleashing the power of what you’ve already created within other applications, but all running all of that from within the context of Snowflakes. And I really like the user defined external function capability because you can think of it like exactly what you’re saying, mike right.

You could call it to an existing AWS lambda function. We’ve done some cool stuff there where we’re doing like natural language processing by just passing it to lambda function just written in Python, whose lambda function is and then calling some sort of NLP library doing some cool stuff like that.

So that’s a small use case, but it’s a pretty interesting one of just like what you can do inside of through Snowflake itself. But you’re doing it inside a Snowflake, right? Absolutely. There’s definitely some other use cases.

We have over to the next slide here when we talk about. As you mentioned, some of the natural language processing is also training your models. Think of from a data science perspective here the ML components where perhaps you’re scoring data or you’re looking for anomalies within the data itself.

Here, all of that is done upon load within Snowflake, and you’re really leveraging all the power of things that have since been created outside of Snowflake. Of course, you may have written some things within Java and Spark like, oh, wow, okay.

If you’ve come from the Hadoop world, you will have a lot of this sitting within your organization. And as much as Hadoop has gone down a sunset route, there’s still a lot of valuable code that’s been written, and you’re able to continue to leverage that within calling these Java UDFs.

And just like Kristen mentioned, the calling out to lambda so we can jump to the next slide. And then there’s even more advanced things. Think of all right, you’re doing geospatial. You’re trying to put shapes onto a map, calculate distance between two points.

All right, it’s interesting, obviously this even think from a casino floor perspective. You’re trying to figure out, all right, what’s the layout of that casino floor, where’s really do I want a certain dollar performing games and machines.

And to calculate all that using data on the surface sounds pretty crazy, but it’s a very legitimate business case. It’s much like a lot of organizations get no shortage of Excel and PDF files, and trying to parse out a lot of that data can be very laborious.

And a lot of code that’s been written, especially if you’re a larger enterprise organization, you already have a lot of these parsers. You can just leverage that parse within Snowflake via the UDFs. Then I’ll cover this one a little bit, storing job libraries and stages save to take from those libraries and plug it on in.

Talk about that with me. Okay, let’s see here. So we’ve got a couple of items we wanted to COVID So we’re gonna create a basic UDF in SQL. I think that’s actually true, but I can, I can run that one too on the screen up.

And then we’re going to create a Java UDF. And I think we’re going to get into notebook a little bit, but we might just spend enough time on the UDFs themselves. And I might not get into the jupyter aspect of it, but we’re just kind of basically leveraging some EDFs to plot out some stuff.

So let’s see which one is next year and then we’ll just walk over into that because I think mine is after this one. You can go into the next slide. Okay. So when you think of a sequel UDF, this might be the easiest type of UDF to start with.

From an example perspective, where really you’re on the fly processing, potentially processing data. You’re trying to potentially even look at a table. So think of a security table. While you’re not going to necessarily let folks within the organization be able to arbitrarily query a security table, however, there might be some attributes of that table, number of records that may be of interest to an audience.

You want to expose some of that functionality from a read only perspective, they’re able to set up SQL UDFs to do that. You also can on the fly to aggregation of certain data elements and their values and put that into a table so that that UDTF perspective.

So go to the next slide. Yeah. So, kind of like Mike mentioned, one of the biggest use cases, I think out there for UDFs and Udtfs is security and the ability to maybe you have a read only table that you want to have some users able to access, but you don’t want them necessarily to have everything.

So you do have the ability to maybe allow them to send in an input to a UDF and it returns a result or something like that. So security is one of these nice use cases for UDFs Udtfs. But also just I think one of the biggest things with UDFs Udtfs is the ability to reuse and recycle code.

So if you are writing a nice long store procedure that you know you’re going to have to write a certain piece of code over and over and over again, it’s really nice because you are able to use a UDF UDTF to consolidate all that code and simply be able to send in an input.

Kind of like you would like a Python function or a C or C function. And the nice thing also with UDFs Udtfs is you can be able to use them within SQL statements. So if I’m running a select statement, you can say select and then the function name and it will return your result like that.

And we’ll be able to see it in a demo in just a moment. But the other nice thing about UDFs and Udtfs are the ability to potentially improve execution times depending on what sequel platform you’re the ability to have caching the kind of the platform remembering.

What the code is and what the results of the code would be. That improves execution times and allows you to be able to run queries a little bit faster. So, Christian, if you let me share my screen, we can go into this demo.

Alright. Who’s hosting? Is that heather? Well, I actually started it. Let me see. Here you go. You should be able to do it now. All right, thanks. Yeah. All right. Can everybody see my screen? Yes, we see it.

Okay, great. So I wanted to start with one use case, and I’m going to run a quick query. We’re going to be using this transaction history table, which is just kind of an arbitrary listing of transactions.

And the main column we’re going to look at is this actual cost field. So one record in the table is consistent with one transaction. And like, for row one, we’re looking at a cost of one $700.99. Well, let’s say that we want to be able to kind of bucketize these transactions.

We want to be able to say that 1700 falls within the realm of 1517 50. And we use that kind of 1500 to 1750 buckets as kind of maybe a KPI for a bi dashboard or something like that. This is this might be a nice way.

To allow a UDF to do some of the work for us. So we’ve got a little bit of code right here. I’m just going to highlight lines seven through 28. And basically what we’ve got here is a UDF. So if it is going to take in a field or take in a column called we’re going to call it Actual Cost within the UDF.

And what we’re trying to get back is a bar chart or basically this kind of bucket. So we’re trying to categorize all of the costs within the query. We’re going to try and bucketize them and give them a category.

So you can see right here, we’re just saying that when this input is between two values, then we want to put it in a certain bucket so we can run this and our function is created. And we’ll actually use this function now to be able to input this actual cost field that we saw in the transaction history table.

And we’ll try to bucketize our transactions based on that. So we run that and yes, so what we see is we’ve got kind of a bucketized transaction amount based on the input of Actual costs that we had from the transaction history table.

So that’s kind of one of the use cases. And this is just a UDF. So it means when we are sending in. Sending in one parameter. We are expecting one parameter to come out. So in this query right here, we call the cost bucket function the cost bucket UDF.

We send in one value and we get back one value value. So the results of this query are going to be the exact same and I filtered the top 100 results on it. But if we were to if we were to run it on the entire table, we would get the entire table back.

So we’re not going to get any filtering or aggregation or grouping done because we call this UDF. But I guess that leads us into a UDTF. So like Mike was saying earlier, a UDTF is user defined tabular table function.

So what we get back, and the difference between that and UDF is that we return a table instead of just one scalar value. Right here we are creating a function in the same syntax that we did on the last function right here.

Create a replace function. We’re doing it the exact same way. Here. We are to call this one transaction total by product. And we are going to return a table this time instead of just a bar chart. And this is kind of where you do your table definition.

Right here. We see we only want to return a two columns, a product ID and a transaction count. Notice we aren’t sending in any parameters on this one. We are just simply selecting from this. Table from this table right here, this transaction history table that we were looking at on the previous example.

So we’ll run this and we’ve created a function. So and then the way that you query the results of the UDTF then are you use Snowflakes built in table function and you basically just select from and then the results of the function call transaction Totals by product right here.

And so we do that and you see we’ve got everything boiled down to the product ID level. So we do in this case do grouping and aggregation based on product ID. And we are trying to get the number of transactions based on that specific product ID.

So that’s kind of the gist of the UDTF. And then we’ve got another example right here where we do a little bit more joining and aggregation, but it’s in essence the same thing as what we had seen on the last example.

So we’re joining two tables, purchase Order Header and Purchase Order Detail, and we are grouping by vendor ID this time. And we are trying to get accounts of the purchases made by a specific vendor or each specific vendor.

So this is going to return a table. Again, we’re looking at just vendor ID and Total Order calling called Total Order by vendor. But we’re looking for the number of number of orders, number of purchases by.

Each vendor. So we do that. We created our function, and then in the same way, we queried the table using the last example of a UDTF. We can go here and we see that we have all of our vendor IDs and then the number orders made by each vendor.

And then one more thing I just wanted to point out was you can query the information schema query, which is very handy at some point in time, but in here, we’re able to see a few different functions that we created.

And you can see specifically the functions that we made right now. So we made cost bucket transaction totals by product. And we made one of these is the one we just made purchase totals by vendor. So, yeah, that’s about it.

From what I have on SQL UDFs, I think we’re going to go over to Java EDFs now. Pretty cool stuff. And that one use case you had there with the bands, that’s such a popular use case. Like in HR, like, we see it all the time.

Like, you know what pay grade band is an employee and that type of thing. So that’s super helpful for sure. That’s something that gets repeated a lot in HR analytics. Can you guys still see my desktop?

What am I showing? What is the java UDF slide? Perfect. All right. One thing I thought about while you’re doing that, Drew, we really never in this meet up step back to kind of like, what is a function in general, right?

It might cover it, but is more of the baseline that there are system functions in. All these databases and applications and definitely in Snowflake in general, right. So if you’ve ever called Coalesce or Trim or any other standard like a database or Snowflake function, those are system functions.

So user defined function as a user creating their own function that can be used like a system function. So I don’t know if we ever mentioned that, but it’s kind of like the elementary, often abstracted reference to it.

Anyway, hopefully that makes sense. But yeah. Great stuff, Jerry. Really good coverage, especially with the information schema. So a job at UDF just to kind of throw it out there. Very similar to one that you would do with plain SQL.

We like it because you get to use your Java skills if you have any people who either need to dust off those skills or use it or like they use and show off their Java skills. And it’s another way to compute or process the information that’s coming out of your SQL query.

Right. And there’s a couple of things that I wanted to call out specifically on the Java UDF, having written Java for more than a decade now. But what’s really interesting about the Java UDF and we’ll talk about it and we’ll see it just like we saw the sequel UDF that Drew was walking through.

One interesting fact is that there’s a memory backed Kemp drive where you could in essence use that to read and write files. Right. Kind of in the background. I always thought that was very interesting.

First, start learning about these now. It’s one of those things probably your mileage may vary. It’s probably not exactly like you’re running like a docker image behind the scenes. You can do whatever you want.

There’s probably limitations. You have to test that out. But for basic stuff, it actually works. There’s a key term in programming called identipotency. Some people say say differently. I say identify.

Or Idipotent. If you’ve never heard of this word, go look it up. It’s very effective word for a lot of scenarios that happen in development and I think sequels and some of the similar use case for Idipotency.

So take a look at that. But when you’re using Java UDFs, it’s something to think about. What we’ve seen in practice, making sure that everything is just kind of like a singular or single reference to anything that you’re creating there.

We won’t go through it in great detail, but there are some situations where if you’re running multiple Java UDFs at the same time, or kind of like running things in parallel, especially if you’re writing files and reading files, you have to make sure file names are very unique, for example.

So that’s one of the core tenets of identity. Also, it’s really working in a restricted kind of engine. I think the documentation refers to the UDFs that you’re basically compiling your Java in or working in a restricted engine.

The only thing I could think of, if anybody’s a really technical, like a Linux operating system background, is kind of like it’s a church jail type of situation. But I don’t actually have an answer on that.

I have to talk to one of the Snowflake engineers at some point on that. But it’s just an interesting thought for Java developers out there to think about. And there’s some other restrictions that are out there.

Like JNI is not supported and in theory you’re not going to use it for making any network calls. So that’s where maybe like an external UDF might come into play. And there are a few more things. I think the documentation is okay on UDF.

So I think it really depends on what you actually are trying to do with the Java UDF. So I’m going to show like a super basic example and I’m going to try to show kind of more like a syntactical example of the Java UDF and try to take a look at things like syntax compiling.

Data type considerations and just kind of like what we’re actually returning, which some people might find interesting, but we’ll at least take a quick look at it and then we’ll run an example. So very similar to what Drew was doing.

I’m going to go ahead and kind of blow this up a little bit. It hopefully helps some folks. So I think I’m using this current environment here is context and so what I’m going to do is take a look at a similar data set to what Drew is looking at.

And so this is basically like purchase order detail if anybody’s interesting, I guess. Another thing to bring up, we put all of our code out on the GitHub repository. So anything that we’re doing here will be out there in a couple of days as well as a link to the recorded video just in case anybody’s interested in that.

So here we have just a basic result set, something like you would see in a third normal form sales database for a company. And so we’re looking to purchase more detail. We’ve got basic columns here, won’t go into great detail.

And so what I want to do is just create a Java UDF that’s going to do something with that data. Again, just like Drew showed for the SQL UDFs, we’re basically going to integrate it within a SQL query.

We’re going to integrate the userdefined function call inside of a sequel statement to do some sort of repetitive operation or some process. And in this case I just picked a very straightforward one.

It might not be straightforward or entertaining, but it’s, I think it covers a point of a very basic Java UDF. So I’m actually fully qualifying here. So we’ve got create replace function going down. So we’ve got database schema and then function name.

And you might know or notice that just like some other things like schemas, I think some other objects within Snowflake, a lot of them are schema based, so they’re not the database level. They’re at the schema level.

So our function here is listed at the schema level. And of course if you’re interested, you can always try to create the function at the database level and see what happens. So we’ve got the one here called Git product cost.

The logic I’m actually doing is different from cost. I’m trying to aim to do taxation. So I’m accepting three parameters here. So we’ve got X, Y and Z. So we went ahead with different data types and I’ll go over why we did that here in a moment.

So we’ve got two integers and a float. And so something to notice here is float is going to be our Snowflake data type for any sort of double precision. So if you’re looking at any sort of like dollar amount, something like that, where you’ve got a scale and a precision.

So you’ve got a dot, something. So $5.36, you’re typically going to use something like a float for that. And so we’re actually going to return a float because again, we’re returning Snowflake data types here.

And so then we’re also going to put language. And so our language is going to be Java and then we’re going to have a handler. So the handler’s kind of interesting because it’s really just referencing our class in our method or our Java function.

And then inside of our action brackets here, we’re going to go ahead and create, again, very basic. I think anyone who’s taking kind of one on one in computer science can follow the syntax here. So we’re just creating a basic class, giving the class a name, using our open bracket structure.

We’re creating a public static and then we’re going to return AAA type. So if you look at your Java documentation, a double is going to be that kind of floating integer concept that’s similar to a float in Snowflake data types.

And so we’re going to give our function a name. This is just to do something function. And then here we’re going to take those three parameters and. And we’re gonna pass these Snowflake parameters into our Java parameters, and we’re just gonna do some basic math.

So the goal here was basically to take two arguments, add them together, and then multiply them by some factor that’s coming in here as a float. So this could be anything from, like, any type of, like, draw driver percentage.

I want to see what happens if sales increase by 7% over the next year or something like that. Or if you just want to say, hey, I want to see what the tax will be on this particular sales transaction that happened.

So there’s many different things you could do here. But that’s the use case. We’re going for the latter one with the taxation. So we simply run our creator replace function here. Everything should compile correctly.

And if you think about that, so this is one of my bolts. This is actually compiling and back in. If it’s not compiling, I really want to know how Snowflake work that magic and parse that just for giggles.

I’m going to run it without the schema, just in case anybody ever test things like this. Right? So we’ve got schema does not exist. So it kind of gives you an indication of where your functions actually live.

Okay, so we’ve created that function. Let’s create it again just in case we’ll replace it. And so now, just like what Drew is doing, we are going to take that data set I ran before we create the function, and I’m going to go ahead and just kind of pull out the ones I want to take a look at, and then I’m going to run this function.

Now, I’m fully qualifying it here, right, drew, you didn’t fully qualify yours, right? You just have the function name. No, I just use the inside of the table function. Yeah, because you were already in that context in your query.

But it works both ways, I think is what we’re highlighting. And then all we’re doing here is simply passing in, I guess, investment or quantity, unit price. And then I am passing in the. This is really percentage, right?

Like just pure math, right? You’re taking a number and then you’ve got the percentage. In essence, that should be the addition of the percentage to whatever amount you’re looking to get the increased percentage by.

And then we call that as a field, as unit price tax. So it’s a very rudimentary example that running through. So you saw the result before. And so now if I run this and I’m also looking at time as well, because here, I did not limit my rows, so I knew there was about 100 rows here and I’m using an extra small warehouse.

So all these things are factors in what we’re doing here. And so when you think about using this for performance reliability or just processing impact, you can think about that. It’s the first time I ran it came back in two, 2 seconds for about 100 rows.

And then if I run it again a little bit faster, right? So it’s still pretty fast. Even though it’s a very basic mathematical function, there’s nothing crazy there. So if I were to do that same math in the standard SQL function, it shouldn’t return that much faster, right?

So when you think about performance, if anybody brings it up to you, it’s going to be negligible, probably immaterial, especially over like, average data set size. We’ve done user defined functions over very large data sets and there’s very little impact to processing speed of time.

So that is the usage of basic Java UDF. I mean, there’s so much more you can do there. We didn’t even get into calling like actual normally used Java libraries and things of that nature. But I’m going to step into this just kind of reference.

We don’t actually have a fullblown example here. It’s not one that I plan on running right now just for the sake of time, but also with the Java UDS, you can actually import a Jar file. So this one’s actually interesting.

We. We do plan on doing like another meetup just to kind of really go deep dive into this. But we don’t want to steal the thunder from some of the other UDFs that are out there. And so I just kind of talk about the way it works real quick.

So basically what you’re able to do is you can create a stage, just like a normal stage, like you would have for importing CSV or JSON files or anything like that. And you would bring in a Jar file. So you could create your Jar file in Spring Boot.

You can create it in Corkus. That’s our favorite job framework. If you guys on the Porcus check it out if your job is ashamed of and then or just plan on coaches if you want to. And you would actually compile your Jar.

You would basically copy into or bring your Jar into a stage. And then you can use that Jar by referencing the stage like you would any other stage for like CSVs and JSON files. Like I mentioned, you would simply use an import function, locate your Jar, and then locate the class path for which that Jar is created under, or sorry, the method in the class which is created.

And then you would just very much use your call to the function, very similar to what we’re doing here, to actually run that syntax. This is just an example to create a function called Show me the Money.

This could be a basic algorithm, it could be an NLP, it could be anything that Java is really running in that compiled library. And then you basically pass it in and it’ll pass that string, whatever you have into this method that you’re calling here.

And then it will return your information. Sorry we don’t have a great example for you running on that one, but another great use case for job UDS. Okay, so back to the slide where real quick here. So that’s kind of what the syntax looks like that shown a minute ago.

And See. Okay, Drew or Mike, what are we doing on Pipe? Do we have find a python to look at? Anything in the notebooks. I’ve got the notebook, and I can run through it real quick if you guys want. Sure.

Yeah. We’ve got a few minutes left before we drop. Anybody in the group have any questions while Drew is pulling that up? Anybody used UDFs, like big time or any fun use cases, we’ll talk about we love hearing stories from the real world or even to the prototype world.

Any prototypes. Anyone’s done with CDs are true. What you got. Can you share with me? Yeah, one moment. Okay. Can you guys see my screen? Yes. Alright? So I think that the big thing here is the ability to kind of be able to put this in a Python or Jupiter notebook and to be able to kind of just step through it.

I don’t think I’m going to run any code, at least step through kind of the pieces in the cells that we’ve got within the notebook. So if you’re familiar with Python, we start by importing a couple libraries.

The Snowflake connector and Pandas. If You’re Not familiar with Python, basically what we’re doing here is we’re grabbing having Two libraries. One that’s going to be able to allow us to connect to our Snowflake instance, to be able to grab data, and then another called Pandas, which is really nice because it allows us to manipulate, modify.

Aggregate data frames within Python. And that’s going to be helpful and you’ll see that in just a moment. But we start by enabling our Snowflake connection and we kind of masked out the details there.

But assume we’ve created a connection there and we are then trying to be able to create a cursor object, which is going to allow us to run queries against the Snowflake instance, which is what we’re doing in this cell right here.

We are executing a couple of statements and you saw us doing that earlier, where we are trying to use the account admin role. We’re trying to get the database DLH demo Sandbox, and we’re trying to use the Ana Pod warehouse.

You can tell who illustrated or orchestrated this Python notebook. But yeah, from there, since we’ve got our connection created, we’ve got our Snowflake credentials and we’ve got everything kind of set up with Snowflake.

We’re then able to go ahead and start creating our UDFs. And this function is UDTF. This one’s pretty familiar because this is the second one that I ran, which allowed us to get counts of transactions by a specific product.

So grouping by product ID, we want to get the count of however many transactions for each product ID in that transaction history table. So we go ahead and we run this. It creates within Snowflake and then we are then able to.

Execute query right here. So it’s the same query that I ran that we just did, a select star on the UDTF so it returns a table. We are selecting all of those records in the UDTF and pretty straightforward.

But then what we’re doing in this next step is we are essentially using Pandas to grab the results of that query, the query execution that we just did. So we executed this on the Snowflake end and I’m not 100% sure how this works on the back end, but through our Snowflake connector we have access to this data that is from the UDTF and we are asking the we’re asking the Snowflake connector to fetch it and put it in a panic data frame right here.

So that’s what we do. And when we call the shape function, which is I believe it’s Pandas as well, we see that we’ve got 441 rows and two columns and we call the columns function, we see that we’ve got a we’ve got product ID and transaction count as our columns which corresponds to what we saw from the UDTF return results set.

So everything looks good there. And now right here with this code, we are asking the we’re asking to send a plot basically so we can then see the. The product ID and the number of transactions on the y axis.

Get product ID on the X axis. Number of transactions on the Y axis. And there we got kind of a basic plot of what we had with our result set of UDT. And then this next, I guess, section of code is we’ve got the second UDTF that I showed in my previous demo, and we’re doing the exact same thing.

So we’re asking Snowflake to execute this, and then we’re trying to grab the results. Set of the purchase totals by vendor UDTF. And then we put it in the Pandas data frame. We get its shape. It’s got 86 records and two columns.

The columns being vendor ID. And total order vendor. And then we do the same thing again. We put it and plot it on a bar graph and we’ve got a results there. It’s about it. Very cool, right? Yeah. Maybe this group is very similar, but everybody is very interested in Python.

I think it’s kind of a ubiquitous language these days. And the more we talk about Python. Those are typically the groups where we get more turnout at our meetups here. So that’s kind of an interesting correlation, I think.

I don’t know if anybody has any questions, but just kind of wrap with UDF for this meet up or holiday meet up. I think we talked about the benefits, java Python reusing skills on the internal teams, and then of course, the integration is just awesome.

We love Snowflake because that integration appears to be so, so seamless and again, the leveraging of the skills and be able to reuse code, that’s awesome too. So we talked about Java UDFs. We went over SQL UDFs.

We had some other Python integration again, like with Russia Pandas and some plotting and just pure execution of getting the data from Snowflake and calling this functions against Snowflake via Python.

So, very cool stuff. We’ll just keep wrapping up, and if I have any questions, jump in. So, yes, so we’re starting to create kind of a nice little code base over here. This is the link to our AICG meetups and webinars.

Check it out, give us a star, let us know if there’s any issues. We love people who check us out and want to contribute. And if anybody ever wants to contribute on the meetup, definitely let us know.

We’re happy to have folks. We’re trying to get a few Snowflake engineers to our meet ups as well. So if anybody’s got some good topics or some advanced topics, and they like to get the good word from anyone from Snowflake engineering team, just let us know.

We can try to get somebody to join us maybe in the new year. So I’ll just open it up, see if anybody’s got any Q and A or any questions. And I guess we’ve got a few more minutes here, or if not, we can go ahead and wrap up while people are thinking about their thoughts.

Heather, what do we have coming up in the next couple of months by way of events? If you don’t have it on the tip of your tongue, that’s okay. Yeah, I do. I have it pulled up somewhere over here. Let me see.

So, yeah, we have financial forecasting using Python and Snowflake coming up on January 17 and then we’re mixing it up a little bit and doing a second event in January. And we’ll be talking about DataLakeHouse on Snowflake, and that’s going to be on a Wednesday at noon instead of our regular 06:00 PM ET

Meeting on a Monday. So we’re going to kind of test things out and see what works best for everybody in our group. And if anybody has any feedback on timing and days that work better for them, please give us some feedback.

We would love to hear it. Well, I saw a few things go to the chat, but I think if there’s not too many other questions, we can go and wrap up and just say thank you all for joining out. Great job for everyone who is presenting everything awesome.

Looks good. And feel free to reach out to us guys on social media. If there’s any questions, anything we can discuss with you, we’ll go from there. We’ll see you on the next one. So thanks, everybody.

Have a great night and happy holidays. Of course. Happy holidays, everybody. Thank you. Bye bye. Thanks, everyone.

More to explorer

Snowflake Loading Data with Special Characters

July 24, 2024 No Comments

Special characters in your column names can cause chaos for downstream users, tools and processes.

Building a Generative AI Competency (or the First Gen AI Project)

July 21, 2024 No Comments

When Building a Generative AI Competency one must identify the necessary infrastructure, architecture, platform, and other resources and partners that can help an AI initiative be successful. We have just like many data warehouse and digital transformation initiatives over the last 20 years fail because of poor leadership, or companies only going half in on the objective.

clock, time management, time-3222267.jpg

Snowflake Time Travel Not the First Time Traveller but Let’s Review

June 12, 2024 No Comments

IBM DB2 long before Snowflake had this concept as did a few other select databases using a Temporal Database concept. I think Snowflake was able to make it more popular and mainstream due to the association of Data Warehousing and analytics specifically