Empowering Data & ML Apps with Snowflake’s Machine Learning Features and Snowpark through Streamlit

August 30, 2023

Today, nearly every emerging technology revolves around data. Data-driven applications and Machine Learning (ML) models are becoming increasingly critical for businesses to gain insights and make informed decisions. With data arriving from various sources in real time, businesses must adopt data analytics to derive insights from the data they already hold. Businesses that make data-driven decisions, and that use data platforms delivered as a self-managed service, are more likely to get ahead in this competitive environment.

Snowflake, a cloud data platform, provides a robust foundation for storing and processing data at scale. It lets you store and access structured, semi-structured, and unstructured data in one location and gain seamless access to external data with similar scale and speed. To address the limitations of SQL-only querying and data processing, Snowflake introduced Snowpark, which extends the languages supported in Snowflake so that developers can code in a language of their choice and use Snowflake’s compute and storage for data manipulation and analysis.

With Snowflake serving as the single source of data, it becomes important for business users to quickly derive business insights from their data and to use Snowflake’s machine learning capabilities to build and deploy ML models. With its acquisition of Streamlit in 2022, Snowflake gained capabilities for building interactive apps and visualizations, enabling data scientists to share the insights generated by their ML models.

In this blog, we will delve deeper into these areas and explore how to leverage Snowflake’s AI-ML features to develop data and ML apps that can provide real-time business insights.

Snowflake’s Architecture

Snowflake’s unique architecture, a hybrid between shared-disk and shared-nothing architecture, offers a scalable and flexible cloud-based data platform that allows organizations to store and analyze massive amounts of data. The scalable compute, decoupled from the storage layer, supports multiple workloads. Moreover, it provides robust security features, data-sharing capabilities, and a SQL & Python-based interface for querying and manipulating data. Snowflake also protects data through features such as MFA and Network Policies, which restrict access to the Snowflake environment, along with Data Encryption and Data Masking.

Fig 1: Snowflake Cloud Data Platform Architecture

What is Snowpark & why should you use it?

Previously, SQL was the only language Snowflake supported for data querying and processing. Snowpark is an exciting addition that allows developers to write custom code in popular programming languages such as Java, Scala, and Python to query data residing in Snowflake.

How to use Snowpark: With Snowpark, developers can extend Snowflake’s capabilities by writing & executing User-Defined Functions (UDFs) & Stored Procedures (SPROCS) directly within the Snowflake environment. This enables advanced data transformations and complex analytics without data movement.
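As a minimal sketch of the UDF idea described above, the snippet below defines a plain Python function and a helper that registers it with Snowflake through the Snowpark `session.udf.register` call. The function name `spend_band`, the UDF name `SPEND_BAND`, and the thresholds are hypothetical and chosen only for illustration; an open Snowpark session is assumed.

```python
def spend_band(yearly_spend: float) -> str:
    """Bucket a yearly spend value into a coarse band (thresholds are illustrative)."""
    if yearly_spend is None:
        return "unknown"
    if yearly_spend >= 600.0:
        return "high"
    if yearly_spend >= 400.0:
        return "medium"
    return "low"


def register_spend_band(session):
    """Register spend_band as a UDF so it runs inside Snowflake, next to the data."""
    # Imported lazily so the pure-Python logic above can be tried without Snowpark installed.
    from snowflake.snowpark.types import FloatType, StringType

    return session.udf.register(
        spend_band,
        return_type=StringType(),
        input_types=[FloatType()],
        name="SPEND_BAND",  # hypothetical UDF name
        replace=True,
    )


# The function behaves the same locally as it would inside Snowflake.
print(spend_band(650.0))
```

Once registered, the UDF could be called from SQL or from a Snowpark DataFrame expression, so the rows never leave Snowflake.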

Snowpark for Python is an API that enables developers to query and process data using Python in their chosen Integrated Development Environment (IDE). It removes bottlenecks such as package dependency management when creating custom user-defined functions (UDFs). This is possible because Snowpark integrates with the Anaconda ecosystem, a popular Python distribution. The API allows developers to import and use packages such as NumPy, Pandas, Scikit-learn, TensorFlow, and other packages available in the Anaconda distribution.
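A short sketch of what querying with the Snowpark Python API might look like is given here. The environment variable names, the `CUSTOMERS` table, and its column names are assumptions made for illustration and do not come from the original example. The aggregation is expressed as a DataFrame and is only executed in Snowflake when results are requested.

```python
import os


def connection_parameters(env=None):
    """Collect Snowflake connection settings from SNOWFLAKE_* environment variables."""
    env = os.environ if env is None else env
    keys = ("account", "user", "password", "warehouse", "database", "schema")
    return {k: env[f"SNOWFLAKE_{k.upper()}"] for k in keys if f"SNOWFLAKE_{k.upper()}" in env}


def open_session(params):
    """Open a Snowpark session (requires the snowflake-snowpark-python package)."""
    from snowflake.snowpark import Session

    return Session.builder.configs(params).create()


def average_spend_by_membership(session, table_name="CUSTOMERS"):
    """Build a lazy DataFrame: average yearly spend per membership length."""
    from snowflake.snowpark.functions import avg, col

    return (
        session.table(table_name)  # hypothetical table name
        .group_by(col("LENGTH_OF_MEMBERSHIP"))  # hypothetical column names
        .agg(avg(col("YEARLY_SPEND")).alias("AVG_SPEND"))
    )
```

With valid credentials, `average_spend_by_membership(open_session(connection_parameters())).show()` would push the aggregation down to Snowflake's compute rather than pulling rows to the client.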

Fig 2: Snowpark API

By leveraging the Snowpark API, developers can build and train ML models on Snowflake’s large datasets. The Snowflake machine learning framework simplifies the automation and deployment of ML pipelines. Once the pipelines are deployed, business consumers need a way to derive insights from these models. It is tedious for a business user to call the scoring function in the Snowflake environment to consume the ML models. To address this challenge, Snowflake acquired Streamlit in 2022. This enables business users to interact with and understand the data, thereby improving their business decisions. We shall learn more about Streamlit in the following section.

Streamlit & Development of Data Apps

Streamlit is a Python app development framework that simplifies the process of building interactive data applications and apps for ML models, both in Snowflake (Private preview) and outside Snowflake. It provides an intuitive and declarative way to create custom web-based dashboards, allowing users to explore and visualize data seamlessly. With Streamlit, you can create interactive components (navigation bars, filters, etc.), display plots and charts (using plotting libraries like Matplotlib, Plotly, Altair, etc.), and incorporate Snowflake machine learning models for real-time predictions.

By combining Snowflake’s powerful data management capabilities with Snowpark’s extensibility and Streamlit’s interactive features, data scientists and ML engineers can build data-driven applications and ML models with ease. Since Streamlit is a Python-based framework, there is no need to upskill or learn other languages like HTML or CSS to build an interface for an ML model. There are several ways to host these data apps; the most used is the Streamlit Cloud, built on top of Google Cloud infrastructure. Other options include Heroku and cloud platforms such as AWS App Runner and Azure Registry.

Snowflake itself is planning to support hosting Streamlit apps within its environment, thereby eliminating the need to host them on other platforms. Snowflake users can then take advantage of the Data Marketplace to monetize the data apps they have built and share them with multiple users.

Fig 3: End-to-end Flow (Data Ingestion to Data Consumption via Streamlit)

Let us walk through an example use case that brings together the three components we have discussed so far: Snowflake, Snowpark, and Streamlit.

Problem Statement: A retailer wants to understand which variables have the most impact on customer spending using Snowflake customer data. For this example, the variables of interest are:

  • Average session length (Time spent interacting with online store support)
  • Time spent on the website
  • Time spent on the app
  • Length of membership (Number of years the customer has been a member of the retailer)

Solution: The ML engineer uses the Snowpark Python API to query the customer data in Snowflake and then performs data processing and transformation operations. Next, a linear regression model is trained on the prepared data and deployed in the Snowflake environment. Finally, with the model deployed, the developer can create a data app that the retailer can consume. With a few lines of code, an interactive UI is built that predicts customer spend, thanks to the scoring function, and displays the variables that impact customer engagement.
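To make the modeling step concrete, here is a dependency-free sketch of fitting a linear regression and ranking the four variables by coefficient size. It is not the code from the original solution: the data is synthetic, the coefficients used to generate it are made up, and the hand-rolled least-squares solver stands in for what would more likely be scikit-learn's `LinearRegression` running in a Snowpark stored procedure.

```python
import random

FEATURES = ["avg_session_length", "time_on_app", "time_on_website", "length_of_membership"]

# Made-up coefficients used only to generate synthetic data (intercept first).
TRUE_COEFFICIENTS = [100.0, 25.0, 38.0, 0.5, 61.0]


def predict(coefficients, row):
    """Scoring function: intercept plus the weighted sum of the features."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


def make_synthetic_customers(n=50, seed=0):
    """Generate noise-free toy rows; real data would come from a Snowflake table."""
    rng = random.Random(seed)
    rows = [[rng.uniform(30, 36), rng.uniform(8, 15), rng.uniform(34, 40), rng.uniform(0.5, 6)]
            for _ in range(n)]
    return rows, [predict(TRUE_COEFFICIENTS, row) for row in rows]


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(n):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def rank_features(coefficients):
    """Order feature names by absolute coefficient size, largest first."""
    pairs = zip(FEATURES, coefficients[1:])
    return [name for name, _ in sorted(pairs, key=lambda p: abs(p[1]), reverse=True)]


rows, targets = make_synthetic_customers()
coefficients = fit_linear_regression(rows, targets)
print(rank_features(coefficients))
```

Raw coefficients are only comparable when the variables share a scale, so a real analysis would likely standardize the features before ranking them.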

Fig 4: Snippets of the Data App Built

The retailer can then interact with the data app, which runs on Snowflake’s compute, to see how they can tailor their decisions to increase customer engagement and improve their business.
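A data app of this kind might be sketched as follows. This is an illustrative outline rather than the app shown in the figure: the slider ranges, units, and labels are assumptions, and `predict` is a placeholder for whatever scoring function wraps the deployed model. The Streamlit module is passed in as a parameter so that the layout logic can be exercised without a running Streamlit server.

```python
# label: (min, max, default) slider settings; ranges and units are illustrative assumptions
FEATURES = {
    "Average session length (minutes)": (0.0, 60.0, 30.0),
    "Time spent on the app (minutes)": (0.0, 30.0, 12.0),
    "Time spent on the website (minutes)": (0.0, 60.0, 37.0),
    "Length of membership (years)": (0.0, 10.0, 3.0),
}


def render_app(st, predict):
    """Draw the UI with the given Streamlit module and a scoring callable."""
    st.title("Customer spend predictor")
    # One slider per model input, in the order the scoring function expects.
    inputs = [st.slider(label, lo, hi, default)
              for label, (lo, hi, default) in FEATURES.items()]
    spend = predict(inputs)
    st.metric("Predicted yearly spend", f"${spend:,.2f}")
    return spend
```

In an `app.py` file, one would add `import streamlit as st`, call `render_app(st, predict)` with a real scoring function, and launch the app with `streamlit run app.py`. Each slider change reruns the script, so the prediction updates as the user explores.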

Hexaware’s Snowflake Partnership and Capability

Hexaware is currently a Snowflake Premier Partner and is progressing toward becoming an Elite Partner by 2023. Since inception, we have seen over 500% year-over-year growth and have 100+ Snowflake-certified resources, along with 30+ Snowflake Core & Advanced resources. With Amaze® for Snowflake as one of our primary accelerators, we have delivered multiple transformation stories for our customers.

We are one of the few partners with access to newly launched features in Snowflake in private preview mode, which we leverage to provide our customers with a more holistic view of their transformation journey. We serve a diverse portfolio of 30+ customers, including prestigious accounts such as a prominent health and insurance provider, a leading imaging and electronics company, a prominent mining company based in the US, and numerous others.

To summarize, Snowpark & Streamlit in Snowflake make it easy to develop and deploy data apps, so data engineers and business roles can leverage all the data at their fingertips and generate insights to transform how businesses function and respond to ever-changing market trends.

If you wish to learn more about how Hexaware can help accelerate your data journey on the Snowflake platform, reach out to us at marketing@hexaware.com.

About the Author

Navedya Ojha

Navedya is a data engineer experienced in building and implementing solutions around the Snowflake data platform. Along with researching the latest capabilities around Snowflake, his responsibilities include engaging with customers to understand their problems/needs and addressing them by developing POCs and presenting them to clients.
