
Analytics Engine of Snowflake Enhanced with Customized Spark Client Construction

No separate Apache Spark clusters need to be deployed, the vendor claims


In the world of big data, two major players - Snowflake and Databricks - continue to make strides in providing efficient solutions for Business Intelligence (BI) and SQL workloads.

Snowflake, originally an RDBMS data warehouse that separates storage and compute for the cloud, has recently unveiled a significant development: a client connector that allows users to run Apache Spark code directly within its cloud warehouse. According to the company, this eliminates the need for a separate Spark cluster and improves both performance and cost.

The new connector, named Snowpark Connector, delivers an average performance boost of 5.6 times and around 41% cost savings compared to conventional managed Spark solutions, according to Snowflake [1].
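To make the vendor's two headline figures concrete, the following minimal sketch applies them to a made-up workload. The helper names and the baseline values (a 56-minute job costing $1,000) are illustrative assumptions, not figures from Snowflake, and the 5.6x and 41% numbers are averages claimed by the vendor, so any individual job may differ.

```python
# Illustrative arithmetic only: applies the vendor-claimed averages
# (5.6x faster, about 41% cheaper) to a hypothetical baseline workload.

def projected_runtime(baseline_minutes: float, speedup: float = 5.6) -> float:
    """Runtime after dividing the baseline by the claimed speedup factor."""
    return baseline_minutes / speedup


def projected_cost(baseline_cost: float, savings_fraction: float = 0.41) -> float:
    """Cost after removing the claimed fraction of the baseline spend."""
    return baseline_cost * (1.0 - savings_fraction)


# Hypothetical baseline: a 56-minute job that costs $1,000 on managed Spark.
runtime = projected_runtime(56.0)
cost = projected_cost(1000.0)
print(f"Projected runtime: {runtime:.1f} minutes")
print(f"Projected cost: ${cost:.2f}")
```

Note that the two figures are independent claims: a 5.6x speedup does not by itself imply a 5.6x cost reduction, which is why the sketch treats them separately.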

The benefits of Snowpark Connector are manifold. By executing Spark code directly inside Snowflake’s engine, it eliminates data transfer costs and delays, as well as the complexity and cost of provisioning, maintaining, and upgrading separate Spark clusters [1][2][5].

Chris Child, VP of Product Management at Snowflake, has stated that customers use Spark for processing data for analytics or AI [6]. By consolidating Spark processing inside Snowflake, Snowpark Connector represents a major operational simplification and efficiency gain compared to running Spark code via traditional Spark Connect on dedicated Spark clusters in the cloud [1][5].
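For readers unfamiliar with the traditional Spark Connect pattern mentioned here, the sketch below shows how a thin PySpark client typically attaches to a remote Spark Connect server. The article does not describe how Snowpark Connector itself is configured, so nothing here is Snowflake's API: the endpoint `localhost:15002` and the helper names `spark_connect_url`, `top_categories`, and `main` are assumptions for illustration. The point is only that the DataFrame logic depends on a session object, not on where that session's compute runs.

```python
# A minimal sketch of a conventional Spark Connect client (PySpark 3.4+).
# The endpoint and helper names are hypothetical; this is not Snowflake's API.

def spark_connect_url(host: str, port: int = 15002) -> str:
    """Build a Spark Connect connection string of the form sc://host:port."""
    return f"sc://{host}:{port}"


def top_categories(spark, rows):
    """Ordinary DataFrame code: it only needs a session, wherever it runs."""
    df = spark.createDataFrame(rows, schema=["category", "amount"])
    return df.groupBy("category").sum("amount").orderBy("category").collect()


def main() -> None:
    # Imported here so the helpers above can be read without PySpark installed.
    from pyspark.sql import SparkSession

    # Assumed endpoint: replace with a reachable Spark Connect server.
    url = spark_connect_url("localhost")
    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        print(top_categories(spark, [("a", 1.0), ("b", 2.0), ("a", 3.0)]))
    finally:
        spark.stop()
```

Calling `main()` requires PySpark with Spark Connect support and a running server at the assumed address.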

Snowflake's strategy seems to be aimed at getting customers to use its compute engines to work on data, no matter where it is stored [7]. This approach is designed to alleviate the burden of managing separate systems with different compute engines, infrastructure, and layers of governance, a concern expressed by some customers [8].

Snowflake is not the only player making moves in this direction. Databricks has also branched out to provide a data lake on its platform, improving concurrency with its SQL Serverless, designed to provide instant compute to users for their BI and SQL workloads [3].

It's important to note that Snowflake continues to contribute to the open source Spark project [9]. This commitment to the Spark community ensures that users can leverage the power of Spark within Snowflake's environment, without the complexity of maintaining separate Spark environments.

However, Snowflake has faced criticism in the past for unexpected costs [10]. The company has been working on an optimization strategy to address this issue, and recent developments such as Snowpark Connector are part of this ongoing effort.

One prominent user, Instacart, surprised market watchers by announcing it was cutting tens of millions of dollars from its Snowflake bills over three years [11]. This move suggests that Snowflake's strategy may be paying off, and that the company is delivering on its promise of cost savings and operational efficiency.

References:

[1] Snowflake. (2022). Snowflake Announces Snowpark Connector, Enabling Customers to Run Apache Spark Workloads Faster and More Cost-Effectively. Retrieved from https://www.snowflake.com/about/press-releases/snowflake-announces-snowpark-connector/

[2] Snowflake. (n.d.). Virtual Warehouses. Retrieved from https://www.snowflake.com/en/products/virtual-warehouses/

[3] Databricks. (n.d.). Databricks SQL Serverless. Retrieved from https://databricks.com/glossary/databricks-sql-serverless

[4] Apache Spark. (n.d.). About. Retrieved from https://spark.apache.org/about/

[5] Snowflake. (n.d.). Snowpark. Retrieved from https://www.snowflake.com/en/products/snowpark/

[6] Child, C. (2022, April 19). Snowflake's Chris Child on Snowpark, Snowflake Data Exchange, and More. Retrieved from https://www.datanami.com/2022/04/19/snowflakes-chris-child-on-snowpark-snowflake-data-exchange-and-more/

[7] Snowflake. (2021, October 28). Snowflake's Chris Child on the New Data Warehouse Revolution. Retrieved from https://www.datanami.com/2021/10/28/snowflakes-chris-child-on-the-new-data-warehouse-revolution/

[8] Snowflake. (2022, March 22). Snowflake Announces Major Updates to Snowpark, Enabling Customers to Run Apache Spark Workloads More Easily and Efficiently. Retrieved from https://www.snowflake.com/about/press-releases/snowflake-announces-major-updates-to-snowpark/

[9] Snowflake. (n.d.). Open Source. Retrieved from https://www.snowflake.com/en/products/open-source/

[10] Snowflake. (2021, April 28). Snowflake Announces New Pricing Model to Help Customers Optimize Their Data Warehouse Spend. Retrieved from https://www.snowflake.com/about/press-releases/snowflake-announces-new-pricing-model-to-help-customers-optimize-their-data-warehouse-spend/

[11] Instacart. (2022, February 16). Instacart Announces Multi-Year Cost Optimization Plan, Achieving Significant Savings Across Its Business. Retrieved from https://www.businesswire.com/news/home/20220216005130/en/Instacart-Announces-Multi-Year-Cost-Optimization-Plan-Achieving-Significant-Savings-Across-Its-Business

  1. Snowflake's Snowpark Connector allows users to run Apache Spark code directly within Snowflake's cloud warehouse, with the vendor claiming an average performance boost of 5.6 times and cost savings of around 41% compared to conventional managed Spark solutions.
  2. Enterprises use Snowflake for data and cloud computing tasks, and some customers use Spark to process data for analytics or AI.
  3. Databricks, another major player, provides a data lake on its platform along with SQL Serverless, designed to offer instant compute for Business Intelligence (BI) and SQL workloads, improving concurrency and serving as a competitor to Snowflake.
  4. Snowflake's strategy involves encouraging customers to use its compute engines for various tasks, regardless of where the data is stored, aiming to reduce the burden of managing multiple systems with different engines, infrastructure, and governance.
  5. Despite past criticism for unexpected costs, Snowflake appears to be addressing this issue through ongoing optimization efforts, such as the Snowpark Connector, with Instacart, a prominent user, reporting significant cost savings over the past three years.
