The Data Signal
The Data Signal Digest — Stack glue goes native, engines race for throughput, robots roll into July

This week every cloud giant seemed to whisper the same promise: “Less plumbing, more doing.”

Hey! The summer solstice delivered a sprint of “finally!” features: Snowflake wrapped dbt right into Snowsight, Redshift found a 40% ingestion turbo-button, Databricks opened its lineage doors to the outside world, and Google made Bigtable play nice with BigQuery at full speed. Let’s dive in.


Big Signal

dbt turns first-class in the Data Cloud — Snowflake launches “dbt Projects on Snowflake” (Preview)
Snowflake users can now create, edit, test, and run dbt Core projects straight inside Snowsight Workspaces, and even deploy them as schema-level DBT PROJECT objects callable from SQL or CI pipelines (plus Snowflake CLI support). The move erases the “separate repo + runner” tax, giving data engineers one governed place to version both models and infra. With semantic views and budgets already in preview, Snowflake is positioning itself as the one-stop transformation and governance plane. Expect consumption-based pricing to follow when the feature graduates. “If your models live where your data lives, latency isn’t just lower—it’s gone,” remarked an early design partner. docs.snowflake.com

Ingest faster, queue less — Amazon Redshift patch 187 enables true concurrent inserts
Patch 187 finally lets multiple INSERT and COPY statements scan in parallel while their writes are staged, cutting end-to-end runtimes by up to 40% in AWS benchmarks and slashing the need for staging tables. Snapshot isolation still protects consistency, but write-heavy dashboards now refresh minutes sooner, and multi-warehouse data-sharing jobs see similar gains. Nothing to toggle: just upgrade the cluster (or let serverless roll forward) and watch throughput graphs pop. Analysts who batch micro-updates every few seconds get “near-real-time freshness without ETL gymnastics,” per the Big Data Blog post. aws.amazon.com
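A quick back-of-the-envelope check helps here, because a runtime cut and a throughput gain are not the same number. The sketch below is plain arithmetic, not anything from AWS; the function name and the 10-minute load are made up for illustration, and it assumes the full 40% best case applies.

```python
def throughput_gain(runtime_reduction: float) -> float:
    """Convert a fractional runtime reduction into a throughput multiplier.

    The same amount of work finishing in (1 - r) of the time means
    throughput rises by a factor of 1 / (1 - r).
    """
    if not 0.0 <= runtime_reduction < 1.0:
        raise ValueError("runtime_reduction must be in [0, 1)")
    return 1.0 / (1.0 - runtime_reduction)


# Hypothetical load that took 10 minutes before the patch.
old_minutes = 10.0
best_case_reduction = 0.40  # the "up to 40%" figure quoted above

new_minutes = old_minutes * (1.0 - best_case_reduction)
gain = throughput_gain(best_case_reduction)

print(f"runtime: {old_minutes:.1f} min -> {new_minutes:.1f} min")
print(f"throughput multiplier: {gain:.2f}x")
```

In other words, a 40% shorter load is roughly a 1.67x throughput bump in the best case, and "up to" means many workloads will land below that.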

Lineage without borders — Databricks rolls out “Bring Your Own Data Lineage”
Unity Catalog can now ingest lineage metadata from tools that sit outside Databricks (think Salesforce ingests or Tableau extracts) and merge it into one end-to-end graph. Teams pipe JSON events or call an API to stitch off-platform hops onto native captures, finally answering “where did this KPI really start?” in regulated audits. The preview lands alongside an open-sourced JDBC driver (Apache 2.0), hinting at a broader “open glass” strategy. Users integrating Fivetran and Power BI in the same workspace report a 90% drop in orphan steps inside the lineage UI. docs.databricks.com
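To make the "stitching" idea concrete, here is a minimal sketch of what merging external hops into a native lineage graph amounts to. It is not the Databricks API: the event shape (`source`/`target` keys), the table and dashboard names, and both helper functions are assumptions invented for illustration.

```python
from collections import defaultdict

# Hypothetical lineage edges. The {"source", "target"} shape is an assumption,
# not the Unity Catalog event schema.
native_edges = [
    {"source": "bronze.orders", "target": "gold.revenue_kpi"},
]
external_edges = [
    {"source": "salesforce.Opportunity", "target": "bronze.orders"},
    {"source": "gold.revenue_kpi", "target": "tableau.revenue_dashboard"},
]


def build_upstream_index(*edge_lists):
    """Merge any number of edge lists into a target -> {sources} mapping."""
    upstream = defaultdict(set)
    for edges in edge_lists:
        for edge in edges:
            upstream[edge["target"]].add(edge["source"])
    return upstream


def trace_origins(upstream, node):
    """Walk upstream from node and return every asset with no parents."""
    seen, stack, origins = set(), [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the graph
        seen.add(current)
        parents = upstream.get(current, set())
        if not parents:
            origins.add(current)
        stack.extend(parents)
    return origins


native_only = build_upstream_index(native_edges)
stitched = build_upstream_index(native_edges, external_edges)

print(trace_origins(native_only, "gold.revenue_kpi"))
print(trace_origins(stitched, "tableau.revenue_dashboard"))
```

With only native captures, the KPI appears to start at the bronze table; once the external hops are merged in, the same walk from the dashboard reaches the Salesforce object. That gap is the "orphan step" problem the feature targets.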

Bigtable hits the gas — “Data Boost” reaches GA for cross-service analytics
Google Cloud’s new Data Boost lets BigQuery fire heavy joins at Bigtable replicas that are isolated from production traffic, eliminating the age-old trade-off between OLTP uptime and OLAP curiosity. Under the hood, BigQuery spins transient compute pools right next to the chosen Bigtable zone, then discards them after query completion—so you pay only for the boost minutes, not a second warehouse. Early adopters in ad-tech see scan latencies fall from hours of export to minutes of SQL. cloud.google.com


Signal Pulse

  • Copy-paste tables: Snowflake’s new “clone dynamic table as table” command (GA) lets engineers snapshot a dynamic table into a static one—policies, tags, and clustering included—perfect for sandboxing last month’s state without rewiring jobs docs.snowflake.com

  • Driver goes public: Databricks open-sourced its JDBC driver under the Apache 2.0 license, welcoming community PRs and vendor forks. Teams compiling custom dialects no longer wait on closed binaries. docs.databricks.com

  • Triggers on arrival: File-arrival triggers are now GA in Lakeflow pipelines; S3/GCS event hooks fire up to 1,000 jobs per workspace and drop the 10,000-file cap, ending cron-polling hacks. docs.databricks.com

  • Zero-trust sharing: OIDC federation for open Delta Sharing hits GA, letting recipients bring their own IdP tokens instead of long-lived Databricks credentials—an instant win for cross-cloud security auditors docs.databricks.com

  • Compliance by default: MLflow on Databricks now meets HIPAA, PCI-DSS, and FedRAMP Moderate under the new compliance security profile, clearing the runway for regulated model serving. docs.databricks.com

  • Polished previews: Snowflake’s “Premium Views” debut, packaging masking, row-level filters, and cost tags into one declarative CREATE PREMIUM VIEW syntax—BI teams get fine-grained security without view sprawl docs.snowflake.com


Tools to Know

  • Gemini CLI — Google’s fresh, open-source terminal helper wires Gemini models straight to your local repo: ask it to refactor SQL, generate tests, or even run commands. An Apache 2.0 license and 60 free requests per minute sweeten adoption. techcrunch.com

  • DuckLake — A new DuckDB extension that turns SQL itself into a lakehouse table format: metadata lives in relational tables, data in Parquet, enabling multi-table transactions, time travel, and Iceberg import/export—minus the metastore overhead motherduck.com

  • OpenMetadata 1.7.5 — The June drop adds multi-owner support for dbt models, richer lineage collapse controls, and dozens of connector fixes—crucial for teams splitting stewardship across domains docs.open-metadata.org


Top Reads

  • Uber — “Slashing CI Costs at Uber”
    A deep dive into SubmitQueue optimizations that cut CI resource usage by 53% and wait times by 37%. Even if you’re not running a thousand microservices, the speculation-tree tricks are reusable in data pipelines. uber.com

  • Snowflake — “Arctic Long Sequence Training (ALST)”
    Snowflake AI Research details how it trains 15-million-token sequences on just four H100 nodes—469× the baseline—and open-sources the recipe. Useful reading for anyone pushing multimodal RAG limits snowflake.com



That’s the signal for the week. Spot something we missed? Hit reply or drop a tip at thedatasignal.com—your scoop might headline next Friday’s digest.