The Data Signal
The Data Signal Digest — Pipe dreams & mirrored realities

Google flips BigQuery’s default dialect, Microsoft bridges Fabric and Databricks, dbt brings Iceberg straight into BigQuery, and Snowflake puts model explanations right inside the warehouse

Intro

Happy Weekend! The stack kept humming this week with language-level shifts, governance glue-ups, and one giant mirror that finally lets Fabric see Databricks without squinting. Let’s dive in.

🎉 Azure Databricks Unity Catalog integration with Fabric 🎉

Big Signal

GoogleSQL takes the wheel — BigQuery makes GoogleSQL the default for API & CLI calls
Google announced that, starting 1 August 2025, every BigQuery job triggered from the CLI or API will run in GoogleSQL unless you explicitly ask for legacy SQL. The flip knits docs, UI, and tooling into a single dialect, so you can finally drop --use_legacy_sql=false from every script. Partners tell me the simplification let them yank 40% of dialect-handling code, and Google’s own support team says query-syntax tickets fell by a third during dogfood testing. If you still depend on legacy SQL, set the new default_sql_dialect_option project flag before the swap, or risk a Monday-morning parse-error storm. Google Cloud
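
For scripts that must behave the same before and after the switch, one defensive option is to state the dialect on every call rather than lean on whichever default is active. The sketch below is illustrative only: the helper name bq_query_argv is hypothetical, and it does nothing more than build the argument list for the bq CLI's existing --use_legacy_sql flag.

```python
import shlex


def bq_query_argv(sql: str, legacy: bool = False) -> list[str]:
    """Build argv for `bq query` with the SQL dialect stated explicitly.

    Hypothetical helper: it only assembles arguments and never calls BigQuery.
    """
    dialect_flag = f"--use_legacy_sql={'true' if legacy else 'false'}"
    return ["bq", "query", dialect_flag, sql]


# A GoogleSQL job and a job that still needs legacy SQL, both pinned explicitly.
modern = bq_query_argv("SELECT 1 AS x")
legacy = bq_query_argv("SELECT 1 AS x", legacy=True)

# shlex.join quotes the SQL so the line can be pasted into a shell.
print(shlex.join(modern))
print(shlex.join(legacy))
```

Passing the resulting list to subprocess.run keeps the SQL as a single argument, so no shell quoting is needed at execution time.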

Fabric mirrors Databricks catalogs—no more copy-paste manifests
Mirrored Azure Databricks Catalogs hit general availability in Microsoft Fabric this week: a five-minute sync that pulls Unity Catalog tables (schemas, ACLs, and tags) straight into OneLake. Power BI users can now query Databricks data without export gymnastics, while Databricks keeps lineage and governance in its own house. Private-preview customers report sub-90-second mirror lag on 10 TB workspaces and a 30% cut in duplicate storage. Microsoft says write-back and real-time mirroring are next, hinting that Fabric could soon feel like a first-class lakehouse interface on top of Databricks. Microsoft Learn

dbt hits Iceberg—native Apache Iceberg tables on BigQuery
dbt Labs’ 9 July adapter release lets you materialize Iceberg tables directly in BigQuery with a standard {{ ref() }} model. That means open-format tables, versioned snapshots, and hidden partitioning, all managed from dbt. One fintech pilot scrapped 22k lines of custom EMR glue and now rolls back “oops” commits in seconds. The team behind the feature says next on the roadmap is a dbt bundle diff that will preview Iceberg catalog changes before you hit deploy. Lakehouse skeptics just lost another excuse. dbt Labs

Snowflake lights up ML Explainability visuals
Snowflake pushed ML Explainability visualizations to general availability on 8 July, bringing Shapley-style charts and feature-influence plots straight into Snowsight. Analysts can now trace why a model predicted churn for one customer and not her neighbor, without leaving the warehouse or exporting to notebooks. Early adopter teams at a retail giant say the built-in visuals cut debugging time by 35% and made model reviews palatable for non-data scientists. Expect tighter Cortex AI overlaps soon; a product manager hinted at auto-generated remediation suggestions based on the same explainability metadata. docs.snowflake.com
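
For readers new to the term, a Shapley value splits the gap between one prediction and a baseline prediction across the input features. The sketch below computes exact Shapley values for a toy linear churn score by enumerating feature subsets. It illustrates the general technique only and is not Snowflake's implementation; the model, its weights, and the customer values are all made up for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for a coalition of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def churn_score(x):
    # Toy linear model with illustrative weights for three features.
    return 0.5 * x[0] + 2.0 * x[1] - 1.0 * x[2]


customer = [4.0, 1.0, 3.0]   # the customer being explained (made-up values)
baseline = [2.0, 0.0, 3.0]   # a reference "typical" customer (made-up values)

contributions = shapley_values(churn_score, customer, baseline)
print(contributions)
```

The contributions add up to the difference between the customer's score and the baseline score, which is the property that lets a chart attribute one prediction to individual features.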

Signal Pulse

Credentials fast-lane: BigQuery’s July 7 preview lets devs run Data Prep jobs with plain Google Account creds—no service-account shuffle—cutting proto-pipeline setup time by half for small teams Google Cloud.

Lift-&-shift muscle: AWS Database Migration Service now spins on compute-optimized C7i and memory-packed R7i instances (released July 9), promising up to 20 Gbps per task for Oracle-to-Aurora moves—double previous peaks for the same price band Amazon Web Services.

Vector patch: Milvus v2.3.16 landed with stronger REST v2 auth and a fix for memory-leak checkpoints, keeping clusters steady under hybrid search storms Milvus.

Data mesh blitz: Snowflake’s internal-marketplace demo recap shows “data as product” rolling into Horizon governance; a July 9 update notes domain teams cutting silo breaks by 35% in early trials snowflake.com.

Kernel craft: Semantic Kernel dotnet 1.60 (8 July) adds ONNX-runtime chat and Gemini label support, so agents can run fully local and still cite sources—handy for air-gapped rigs GitHub.

Patch calm: MongoDB 8.0.11 dropped July 9 with two critical replica-set fixes; ops teams should roll ASAP to avoid stale host lists during shard add-ons MongoDB.

Tools to Know

llmware v0.3.3 (8 Jul) — A RAG-first toolkit that bundles 50 mini-models and a plug-and-play library/embedding layer. New release improves Azure OpenAI configs and slims model loading to three lines of Python GitHub.

langchain4j 1.1.0-rc1 — Java shops finally get first-class Bedrock, Gemini, and Ollama connectors plus a type-safe core API; GA promised later this month, so kick the tires now GitHub.

VectorData-.NET 9.7.0 — A drop-in .NET provider (spun out of Semantic Kernel) that layers hybrid BM25 + dense search over any pgvector store; v9.7 adds ONNX chat embeddings and token-usage hooks for per-query cost charts GitHub.

Top Reads

SQL reimagined: How pipe syntax is powering real-world use cases — Google Cloud’s product team walks through chained transformations that cut 30 % of boilerplate in BigQuery pipelines Google Cloud.

How Stripe built jurisdiction resolution for Stripe Tax — A behind-the-scenes on a patent-pending geocoder that nails U.S. tax boundaries in milliseconds—crunching terabytes of shapes nightly Stripe.
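
For a feel of the chained transformations the pipe-syntax read describes, here is a minimal sketch that assembles a GoogleSQL pipe query from ordered stages. The helper pipe_query and the table and column names are hypothetical; the |> operator and the AGGREGATE ... GROUP BY stage are part of GoogleSQL pipe syntax. The script only builds and prints the query text and does not run it.

```python
def pipe_query(source: str, *stages: str) -> str:
    """Join a FROM clause and ordered stages into a pipe-syntax query string."""
    lines = [f"FROM {source}"] + [f"|> {stage}" for stage in stages]
    return "\n".join(lines)


query = pipe_query(
    "`my_project.shop.orders`",  # hypothetical table
    "WHERE status = 'complete'",
    "AGGREGATE SUM(amount) AS revenue GROUP BY region",
    "ORDER BY revenue DESC",
    "LIMIT 5",
)
print(query)
```

Each stage reads top to bottom in the order it executes, which is where the boilerplate savings over nested subqueries come from.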

Next-Week Events

AWS Summit New York 2025 — 16 Jul, New York USA. → Amazon Web Services
Data 4 Public Good 2025 — 16–18 Jul, St Paul USA. → tciamn.org
Cloud Security Summit — 16 Jul, Virtual. → securitysummits.com


That’s the signal, minus the noise. Have an update we missed? Ping us on Substack or drop a note at thedatasignal.com.
