The Data Signal
The Data Signal Digest — Metadata writes itself

Auto-documenting schemas, shielded clusters, and velocity boosts in every commit.

Welcome back, data friends! The quieter stretch after the summit still packs many product tweaks, security updates, and one very handy dose of auto-documentation magic. Let’s dive in.


Big Signal

One CLI to rule schema drift — dbt Core 1.10 landed with “governed exposures,” native event logging, and a simpler bundles-based deploy pattern so teams can promote artifacts atomically instead of pipeline by pipeline. “A huge win for auditability,” raved one community thread github.com.

Eyes everywhere — AWS rolled out GuardDuty Extended Threat Detection for EKS, chewing through control-plane, audit and data-plane logs without sidecars. “Customers get ML-powered findings in minutes, not hours,” AWS security lead Ryan Holland said on stage at re:Inforce 2025 aws.amazon.com.

Docs while you sleep — BigQuery users can now summon Gemini to draft table and column descriptions straight into Dataplex, turning unloved schemas into searchable assets with one click cloud.google.com.

Key management, leveled up — Supabase’s revamped API-keys page (in early access) separates service, anon and admin tokens and lets you rotate or restrict scopes in-place—no more editing ENV vars in prod supabase.com.


Signal Pulse

Stream tidy-up: Snowflake added pre-configured runtimes for Warehouse Notebooks, so analysts stop fighting conda conflicts and start sharing GPU images in two clicks; budgets tagging also hit GA docs.snowflake.com.

Patch party: Azure Data Studio 1.52 shipped a modernised Jupyter kernel picker, GitHub Copilot-powered notebook Q&A and native Mermaid diagrams—small UX lifts that make the SQL IDE feel less 2018 learn.microsoft.com.

Risk as a service: BigID launched the first managed DSPM package for MSSPs, promising “30-day data risk roll-ups” for mid-market clients who can’t hire a privacy army prnewswire.com.

Postgres + AI, seriously: EDB unveiled Postgres AI extensions that sync embeddings back to source tables and spin up LangChain pipelines “in five lines of SQL,” targeting sovereign-cloud holdouts businesswire.com.

Zero-trust storage: Qumulo Stratus debuted with cryptographically isolated tenants across on-prem and all major clouds—think S3 + SMB behind individualized HSM keys businesswire.com.

Agent Q: Palantir’s AIP now lets retrieval-augmented agents juggle multiple object types plus unified citation settings, smoothing those sprawling RAG prompts palantir.com.


Tools to Know

  • OpenBao — Vault’s open-source fork just joined the OpenSSF sandbox. Governance, transparent roadmaps and a security baseline checklist aim to reassure teams wary of license flip-flops openssf.org.

  • Verified Permissions for Express — AWS dropped a ready-made Cedar policy helper so Node/Express APIs can bolt on fine-grained auth in minutes—no custom middleware spelunking required aws.amazon.com.

  • DuckDB docs 1.2.3-dev — The freshly generated docs set highlights new ICU timestamp helpers and a cleaner extension install flow—worth bookmarking while you wait for v1.3 duckdb.org.


Top Reads

Uber — The Evolution of Uber’s Search Platform (Jun 20) — Uber engineers trace a decade of scaling from a monolithic Lucene cluster to today’s micro-service search mesh, now embracing open standards to cut infra sprawl. The post details how query latency dropped 60% after a zero-copy gRPC refactor and why Uber is betting on vector-aware ranking next. A rare peek at truly planetary-scale search. → uber.com

AWS — Athena Managed Query Results (Jun 21) — Athena now stores and lifecycle-manages query results automatically—no more S3 bucket gymnastics. The blog walks through cost controls, encryption tooling and how teams can plug managed results straight into downstream ETL. Fewer foot-guns, faster insights. → aws.amazon.com


Next-Week Events (23 – 29 June 2025)

  • AI Impact Summit — 23–25 Jun, Sonoma CA & Virtual. → events.newsweek.com

  • DATACON Seattle 2025 — 24–27 Jun, Seattle WA. → datacon.us

  • Arize Observe 2025 — 25 Jun, San Francisco CA. → arize.com


Spot a signal we missed? Ping us on Substack or find us at thedatasignal.com.
