{"id":75053,"date":"2026-04-16T11:37:57","date_gmt":"2026-04-16T11:37:57","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/data-specialist-role-blueprint-responsibilities-skills-kpis-and-career-path\/"},"modified":"2026-04-16T11:37:57","modified_gmt":"2026-04-16T11:37:57","slug":"data-specialist-role-blueprint-responsibilities-skills-kpis-and-career-path","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/data-specialist-role-blueprint-responsibilities-skills-kpis-and-career-path\/","title":{"rendered":"Data Specialist: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">1) Role Summary<\/h2>\n\n\n\n<p>A <strong>Data Specialist<\/strong> is a hands-on data professional responsible for ensuring that an organization\u2019s data is <strong>accurate, well-structured, accessible, and usable<\/strong> for analytics, operational reporting, and downstream data products. The role blends practical data engineering fundamentals (ingestion, transformation, validation) with analytics enablement (semantic definitions, metrics consistency, reporting readiness) and data governance execution (quality controls, documentation, access patterns).<\/p>\n\n\n\n<p>In a software company or IT organization, this role exists because modern products and internal operations generate high volumes of data across application databases, event streams, SaaS platforms, and customer touchpoints. 
Without a dedicated specialist to standardize and maintain the data supply chain, teams experience inconsistent metrics, unreliable reporting, slow analysis cycles, and elevated risk around privacy and compliance.<\/p>\n\n\n\n<p>The business value created includes <strong>trusted decision-making<\/strong>, <strong>faster time-to-insight<\/strong>, reduced rework for engineering and analytics teams, improved customer and operational outcomes, and stronger compliance posture through disciplined data handling practices.<\/p>\n\n\n\n<p>This is a <strong>Current<\/strong> role commonly found within <strong>Data &amp; Analytics<\/strong> organizations. It typically interacts with:\n&#8211; Data Engineering, Analytics Engineering, BI\/Reporting, Data Science (as applicable)\n&#8211; Product Management, Software Engineering, QA, SRE\/Operations\n&#8211; Finance, Sales Ops, Marketing Ops, Customer Success Ops\n&#8211; Security\/GRC, Privacy, Legal (when data contains sensitive attributes)\n&#8211; IT (identity\/access management, systems integration)<\/p>\n\n\n\n<p><strong>Seniority inference (conservative):<\/strong> Mid-level individual contributor (IC). 
The title implies specialized execution and ownership of defined data domains, with increasing autonomy but not people management by default.<\/p>\n\n\n\n<p><strong>Typical reporting line:<\/strong> Reports to a <strong>Data &amp; Analytics Manager<\/strong>, <strong>Analytics Engineering Lead<\/strong>, <strong>BI Manager<\/strong>, or <strong>Head of Data Platform<\/strong> depending on operating model.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2) Role Mission<\/h2>\n\n\n\n<p><strong>Core mission:<\/strong><br\/>\nDeliver and maintain <strong>trusted, well-defined, high-quality datasets and metrics<\/strong> that enable reliable reporting, analytics, and operational decision-making across the company.<\/p>\n\n\n\n<p><strong>Strategic importance to the company:<\/strong>\n&#8211; Turns raw product and business data into a dependable asset that supports revenue growth, cost control, product performance, and customer experience improvements.\n&#8211; Reduces organizational friction caused by conflicting metric definitions and inconsistent data pipelines.\n&#8211; Strengthens data governance through practical controls: validation, lineage, documentation, and access discipline.<\/p>\n\n\n\n<p><strong>Primary business outcomes expected:<\/strong>\n&#8211; Stakeholders can answer key business questions using <strong>consistent definitions<\/strong> and <strong>repeatable dashboards\/reports<\/strong>.\n&#8211; Data pipelines and curated datasets meet agreed SLAs for freshness, completeness, and accuracy.\n&#8211; Data issues are detected early, triaged efficiently, and remediated with clear root cause documentation.\n&#8211; Reduced \u201cshadow analytics\u201d and spreadsheet-driven metric fragmentation.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3) Core Responsibilities<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Strategic responsibilities<\/h3>\n\n\n\n<ol 
class=\"wp-block-list\">\n<li><strong>Own data readiness for assigned domains<\/strong> (e.g., product usage, subscriptions\/billing, customer lifecycle, support operations), aligning datasets and metrics with business priorities.<\/li>\n<li><strong>Define and maintain canonical metric definitions<\/strong> (e.g., active users, conversion, churn, ARR movements) in collaboration with analytics and business owners.<\/li>\n<li><strong>Contribute to the data roadmap<\/strong> by identifying reliability gaps, high-value dataset opportunities, and workflow improvements (testing, documentation, automation).<\/li>\n<li><strong>Influence data modeling standards<\/strong> (naming conventions, dimensional modeling patterns, semantic layer alignment) to improve consistency across teams.<\/li>\n<li><strong>Promote responsible data use<\/strong> by embedding governance expectations into day-to-day data delivery (classification, retention, access, and auditability).<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Operational responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"6\">\n<li><strong>Maintain and improve scheduled data pipelines<\/strong> (batch and\/or near-real-time) to meet SLA expectations for freshness and availability.<\/li>\n<li><strong>Monitor data quality signals<\/strong> (tests, anomaly detection, dashboard integrity) and respond to data incidents and stakeholder-reported issues.<\/li>\n<li><strong>Perform root cause analysis<\/strong> on data discrepancies, reconcile conflicting sources, and document resolutions and prevention measures.<\/li>\n<li><strong>Manage data backfills and reprocessing<\/strong> tasks safely, ensuring downstream consumers are notified and metrics integrity is preserved.<\/li>\n<li><strong>Support reporting cycles<\/strong> (weekly business reviews, monthly performance reporting, quarterly planning) by ensuring data availability and correctness.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Technical 
responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"11\">\n<li><strong>Develop and maintain transformations<\/strong> from raw sources to curated analytics-ready datasets (e.g., staging \u2192 intermediate \u2192 marts).<\/li>\n<li><strong>Implement data validation and testing<\/strong> (schema checks, accepted values, referential integrity, freshness) and enforce thresholds and alerting.<\/li>\n<li><strong>Optimize query and pipeline performance<\/strong> through partitioning strategies, incremental models, clustering, and cost-aware execution patterns.<\/li>\n<li><strong>Create and maintain curated tables and views<\/strong> aligned to an agreed business logic layer (semantic models, metric stores, or BI datasets).<\/li>\n<li><strong>Develop reusable components<\/strong> (SQL macros, templates, standardized logic for time zones, deduplication, identity resolution) to reduce duplication and errors.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Cross-functional or stakeholder responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"16\">\n<li><strong>Partner with engineering and product teams<\/strong> to ensure instrumentation and event tracking produce analyzable, stable data (event contracts, versioning, required properties).<\/li>\n<li><strong>Support self-serve analytics<\/strong> by enabling discoverability: data catalog entries, dataset descriptions, sample queries, and office hours.<\/li>\n<li><strong>Translate stakeholder questions<\/strong> into data requirements and deliverables, managing expectations around tradeoffs, lead times, and data limitations.<\/li>\n<li><strong>Coordinate changes<\/strong> that affect reporting (new product features, billing system updates, CRM field changes) to minimize downstream breakage.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Governance, compliance, or quality responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"20\">\n<li><strong>Apply data governance 
controls<\/strong> for sensitive data: classification, PII handling, access control patterns, and audit-friendly documentation.<\/li>\n<li><strong>Maintain lineage and documentation<\/strong> for priority datasets: sources, transformation steps, owners, refresh cadence, and quality checks.<\/li>\n<li><strong>Ensure metric consistency<\/strong> across BI assets by discouraging duplicate definitions and enforcing certified datasets.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Leadership responsibilities (IC-appropriate)<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"23\">\n<li><strong>Lead small initiatives<\/strong> (data quality uplift for a domain, consolidation of metric definitions, migration to a semantic layer) with clear scope and measurable outcomes.<\/li>\n<li><strong>Mentor analysts or junior data contributors<\/strong> on data standards, SQL quality practices, and reproducible reporting patterns (as needed, without formal management scope).<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4) Day-to-Day Activities<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Daily activities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Review pipeline and data quality monitoring (failed jobs, freshness delays, test failures, anomaly alerts).<\/li>\n<li>Triage stakeholder questions: \u201cWhy did metric X change?\u201d, \u201cIs this dashboard accurate?\u201d, \u201cCan we trust this dataset today?\u201d<\/li>\n<li>Develop or refine SQL transformations and incremental models.<\/li>\n<li>Validate newly ingested data sources (schema drift checks, null rate shifts, duplicates).<\/li>\n<li>Update documentation for datasets touched that day (definitions, constraints, known limitations).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Weekly activities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Attend recurring business review support sessions (e.g., product metrics review, revenue performance review) to confirm 
numbers align with definitions.<\/li>\n<li>Conduct a weekly data quality sweep for priority domains (top dashboards, certified datasets, critical pipelines).<\/li>\n<li>Work with engineering\/product on tracking changes (event schema updates, instrumentation gaps).<\/li>\n<li>Hold office hours or \u201cdata help desk\u201d blocks for analysts and business partners.<\/li>\n<li>Backlog grooming: prioritize fixes and enhancements based on impact, risk, and stakeholder urgency.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monthly or quarterly activities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Support month-end or quarter-end reporting needs (Finance and RevOps alignment, revenue reconciliation).<\/li>\n<li>Re-certify key datasets and dashboards (confirm definitions, update owners, validate tests).<\/li>\n<li>Perform periodic access reviews with Security\/IT (especially for datasets containing PII or financial data).<\/li>\n<li>Capacity planning and roadmap alignment: identify technical debt, automation opportunities, and upcoming platform changes (e.g., migrations, new sources).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Recurring meetings or rituals<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data &amp; Analytics standup (daily or 2\u20133x\/week).<\/li>\n<li>Sprint planning \/ weekly planning (Agile or Kanban cadence).<\/li>\n<li>Data incident review (weekly) and postmortems (as needed).<\/li>\n<li>Stakeholder syncs (Product, Finance, RevOps, Marketing Ops)\u2014frequency varies by domain.<\/li>\n<li>Governance touchpoints (monthly\/quarterly): privacy, security, compliance updates.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Incident, escalation, or emergency work (if relevant)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Participate in <strong>data incidents<\/strong> when critical dashboards or datasets are wrong or unavailable (e.g., executive reporting broken, billing metrics inconsistent).<\/li>\n<li>Perform rapid 
containment: disable faulty models, roll back changes, communicate impact, provide interim numbers when appropriate.<\/li>\n<li>Drive root cause analysis and implement preventative controls (tests, change management, stronger contracts).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5) Key Deliverables<\/h2>\n\n\n\n<p>Concrete deliverables commonly owned or heavily contributed to by a Data Specialist:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data assets and models<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Curated datasets \/ data marts for assigned domains (e.g., <code>mart_product_usage<\/code>, <code>mart_subscriptions<\/code>, <code>mart_customer_health<\/code>)<\/li>\n<li>Standardized transformation models (staging\/intermediate\/marts) with clear naming and structure<\/li>\n<li>Incremental processing logic and backfill procedures<\/li>\n<li>Documented metric layer definitions (e.g., \u201cActive User\u201d, \u201cNet Revenue Retention\u201d, \u201cTrial-to-Paid Conversion\u201d)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quality and reliability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data validation rules and automated tests (schema, constraints, freshness, reconciliations)<\/li>\n<li>Data quality dashboards (test coverage, failure rates, freshness SLAs)<\/li>\n<li>Incident runbooks and postmortems (root cause + preventative actions)<\/li>\n<li>Monitoring and alert configuration for critical assets<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reporting enablement<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Certified BI datasets and governed semantic models (where used)<\/li>\n<li>KPI dashboards or reporting extracts aligned to canonical definitions (often built with BI partners)<\/li>\n<li>\u201cSingle source of truth\u201d documentation for executive KPIs (definitions, filters, time windows, attribution logic)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance and 
documentation<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data catalog entries for key datasets (owners, refresh cadence, lineage, sensitivity classification)<\/li>\n<li>Access patterns and role-based access recommendations<\/li>\n<li>Data dictionary for key domains and fields<\/li>\n<li>Change logs and release notes for impactful data changes<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational improvements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Automation scripts for repetitive tasks (e.g., auditing column usage, checking row counts, validating referential integrity)<\/li>\n<li>Performance optimization outcomes (reduced query costs, improved job run times)<\/li>\n<li>Training materials: \u201cHow to use dataset X\u201d, \u201cHow to interpret KPI Y\u201d<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6) Goals, Objectives, and Milestones<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">30-day goals (onboarding and baseline)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Understand the company\u2019s data ecosystem: core sources (product DB, event tracking, CRM, billing), warehouse\/lake, BI tools, and governance expectations.<\/li>\n<li>Gain access and complete required security\/privacy training.<\/li>\n<li>Review top 10 business-critical dashboards and their upstream datasets; identify fragility points.<\/li>\n<li>Deliver at least one small, production-grade improvement:<\/li>\n<li>Fix a recurring pipeline failure<\/li>\n<li>Add missing tests for a critical dataset<\/li>\n<li>Improve documentation for a high-traffic table<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">60-day goals (ownership and reliability)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Take primary ownership for at least one data domain (e.g., product usage or revenue).<\/li>\n<li>Implement meaningful quality controls:<\/li>\n<li>Freshness tests for critical pipelines<\/li>\n<li>Row count anomaly 
checks<\/li>\n<li>Uniqueness and referential integrity tests where appropriate<\/li>\n<li>Reduce stakeholder escalations by providing clearer definitions and quicker diagnostics (establish a standard triage workflow).<\/li>\n<li>Ship at least one curated dataset improvement that reduces analyst time (e.g., consolidated wide table or standardized metric view).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">90-day goals (scalable delivery)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Demonstrate end-to-end delivery: from requirement \u2192 model changes \u2192 tests \u2192 documentation \u2192 stakeholder rollout.<\/li>\n<li>Establish or strengthen a \u201ccertified dataset\u201d pattern for a domain, including definitions and ownership.<\/li>\n<li>Propose and deliver a small roadmap initiative (4\u20138 weeks) with measurable impact:<\/li>\n<li>Consolidate duplicate KPI logic across dashboards<\/li>\n<li>Implement cost\/performance optimizations in a high-cost area<\/li>\n<li>Introduce a standardized \u201cmetric calculation layer\u201d for a business area<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6-month milestones (domain excellence)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Achieve stable SLAs for assigned domain datasets (freshness and quality targets met consistently).<\/li>\n<li>Reduce recurring incident classes by implementing systemic preventative measures.<\/li>\n<li>Improve cross-functional alignment around instrumentation and event contracts with Product\/Engineering.<\/li>\n<li>Deliver a documented and tested metric set used by multiple teams (a genuine single source of truth).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12-month objectives (organizational leverage)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Become the recognized domain expert for a key data area, with clear ownership and stakeholder trust.<\/li>\n<li>Raise the organization\u2019s baseline maturity in at least one capability:<\/li>\n<li>Testing coverage 
and alerting<\/li>\n<li>Documentation and catalog usage<\/li>\n<li>Semantic consistency across BI<\/li>\n<li>Data governance execution for sensitive data<\/li>\n<li>Demonstrate measurable business impact:<\/li>\n<li>Faster reporting cycles<\/li>\n<li>Reduced decision delays<\/li>\n<li>Improved reliability for key KPIs<\/li>\n<li>Lower support burden for analytics questions<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Long-term impact goals (multi-year)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Help evolve the organization from \u201creporting outputs\u201d to <strong>data products<\/strong> with clear contracts, SLAs, and ownership.<\/li>\n<li>Enable scalable self-serve analytics with fewer bespoke requests and fewer metric disputes.<\/li>\n<li>Contribute to platform modernization (semantic layers, metric stores, real-time analytics) as the company matures.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Role success definition<\/h3>\n\n\n\n<p>The role is successful when:\n&#8211; Business-critical data and reporting are <strong>trustworthy, explainable, and timely<\/strong>.\n&#8211; Stakeholders use consistent metrics and certified datasets rather than rebuilding logic in silos.\n&#8211; Data issues are detected early, resolved efficiently, and prevented from recurring.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What high performance looks like<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Anticipates upstream changes (product releases, billing changes) and prevents breakages through proactive coordination.<\/li>\n<li>Delivers robust data assets with tests, documentation, and clear ownership\u2014not just \u201cSQL that runs.\u201d<\/li>\n<li>Communicates tradeoffs crisply (freshness vs cost, accuracy vs speed) and earns trust through transparency.<\/li>\n<li>Improves systems, not just symptoms\u2014reducing incident recurrence and analyst rework.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7) 
KPIs and Productivity Metrics<\/h2>\n\n\n\n<p>The framework below balances <strong>output<\/strong> (what was produced) with <strong>outcomes<\/strong> (business impact) and <strong>quality\/reliability<\/strong> (trust and operational health). Targets vary by maturity; benchmarks below are illustrative for a mid-sized software\/IT organization.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Metric name<\/th>\n<th>What it measures<\/th>\n<th>Why it matters<\/th>\n<th>Example target \/ benchmark<\/th>\n<th>Frequency<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Curated dataset delivery throughput<\/td>\n<td>Number of production-ready datasets\/models delivered (with tests + docs)<\/td>\n<td>Ensures consistent delivery, not just ad hoc analysis<\/td>\n<td>2\u20136 meaningful model improvements\/month<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Stakeholder request cycle time<\/td>\n<td>Time from request intake to delivered dataset\/report change<\/td>\n<td>Reduces business waiting time and shadow analytics<\/td>\n<td>Median 5\u201315 business days depending on scope<\/td>\n<td>Weekly\/Monthly<\/td>\n<\/tr>\n<tr>\n<td>Certified metric adoption rate<\/td>\n<td>% of key dashboards using canonical definitions<\/td>\n<td>Reduces metric fragmentation and disputes<\/td>\n<td>70\u201390% adoption for top KPI dashboards<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Data test coverage (critical assets)<\/td>\n<td>% of critical tables\/models with automated tests<\/td>\n<td>Prevents regressions and increases trust<\/td>\n<td>80%+ for top-tier assets<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Data incident count (priority 1\/2)<\/td>\n<td>Number of high-severity data outages\/incorrect KPI events<\/td>\n<td>Direct signal of reliability<\/td>\n<td>Downward trend; P1 rare (0\u20131\/quarter)<\/td>\n<td>Monthly\/Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Mean time to detect (MTTD) data issues<\/td>\n<td>How quickly issues are detected by 
monitoring\/tests<\/td>\n<td>Early detection reduces business impact<\/td>\n<td>&lt; 60 minutes for critical pipelines<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Mean time to resolve (MTTR) data issues<\/td>\n<td>Time from detection to mitigation\/resolution<\/td>\n<td>Limits disruption to reporting and decisions<\/td>\n<td>&lt; 1 business day for common failures<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Freshness SLA attainment<\/td>\n<td>% of runs meeting freshness expectations<\/td>\n<td>Ensures reporting is timely<\/td>\n<td>95\u201399% for critical pipelines<\/td>\n<td>Weekly\/Monthly<\/td>\n<\/tr>\n<tr>\n<td>Data accuracy \/ reconciliation pass rate<\/td>\n<td>Reconciliation checks vs source systems (e.g., billing totals)<\/td>\n<td>Prevents financial\/reporting misstatements<\/td>\n<td>99%+ pass rate; issues documented<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Duplicate metric logic reduction<\/td>\n<td>Count of deprecated duplicate calculations<\/td>\n<td>Simplifies and standardizes analytics<\/td>\n<td>Retire 5\u201320 duplicates\/quarter<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Query performance \/ cost efficiency<\/td>\n<td>Warehouse compute cost for key models\/queries<\/td>\n<td>Controls spend and improves speed<\/td>\n<td>Reduce cost 10\u201330% in a hotspot<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Pipeline runtime SLA<\/td>\n<td>Job durations for critical pipelines<\/td>\n<td>Affects freshness and cost<\/td>\n<td>90th percentile within SLA<\/td>\n<td>Weekly<\/td>\n<\/tr>\n<tr>\n<td>Documentation completeness (priority datasets)<\/td>\n<td>Presence and quality of catalog entries and definitions<\/td>\n<td>Drives self-serve and reduces interrupts<\/td>\n<td>100% for certified datasets<\/td>\n<td>Monthly\/Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Stakeholder satisfaction score<\/td>\n<td>Survey or qualitative rating from primary partners<\/td>\n<td>Measures trust and usefulness<\/td>\n<td>4.2+\/5 or improving 
trend<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Rework rate<\/td>\n<td>% of delivered work requiring significant revision due to unclear requirements\/quality gaps<\/td>\n<td>Indicates requirements clarity and build quality<\/td>\n<td>&lt; 10\u201315%<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Cross-team dependency health<\/td>\n<td>Timeliness and quality of handoffs (instrumentation changes, source changes)<\/td>\n<td>Prevents breakage and delays<\/td>\n<td>Fewer emergency changes; planned releases<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Data governance compliance adherence<\/td>\n<td>Completion of access reviews, PII handling standards<\/td>\n<td>Reduces audit and privacy risks<\/td>\n<td>100% for sensitive domains<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Continuous improvement actions delivered<\/td>\n<td>Count of measurable improvements (automation, tests, standardization)<\/td>\n<td>Signals maturity building beyond tickets<\/td>\n<td>1\u20133 meaningful improvements\/month<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>On-call \/ escalation effectiveness (if applicable)<\/td>\n<td>Responsiveness and quality of incident comms<\/td>\n<td>Protects business operations<\/td>\n<td>Acknowledge &lt; 15 min; clear updates<\/td>\n<td>Per incident<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<p>Notes on measurement:\n&#8211; Define \u201ccritical assets\u201d via a tiering system (Tier 0\/1\/2) based on executive reporting and customer impact.\n&#8211; Use objective telemetry where possible (job logs, test results, incident tools) and supplement with stakeholder feedback quarterly.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8) Technical Skills Required<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Must-have technical skills<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>SQL (Advanced querying and transformations)<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Build transformations, validate 
data, support reconciliations, troubleshoot discrepancies.<br\/>\n   &#8211; <strong>Importance:<\/strong> Critical<\/p>\n<\/li>\n<li>\n<p><strong>Data modeling fundamentals (dimensional modeling, marts, normalization tradeoffs)<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Create durable datasets that support consistent analytics and reporting.<br\/>\n   &#8211; <strong>Importance:<\/strong> Critical<\/p>\n<\/li>\n<li>\n<p><strong>Data quality and validation techniques<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Implement checks for duplicates, nulls, accepted values, referential integrity, freshness, anomaly detection.<br\/>\n   &#8211; <strong>Importance:<\/strong> Critical<\/p>\n<\/li>\n<li>\n<p><strong>ETL\/ELT concepts and pipeline operations<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Understand scheduling, dependencies, incremental loads, idempotency, and failure handling.<br\/>\n   &#8211; <strong>Importance:<\/strong> Critical<\/p>\n<\/li>\n<li>\n<p><strong>Version control (Git) and change management discipline<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Reviewable PRs, rollback capability, traceability of data logic changes.<br\/>\n   &#8211; <strong>Importance:<\/strong> Important (often critical in mature teams)<\/p>\n<\/li>\n<li>\n<p><strong>BI\/reporting fundamentals<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Ensure datasets are usable in dashboards; understand filters, joins, aggregation pitfalls, and metric semantics.<br\/>\n   &#8211; <strong>Importance:<\/strong> Important<\/p>\n<\/li>\n<li>\n<p><strong>Data documentation and cataloging practices<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Maintain data dictionaries, dataset ownership, definitions, refresh cadence.<br\/>\n   &#8211; <strong>Importance:<\/strong> Important<\/p>\n<\/li>\n<li>\n<p><strong>Basic scripting for automation (Python or equivalent)<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Automate audits, one-off validations, API pulls, or 
triage tooling.<br\/>\n   &#8211; <strong>Importance:<\/strong> Important<\/p>\n<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Good-to-have technical skills<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Analytics engineering workflow tooling (dbt or similar)<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Modular transformations, testing, documentation generation, lineage.<br\/>\n   &#8211; <strong>Importance:<\/strong> Important (Common in modern stacks)<\/p>\n<\/li>\n<li>\n<p><strong>Cloud data warehouse fundamentals<\/strong> (e.g., BigQuery, Snowflake, Redshift)<br\/>\n   &#8211; <strong>Use:<\/strong> Cost\/performance optimization, partitioning\/clustering, workload patterns.<br\/>\n   &#8211; <strong>Importance:<\/strong> Important<\/p>\n<\/li>\n<li>\n<p><strong>Event tracking and instrumentation understanding<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Validate product analytics events, handle schema versions, ensure stable event contracts.<br\/>\n   &#8211; <strong>Importance:<\/strong> Important<\/p>\n<\/li>\n<li>\n<p><strong>API-based data ingestion concepts<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Integrations with SaaS platforms (CRM, support, billing).<br\/>\n   &#8211; <strong>Importance:<\/strong> Optional to Important (context-specific)<\/p>\n<\/li>\n<li>\n<p><strong>Basic statistics for anomaly detection and trend interpretation<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Identify suspicious changes and validate business reasonableness.<br\/>\n   &#8211; <strong>Importance:<\/strong> Optional<\/p>\n<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Advanced or expert-level technical skills<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Semantic layer \/ metrics layer design<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Centralize metric logic and governance to avoid dashboard drift.<br\/>\n   &#8211; <strong>Importance:<\/strong> Optional to Important 
(maturity-dependent)<\/p>\n<\/li>\n<li>\n<p><strong>Data observability engineering<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Build proactive monitoring, alert routing, anomaly detection at scale.<br\/>\n   &#8211; <strong>Importance:<\/strong> Optional (Important in data-product orgs)<\/p>\n<\/li>\n<li>\n<p><strong>Performance engineering for large-scale transformations<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Optimize models, reduce compute costs, manage concurrency, tune incremental strategies.<br\/>\n   &#8211; <strong>Importance:<\/strong> Optional to Important (scale-dependent)<\/p>\n<\/li>\n<li>\n<p><strong>Privacy-by-design implementation in analytics pipelines<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Tokenization, minimization, retention enforcement, access patterns, audit trails.<br\/>\n   &#8211; <strong>Importance:<\/strong> Optional (Critical in regulated environments)<\/p>\n<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Emerging future skills for this role (next 2\u20135 years)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>AI-assisted data quality and anomaly triage<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Faster root cause identification and automated suggestions for tests.<br\/>\n   &#8211; <strong>Importance:<\/strong> Optional (becoming Important)<\/p>\n<\/li>\n<li>\n<p><strong>Data contracts and schema governance automation<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Enforce producer-consumer expectations for events and core tables.<br\/>\n   &#8211; <strong>Importance:<\/strong> Optional (becoming Important)<\/p>\n<\/li>\n<li>\n<p><strong>Metadata-driven pipelines<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Reduce bespoke ETL by generating transformations and checks from metadata.<br\/>\n   &#8211; <strong>Importance:<\/strong> Optional<\/p>\n<\/li>\n<li>\n<p><strong>Governed self-serve analytics enablement<\/strong><br\/>\n   &#8211; <strong>Use:<\/strong> Balancing broad 
access with consistent metrics and compliance controls.<br\/>\n   &#8211; <strong>Importance:<\/strong> Important trend<\/p>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9) Soft Skills and Behavioral Capabilities<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Analytical rigor and skepticism<\/strong>\n   &#8211; <strong>Why it matters:<\/strong> Data is often \u201cplausible but wrong.\u201d This role protects trust by validating assumptions.\n   &#8211; <strong>On the job:<\/strong> Asks \u201cWhat changed?\u201d, compares to baselines, checks edge cases, and confirms with source-of-truth systems.\n   &#8211; <strong>Strong performance:<\/strong> Finds issues before stakeholders do; documents evidence and reasoning clearly.<\/p>\n<\/li>\n<li>\n<p><strong>Clear communication (technical-to-non-technical translation)<\/strong>\n   &#8211; <strong>Why it matters:<\/strong> Stakeholders need actionable explanations, not SQL details.\n   &#8211; <strong>On the job:<\/strong> Writes crisp incident updates, explains metric definitions, and sets expectations on timelines.\n   &#8211; <strong>Strong performance:<\/strong> Reduces confusion, prevents repeated questions, and builds credibility.<\/p>\n<\/li>\n<li>\n<p><strong>Stakeholder management and prioritization<\/strong>\n   &#8211; <strong>Why it matters:<\/strong> Demand is usually higher than capacity; priorities must align to business value and risk.\n   &#8211; <strong>On the job:<\/strong> Uses impact\/risk framing, negotiates scope, and sequences work transparently.\n   &#8211; <strong>Strong performance:<\/strong> Stakeholders feel supported even when deprioritized, because tradeoffs are explicit.<\/p>\n<\/li>\n<li>\n<p><strong>Attention to detail with pragmatic judgment<\/strong>\n   &#8211; <strong>Why it matters:<\/strong> Small logic changes can materially impact business KPIs; perfectionism can also block progress.\n   &#8211; <strong>On the 
job:<\/strong> Applies strong validation to high-impact assets, uses \u201cgood enough\u201d for low-risk exploratory needs.\n   &#8211; <strong>Strong performance:<\/strong> Minimizes regressions while keeping delivery velocity healthy.<\/p>\n<\/li>\n<li>\n<p><strong>Ownership mindset<\/strong>\n   &#8211; <strong>Why it matters:<\/strong> Data issues often span teams; someone must drive closure.\n   &#8211; <strong>On the job:<\/strong> Takes initiative to coordinate fixes, track follow-ups, and ensure prevention measures are implemented.\n   &#8211; <strong>Strong performance:<\/strong> Fewer recurring incidents; clear accountability and improved system health.<\/p>\n<\/li>\n<li>\n<p><strong>Collaboration and influence without authority<\/strong>\n   &#8211; <strong>Why it matters:<\/strong> Upstream fixes often require Engineering\/Product; governance needs buy-in.\n   &#8211; <strong>On the job:<\/strong> Aligns on event tracking contracts, advocates for instrumentation improvements, negotiates changes.\n   &#8211; <strong>Strong performance:<\/strong> Achieves outcomes through partnership rather than escalation.<\/p>\n<\/li>\n<li>\n<p><strong>Structured problem solving<\/strong>\n   &#8211; <strong>Why it matters:<\/strong> Data problems can be ambiguous and multi-causal.\n   &#8211; <strong>On the job:<\/strong> Uses hypotheses, isolates variables, reproduces issues, and documents root causes.\n   &#8211; <strong>Strong performance:<\/strong> Faster resolution with fewer false fixes and better preventive actions.<\/p>\n<\/li>\n<li>\n<p><strong>Documentation discipline<\/strong>\n   &#8211; <strong>Why it matters:<\/strong> Unwritten knowledge leads to fragility and repeated interrupts.\n   &#8211; <strong>On the job:<\/strong> Maintains definitions, runbooks, known limitations, and change notes.\n   &#8211; <strong>Strong performance:<\/strong> Self-serve usage increases; fewer ad hoc explanations required.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<hr 
class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10) Tools, Platforms, and Software<\/h2>\n\n\n\n<p>Tooling varies by organization; the list below reflects common, realistic tools for a Data Specialist in a software\/IT context.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Category<\/th>\n<th>Tool \/ platform \/ software<\/th>\n<th>Primary use<\/th>\n<th>Common \/ Optional \/ Context-specific<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Cloud platforms<\/td>\n<td>AWS \/ Azure \/ GCP<\/td>\n<td>Host data infrastructure and services<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Data warehouse<\/td>\n<td>Snowflake<\/td>\n<td>Analytics warehouse, transformations, sharing<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Data warehouse<\/td>\n<td>BigQuery<\/td>\n<td>Analytics warehouse, large-scale SQL, cost controls<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Data warehouse<\/td>\n<td>Amazon Redshift<\/td>\n<td>Analytics warehouse in AWS-centric orgs<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Data lake \/ storage<\/td>\n<td>S3 \/ ADLS \/ GCS<\/td>\n<td>Raw storage, staging, extracts, archival<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Data transformation<\/td>\n<td>dbt<\/td>\n<td>ELT modeling, testing, documentation, lineage<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Data integration<\/td>\n<td>Fivetran<\/td>\n<td>Ingest SaaS sources into warehouse<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Data integration<\/td>\n<td>Airbyte<\/td>\n<td>Open-source ingestion\/connectors<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Orchestration<\/td>\n<td>Airflow \/ Cloud Composer<\/td>\n<td>Scheduling, dependency management<\/td>\n<td>Optional (Common in data platform orgs)<\/td>\n<\/tr>\n<tr>\n<td>Orchestration<\/td>\n<td>Dagster<\/td>\n<td>Modern orchestration with assets\/metadata<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>BI \/ dashboards<\/td>\n<td>Looker<\/td>\n<td>Semantic modeling + 
dashboards<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>BI \/ dashboards<\/td>\n<td>Tableau<\/td>\n<td>Dashboards, reporting<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>BI \/ dashboards<\/td>\n<td>Power BI<\/td>\n<td>Dashboards, enterprise reporting<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>BI \/ dashboards<\/td>\n<td>Metabase<\/td>\n<td>Lightweight self-serve BI<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Observability (data)<\/td>\n<td>Monte Carlo \/ Bigeye<\/td>\n<td>Data downtime detection, anomaly monitoring<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Monitoring\/alerts<\/td>\n<td>Datadog \/ Cloud Monitoring<\/td>\n<td>Job metrics, alerting<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>ITSM \/ incident mgmt<\/td>\n<td>Jira Service Management<\/td>\n<td>Track incidents, requests<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Project management<\/td>\n<td>Jira<\/td>\n<td>Work tracking, sprints, backlog<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Collaboration<\/td>\n<td>Slack \/ Microsoft Teams<\/td>\n<td>Stakeholder comms, incident channels<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Documentation<\/td>\n<td>Confluence \/ Notion<\/td>\n<td>Runbooks, definitions, process docs<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Data catalog<\/td>\n<td>Alation \/ Collibra \/ DataHub<\/td>\n<td>Metadata, lineage, ownership<\/td>\n<td>Optional (Common in enterprise)<\/td>\n<\/tr>\n<tr>\n<td>Source control<\/td>\n<td>GitHub \/ GitLab<\/td>\n<td>PRs, code reviews, CI<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>CI\/CD<\/td>\n<td>GitHub Actions \/ GitLab CI<\/td>\n<td>Test runs, deployments for dbt\/models<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Security \/ IAM<\/td>\n<td>Okta \/ Azure AD<\/td>\n<td>Access control and identity<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Secrets mgmt<\/td>\n<td>AWS Secrets Manager \/ Vault<\/td>\n<td>Secure credentials<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Query IDE<\/td>\n<td>DataGrip \/ VS 
Code<\/td>\n<td>SQL development<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Notebooks<\/td>\n<td>Jupyter<\/td>\n<td>Exploration, audits, scripts<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Scripting<\/td>\n<td>Python<\/td>\n<td>Data checks, automation, APIs<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Data formats<\/td>\n<td>Parquet \/ JSON \/ Avro<\/td>\n<td>Efficient storage and interchange<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Product analytics<\/td>\n<td>Segment<\/td>\n<td>Event collection and routing<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Product analytics<\/td>\n<td>Amplitude \/ Mixpanel<\/td>\n<td>Behavioral analytics, event validation<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>CRM<\/td>\n<td>Salesforce<\/td>\n<td>Revenue and customer data source<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Support systems<\/td>\n<td>Zendesk<\/td>\n<td>Support ticket data source<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Billing<\/td>\n<td>Stripe \/ Zuora<\/td>\n<td>Subscription and payment data source<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11) Typical Tech Stack \/ Environment<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Infrastructure environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud-first (AWS\/Azure\/GCP) is common, but some enterprises may run hybrid.<\/li>\n<li>Warehouse-centric analytics with a lake layer for raw or semi-structured data.<\/li>\n<li>IAM integrated with corporate identity provider for access control.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Application environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Product data originates from:<ul>\n<li>Operational databases (Postgres\/MySQL), microservices stores<\/li>\n<li>Event streams from web\/mobile tracking or internal event buses<\/li>\n<li>SaaS systems (CRM, billing, marketing automation, support)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Data environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data pipeline pattern: ingestion \u2192 raw\/staging \u2192 modeled marts \u2192 semantic layer\/BI.<\/li>\n<li>Batch refresh is common (hourly\/daily), sometimes near-real-time for product metrics.<\/li>\n<li>Data quality framework: dbt tests + observability alerts + manual reconciliations for sensitive metrics.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data classification and access tiers (public\/internal\/confidential\/PII) depending on maturity.<\/li>\n<li>Audit trails for access and changes in regulated or enterprise contexts.<\/li>\n<li>Masking or tokenization patterns for sensitive identifiers.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Delivery model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Agile (Scrum\/Kanban) within Data &amp; Analytics, often with service-style intake for requests.<\/li>\n<li>Production changes via PR review and CI where maturity is moderate-to-high.<\/li>\n<li>Release notes for data model changes affecting KPIs or dashboards.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scale or complexity context<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Moderate scale (typical for software companies): tens to hundreds of data sources, thousands of tables\/models.<\/li>\n<li>Complexity driven more by <strong>business logic<\/strong> and changing product instrumentation than raw volume alone.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team topology<\/h3>\n\n\n\n<p>Common team structure:\n&#8211; Data Platform \/ Data Engineering (infrastructure, ingestion, orchestration)\n&#8211; Analytics Engineering \/ BI Engineering (models, semantic layer, certified datasets)\n&#8211; Analysts embedded by function (Product, Finance, Marketing)\n&#8211; Data Specialist sits in the modeling\/quality enablement space, often bridging 
engineering and analytics.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12) Stakeholders and Collaboration Map<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Internal stakeholders<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Head of Data &amp; Analytics \/ Data &amp; Analytics Manager (manager):<\/strong> priorities, scope, performance expectations, escalation support.<\/li>\n<li><strong>Data Engineers:<\/strong> upstream ingestion, pipeline stability, source connectors, event streaming.<\/li>\n<li><strong>Analytics Engineers \/ BI Engineers:<\/strong> modeling standards, semantic layer, dashboard governance.<\/li>\n<li><strong>Data Analysts:<\/strong> day-to-day consumers; partner for requirements, testing assumptions, usability feedback.<\/li>\n<li><strong>Product Managers:<\/strong> instrumentation needs, KPI definitions, feature-change impact to metrics.<\/li>\n<li><strong>Software Engineers \/ QA:<\/strong> event tracking implementation, schema changes, release coordination.<\/li>\n<li><strong>Finance \/ RevOps:<\/strong> revenue metrics, billing reconciliation, month-end reporting integrity.<\/li>\n<li><strong>Marketing Ops \/ Sales Ops:<\/strong> funnel definitions, lead\/source attribution constraints, CRM field changes.<\/li>\n<li><strong>Security \/ Privacy \/ GRC:<\/strong> data classification, access control, retention policies, audit requirements.<\/li>\n<li><strong>Customer Success \/ Support Ops:<\/strong> customer health metrics, operational reporting, feedback loops.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">External stakeholders (as applicable)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Vendors for data tooling (warehouse, ETL, observability, BI)<\/li>\n<li>External auditors (in regulated environments)<\/li>\n<li>Implementation partners (during migrations)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Peer roles<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Analyst 
(functional)<\/li>\n<li>Analytics Engineer<\/li>\n<li>Data Engineer<\/li>\n<li>BI Developer<\/li>\n<li>Data Steward (in mature governance orgs)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Upstream dependencies<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Instrumentation and event collection pipelines<\/li>\n<li>Source system owners (CRM, billing, support)<\/li>\n<li>Identity resolution logic and user\/account mapping rules<\/li>\n<li>Platform reliability (warehouse uptime, orchestration availability)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Downstream consumers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Executive dashboards and board reporting packs<\/li>\n<li>Product analytics and experimentation reporting<\/li>\n<li>Finance and revenue operations reporting<\/li>\n<li>Operational monitoring dashboards (support queues, customer health)<\/li>\n<li>Data science\/ML (when models rely on curated features)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Nature of collaboration<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>High collaboration, frequent clarification loops, and shared ownership of definitions.<\/li>\n<li>Strong reliance on written artifacts (definitions, change notes, incident comms).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Typical decision-making authority<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Can decide implementation details for models\/tests\/docs within agreed standards.<\/li>\n<li>Co-decides metric definitions with business owners and analytics leadership.<\/li>\n<li>Escalates cross-domain conflicts (e.g., competing KPI definitions) to governance forums or leadership.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Escalation points<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data incidents impacting exec reporting \u2192 Data &amp; Analytics Manager \/ Head of Data.<\/li>\n<li>Disputes on KPI definitions \u2192 domain owner (Product\/Finance) + analytics 
leadership.<\/li>\n<li>Privacy\/security concerns \u2192 Privacy Officer \/ Security lead.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13) Decision Rights and Scope of Authority<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Can decide independently<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Implementation approach for SQL transformations and tests (within team standards).<\/li>\n<li>Dataset structure within an approved modeling pattern (staging\/intermediate\/mart).<\/li>\n<li>Triage steps for most data issues: investigation plan, immediate mitigations, communications draft.<\/li>\n<li>Documentation updates, data dictionary entries, and runbook content.<\/li>\n<li>Decommissioning low-usage non-certified assets (with notice) when within policy.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Requires team approval (peer review \/ lead sign-off)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Changes to certified datasets and canonical metrics that affect multiple dashboards.<\/li>\n<li>Changes to shared macros, core dimensions (customer\/account\/user), or identity resolution logic.<\/li>\n<li>Backfills or reprocessing jobs that may materially affect historical reporting.<\/li>\n<li>Introducing new monitoring rules that may create alert noise or operational burden.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Requires manager\/director\/executive approval<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>New tool adoption (data catalog, observability platform), vendor contracts, or licensing changes.<\/li>\n<li>Material changes to KPI definitions used in executive reporting.<\/li>\n<li>Major architectural changes (warehouse migration, orchestration replacement).<\/li>\n<li>Changes that meaningfully affect compliance posture (PII exposure, retention changes).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Budget, vendor, delivery, hiring, compliance authority<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Budget\/vendor:<\/strong> Typically no direct authority; may recommend tools based on evidence.<\/li>\n<li><strong>Delivery:<\/strong> Owns delivery of assigned domain data assets; negotiates prioritization with manager.<\/li>\n<li><strong>Hiring:<\/strong> May participate in interviews and technical assessments; typically not the final decision-maker.<\/li>\n<li><strong>Compliance:<\/strong> Executes governance controls; escalates and consults on policy interpretation.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14) Required Experience and Qualifications<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Typical years of experience<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>3\u20136 years<\/strong> in data analytics, analytics engineering, BI development, or data operations roles, with demonstrable ownership of production data assets.<\/li>\n<li>In smaller organizations, 2\u20134 years may be acceptable if experience is highly hands-on and end-to-end.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Education expectations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Bachelor\u2019s degree in a relevant field (Computer Science, Information Systems, Statistics, Engineering, Economics) is common.<\/li>\n<li>Equivalent practical experience is often acceptable in software\/IT organizations.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certifications (relevant, not mandatory)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud fundamentals (AWS\/Azure\/GCP) \u2014 <strong>Optional<\/strong><\/li>\n<li>Vendor warehouse certs (Snowflake\/BigQuery) \u2014 <strong>Optional<\/strong><\/li>\n<li>Data governance\/privacy training (internal or external) \u2014 <strong>Context-specific<\/strong><\/li>\n<li>dbt certification \u2014 <strong>Optional<\/strong> (useful signal where dbt is core)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prior role backgrounds commonly 
seen<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Analyst transitioning into production modeling and quality ownership<\/li>\n<li>Analytics Engineer \/ BI Engineer<\/li>\n<li>Data Operations \/ Reporting Specialist<\/li>\n<li>Junior Data Engineer with strong SQL and stakeholder-facing delivery<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Domain knowledge expectations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Strong understanding of SaaS\/product metrics (activation, retention, engagement) and\/or go-to-market metrics (pipeline, conversion, churn) depending on assigned domain.<\/li>\n<li>Practical understanding of how business processes map into systems (CRM, billing, support).<\/li>\n<li>Data privacy awareness and careful handling of identifiers and sensitive attributes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Leadership experience expectations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Not required as people management.<\/li>\n<li>Expected to lead small initiatives, coordinate stakeholders, and mentor informally.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15) Career Path and Progression<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Common feeder roles into Data Specialist<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Analyst (with strong SQL and ownership of complex reporting logic)<\/li>\n<li>BI Developer \/ Reporting Analyst<\/li>\n<li>Junior Analytics Engineer<\/li>\n<li>Data Operations Analyst<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Next likely roles after this role<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Senior Data Specialist<\/strong> (greater domain ownership, broader governance influence)<\/li>\n<li><strong>Analytics Engineer (Senior)<\/strong> (deeper modeling\/semantic layer leadership)<\/li>\n<li><strong>Data Quality\/Observability Specialist<\/strong> (focus on reliability engineering for data)<\/li>\n<li><strong>BI Engineering 
Lead<\/strong> (semantic and dashboard governance)<\/li>\n<li><strong>Data Product Manager<\/strong> (for those who excel in stakeholder alignment and productizing datasets)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Adjacent career paths<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data Engineering:<\/strong> move deeper into orchestration, ingestion, streaming, platform design.<\/li>\n<li><strong>Data Science\/ML:<\/strong> move into modeling if strong statistical\/programming skills are developed.<\/li>\n<li><strong>Governance\/Data Stewardship:<\/strong> specialize in cataloging, policy execution, and enterprise controls.<\/li>\n<li><strong>RevOps\/Finance Analytics:<\/strong> specialize in revenue data systems and reconciliations.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Skills needed for promotion<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Demonstrated ownership of a complex domain with stable reliability outcomes.<\/li>\n<li>Ability to set and enforce metric definitions across multiple teams.<\/li>\n<li>Strong incident management and prevention track record (systemic improvements).<\/li>\n<li>Strong cost\/performance optimization outcomes at scale.<\/li>\n<li>Influence: successfully aligning Engineering\/Product\/Business around contracts and definitions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How this role evolves over time<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Early: executes and stabilizes models, fixes issues, improves docs.<\/li>\n<li>Mid: drives domain-level standardization and reliability program.<\/li>\n<li>Later: shapes governance patterns, semantic consistency, and scalable self-serve frameworks; leads multi-quarter initiatives.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16) Risks, Challenges, and Failure Modes<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Common role challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ambiguous 
definitions:<\/strong> \u201cActive user\u201d and \u201cchurn\u201d disputes that require governance and negotiation.<\/li>\n<li><strong>Upstream instability:<\/strong> schema drift, inconsistent event tracking, late-breaking product releases.<\/li>\n<li><strong>Tool sprawl:<\/strong> multiple BI tools and inconsistent semantic layers creating duplication.<\/li>\n<li><strong>High interrupt load:<\/strong> ad hoc requests and \u201cnumbers don\u2019t match\u201d escalations disrupting planned work.<\/li>\n<li><strong>Data access constraints:<\/strong> privacy policies limiting who can see what, complicating debugging and enablement.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Bottlenecks<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Dependence on engineering teams for instrumentation fixes.<\/li>\n<li>Limited observability or test coverage leading to reactive firefighting.<\/li>\n<li>Lack of documented ownership causing slow decisions and repeated rework.<\/li>\n<li>Month-end reporting pressure collapsing priorities into urgent, unplanned work.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Anti-patterns<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Building dashboards directly on raw tables without curated models.<\/li>\n<li>Allowing every team to define KPIs independently (\u201cmetric anarchy\u201d).<\/li>\n<li>Manual, non-repeatable fixes (spreadsheets and one-off scripts) without upstream corrections.<\/li>\n<li>Over-testing trivial assets while under-testing executive-critical datasets.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common reasons for underperformance<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Weak SQL and inability to reason about joins, grain, and aggregation.<\/li>\n<li>Poor communication leading to stakeholder distrust or repeated escalations.<\/li>\n<li>Lack of discipline around testing\/documentation; shipping logic that breaks later.<\/li>\n<li>Inability to prioritize; treating all requests as 
equal.<\/li>\n<li>Avoiding root cause and repeatedly patching symptoms.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Business risks if this role is ineffective<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Executives make decisions on incorrect KPIs; strategic missteps.<\/li>\n<li>Revenue or finance reporting errors; potential audit\/compliance exposure.<\/li>\n<li>Increased operational cost due to rework and firefighting.<\/li>\n<li>Reduced product velocity due to lack of trustworthy telemetry and experimentation metrics.<\/li>\n<li>Low confidence in Data &amp; Analytics function, leading to shadow systems and fragmentation.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17) Role Variants<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">By company size<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Startup \/ small company:<\/strong><ul>\n<li>Broader scope: ingestion, modeling, dashboards, and governance are all combined.<\/li>\n<li>Higher ambiguity, faster iteration, fewer formal controls.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Mid-sized software company:<\/strong><ul>\n<li>Clearer domain ownership, stronger testing\/CI habits, emerging governance and certified datasets.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Large enterprise IT organization:<\/strong><ul>\n<li>More formal governance, data cataloging, access approvals, and audit requirements.<\/li>\n<li>The role may specialize: data quality specialist, data steward, or BI dataset owner.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">By industry<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>B2B SaaS:<\/strong> strong focus on product usage, retention, ARR movements, CRM\/billing alignment.<\/li>\n<li><strong>Consumer apps:<\/strong> heavier event analytics scale, experimentation metrics, near-real-time monitoring.<\/li>\n<li><strong>IT services \/ internal IT analytics:<\/strong> emphasis on ITSM data, operational KPIs, service reliability reporting.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">By geography<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Core responsibilities remain similar; variation primarily in:<ul>\n<li>Privacy regulations and data residency expectations (more stringent controls in some jurisdictions)<\/li>\n<li>Working style and stakeholder cadence (distributed vs co-located)<\/li>\n<\/ul>\n<\/li>\n<li>Best practice: document applicable privacy rules and data handling constraints explicitly.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Product-led vs service-led company<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Product-led:<\/strong> event instrumentation, experimentation, and behavioral metrics are primary; high collaboration with Product\/Engineering.<\/li>\n<li><strong>Service-led:<\/strong> project reporting, utilization, and operational metrics more prominent; more structured reporting cycles.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup vs enterprise<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Startup:<\/strong> speed and pragmatic delivery; fewer tools; minimal governance.<\/li>\n<li><strong>Enterprise:<\/strong> strong compliance, cataloging, approvals; more formal change management and release processes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regulated vs non-regulated environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Regulated:<\/strong> 
stronger PII controls, audit logs, retention policies, reconciliation rigor; more required documentation.<\/li>\n<li><strong>Non-regulated:<\/strong> more flexibility, but still expected to follow internal security standards and good practices.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18) AI \/ Automation Impact on the Role<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Tasks that can be automated (increasingly)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>SQL drafting and refactoring assistance:<\/strong> AI copilots can accelerate writing transformations and improving readability.<\/li>\n<li><strong>Test generation suggestions:<\/strong> propose missing tests based on schema and historical failures.<\/li>\n<li><strong>Anomaly detection triage:<\/strong> summarize likely causes (schema drift, source outage, join explosion).<\/li>\n<li><strong>Documentation scaffolding:<\/strong> auto-generate dataset descriptions and lineage summaries (requires human review).<\/li>\n<li><strong>Support intake and categorization:<\/strong> route requests and suggest relevant datasets\/definitions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Tasks that remain human-critical<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Metric definition governance:<\/strong> negotiating definitions requires context, tradeoffs, and stakeholder alignment.<\/li>\n<li><strong>Judgment on data correctness:<\/strong> AI can surface anomalies, but humans validate business reality.<\/li>\n<li><strong>Privacy\/security decisions:<\/strong> classification, minimization, and access patterns require accountability.<\/li>\n<li><strong>Root cause closure across teams:<\/strong> coordinating Engineering\/Product\/Business actions remains relationship-driven.<\/li>\n<li><strong>Designing durable models:<\/strong> understanding grain, lifecycle states, and business process nuance.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How AI 
changes the role over the next 2\u20135 years<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Expectation shifts from \u201cwrite SQL quickly\u201d toward:<ul>\n<li><strong>Designing systems of quality<\/strong> (contracts, tests, monitoring) that prevent issues<\/li>\n<li><strong>Curating semantics<\/strong> (metrics layer, certified datasets) for consistent decision-making<\/li>\n<li><strong>Operating data products<\/strong> with SLAs and clear ownership<\/li>\n<\/ul>\n<\/li>\n<li>AI reduces time spent on first drafts and repetitive diagnostics, increasing emphasis on:<ul>\n<li>Review, validation, and governance discipline<\/li>\n<li>Stakeholder management and data literacy enablement<\/li>\n<li>Higher throughput with consistent quality<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">New expectations caused by AI, automation, or platform shifts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ability to evaluate AI-generated code for correctness, performance, and security.<\/li>\n<li>Stronger emphasis on metadata quality (catalog completeness, lineage accuracy) to power automation.<\/li>\n<li>More proactive monitoring and automated remediation patterns (self-healing pipelines where feasible).<\/li>\n<li>Greater need for clear metric contracts and change management as more users self-serve.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19) Hiring Evaluation Criteria<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to assess in interviews<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>SQL depth and correctness<\/strong>\n   &#8211; Joins and grain management, deduplication, window functions, incremental logic.<\/li>\n<li><strong>Data modeling judgment<\/strong>\n   &#8211; How they design marts for analytics use cases; handling slowly changing dimensions, event data, and aggregation pitfalls.<\/li>\n<li><strong>Data quality mindset<\/strong>\n   &#8211; Testing strategy, reconciliation approaches, and 
monitoring\/alerting patterns.<\/li>\n<li><strong>Stakeholder communication<\/strong>\n   &#8211; Ability to explain discrepancies, document definitions, and manage expectations.<\/li>\n<li><strong>Operational reliability<\/strong>\n   &#8211; Incident response experience, root cause analysis, preventing recurrence.<\/li>\n<li><strong>Pragmatic governance<\/strong>\n   &#8211; Handling PII, access discipline, and balancing usability with risk.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Practical exercises or case studies (recommended)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>SQL + modeling exercise (60\u201390 minutes)<\/strong>\n   &#8211; Provide sample tables: users, accounts, events, subscriptions.\n   &#8211; Ask candidate to:<ul>\n<li>Build a clean \u201cactive users\u201d dataset with a defined grain<\/li>\n<li>Define activation and retention metrics<\/li>\n<li>Identify and handle duplicates and late-arriving events<\/li>\n<li>Propose 5\u20138 data tests<\/li>\n<\/ul>\n<\/li>\n<li><strong>Data discrepancy triage scenario (30 minutes)<\/strong>\n   &#8211; \u201cDashboard shows churn spiking 30% overnight; finance disagrees.\u201d\n   &#8211; Candidate explains investigation steps, communications, and likely causes.<\/li>\n<li><strong>Definition alignment mini-case (30 minutes)<\/strong>\n   &#8211; Conflicting definitions of \u201cnew customer\u201d between Sales and Product.\n   &#8211; Candidate proposes governance approach and a canonical definition with edge cases.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Strong candidate signals<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Explains grain and aggregation clearly; proactively prevents double counting.<\/li>\n<li>Treats tests\/documentation as part of \u201cdone,\u201d not optional.<\/li>\n<li>Demonstrates structured debugging: isolate source, reproduce, validate, fix, prevent.<\/li>\n<li>Can communicate tradeoffs and uncertainties honestly (e.g., \u201cThis metric is 
directionally correct but incomplete due to X\u201d).<\/li>\n<li>Understands how product instrumentation decisions affect analytics.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Weak candidate signals<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Writes SQL that \u201cworks\u201d but cannot explain why it is correct.<\/li>\n<li>Avoids definitions work; defaults to \u201cjust build the dashboard.\u201d<\/li>\n<li>Doesn\u2019t consider downstream impact or change management.<\/li>\n<li>Over-focuses on tools rather than principles (cannot adapt across stacks).<\/li>\n<li>Confuses reporting with modeling; builds business logic into BI layers inconsistently.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Red flags<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Dismisses data governance\/privacy concerns or treats PII casually.<\/li>\n<li>Blames stakeholders for confusion without addressing definition\/documentation gaps.<\/li>\n<li>No concept of tests, monitoring, or preventing recurrence.<\/li>\n<li>Frequent reliance on manual spreadsheet fixes as the default solution.<\/li>\n<li>Poor collaboration behaviors: defensive, opaque, or unwilling to document.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scorecard dimensions<\/h3>\n\n\n\n<p>Use a consistent rubric (e.g., 1\u20135) across interviewers:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Dimension<\/th>\n<th>What \u201cmeets bar\u201d looks like<\/th>\n<th style=\"text-align: right;\">Weight (example)<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>SQL &amp; data transformations<\/td>\n<td>Correct joins, grain clarity, readable maintainable SQL<\/td>\n<td style=\"text-align: right;\">20%<\/td>\n<\/tr>\n<tr>\n<td>Data modeling<\/td>\n<td>Designs marts that support stable metrics; understands dimensions\/facts<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<\/tr>\n<tr>\n<td>Data quality &amp; observability<\/td>\n<td>Proposes practical tests and monitoring; balances 
signal\/noise<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<\/tr>\n<tr>\n<td>Incident triage &amp; RCA<\/td>\n<td>Structured debugging and prevention mindset<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<\/tr>\n<tr>\n<td>BI\/semantic understanding<\/td>\n<td>Understands metric layers and dashboard failure modes<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<\/tr>\n<tr>\n<td>Stakeholder communication<\/td>\n<td>Clear, concise explanations; expectation management<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<\/tr>\n<tr>\n<td>Governance &amp; privacy discipline<\/td>\n<td>Sensitivity awareness and access control thinking<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<\/tr>\n<tr>\n<td>Ownership &amp; collaboration<\/td>\n<td>Drives closure, works well cross-functionally<\/td>\n<td style=\"text-align: right;\">5%<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20) Final Role Scorecard Summary<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Category<\/th>\n<th>Summary<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Role title<\/td>\n<td>Data Specialist<\/td>\n<\/tr>\n<tr>\n<td>Role purpose<\/td>\n<td>Deliver and maintain trusted datasets, metric definitions, and data quality controls that enable reliable analytics and reporting across the organization.<\/td>\n<\/tr>\n<tr>\n<td>Top 10 responsibilities<\/td>\n<td>1) Own domain data readiness 2) Build curated datasets\/marts 3) Maintain canonical metrics 4) Implement data tests 5) Monitor freshness\/quality 6) Triage and resolve discrepancies 7) Coordinate upstream tracking changes 8) Manage backfills\/reprocessing 9) Document datasets\/definitions\/lineage 10) Improve performance and reduce cost<\/td>\n<\/tr>\n<tr>\n<td>Top 10 technical skills<\/td>\n<td>1) Advanced SQL 2) Data modeling (dimensional) 3) Data validation\/testing 4) ETL\/ELT operations 5) Warehouse fundamentals 6) Git\/PR 
workflow 7) BI semantics basics 8) Python scripting 9) Incremental processing patterns 10) Reconciliation techniques<\/td>\n<\/tr>\n<tr>\n<td>Top 10 soft skills<\/td>\n<td>1) Analytical rigor 2) Clear communication 3) Prioritization 4) Ownership mindset 5) Structured problem solving 6) Collaboration\/influence 7) Attention to detail with pragmatism 8) Documentation discipline 9) Stakeholder empathy 10) Calm execution under incident pressure<\/td>\n<\/tr>\n<tr>\n<td>Top tools or platforms<\/td>\n<td>Snowflake\/BigQuery\/Redshift (context), dbt, GitHub\/GitLab, Jira, Slack\/Teams, Looker\/Tableau\/Power BI (context), Airflow\/Dagster (context), Confluence\/Notion, data catalog (Alation\/Collibra\/DataHub), observability tools (optional)<\/td>\n<\/tr>\n<tr>\n<td>Top KPIs<\/td>\n<td>Freshness SLA attainment, test coverage on critical assets, incident count (P1\/P2), MTTD\/MTTR, reconciliation pass rate, stakeholder satisfaction, certified metric adoption rate, request cycle time, query cost\/performance improvements, documentation completeness<\/td>\n<\/tr>\n<tr>\n<td>Main deliverables<\/td>\n<td>Curated marts and views, metric definitions, automated tests and monitoring, certified BI datasets\/semantic models, incident runbooks\/postmortems, catalog entries\/data dictionary, backfill plans, performance optimization changes<\/td>\n<\/tr>\n<tr>\n<td>Main goals<\/td>\n<td>30\/60\/90-day onboarding to domain ownership; 6\u201312 months to stable SLAs, reduced incidents, standardized metrics, and scalable self-serve enablement<\/td>\n<\/tr>\n<tr>\n<td>Career progression options<\/td>\n<td>Senior Data Specialist; Analytics Engineer (Senior); Data Quality\/Observability Specialist; BI Engineering Lead; Data Product Manager; pathway to Data Engineering or Governance specialization depending on strengths<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>A **Data Specialist** is a hands-on data professional responsible for ensuring 
that an organization\u2019s data is **accurate, well-structured, accessible, and usable** for analytics, operational reporting, and downstream data products. The role blends practical data engineering fundamentals (ingestion, transformation, validation) with analytics enablement (semantic definitions, metrics consistency, reporting readiness) and data governance execution (quality controls, documentation, access patterns).<\/p>\n","protected":false},"author":61,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_joinchat":[],"footnotes":""},"categories":[6516,24508],"tags":[],"class_list":["post-75053","post","type-post","status-publish","format-standard","hentry","category-data-analytics","category-specialist"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/75053","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/61"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=75053"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/75053\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=75053"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=75053"},{"taxonomy":"post_tag","embeddable":
true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=75053"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}