An AI-fueled organization is built on data, which makes data engineering a critical discipline. Data engineering encompasses a set of best practices and technologies for developing engineered data workflows and pipelines that connect operational and analytical data infrastructures. This includes data orchestration, integration and transformation, ensuring that data is easy to consume across the organization.
However, as the authors of "Rewired" highlight, accessing and preparing data remains a major challenge for many established companies, with up to 70% of AI development efforts spent wrangling and harmonizing data.
Effective data engineering starts by identifying the right data for digital solutions. This means prioritizing data domains — groups of related data that support governance and architecture — and developing reusable data products that help teams address specific business challenges. To ensure this data is reliable and actionable, data engineers must assess data readiness across key factors such as:
- Accuracy
- Timeliness
- Consistency
- Security
With this in hand, data architecture acts as a system of pipes, directing the flow of data from sources to ingestion, storage, processing and consumption.
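As an illustrative sketch only, the readiness factors above could be expressed as simple automated checks over a batch of records before it enters the pipeline. The field names, thresholds and record shape here are hypothetical assumptions, not a prescription:

```python
from datetime import datetime, timedelta, timezone

def check_accuracy(records, required_fields):
    """Share of records with all required fields populated."""
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields)
        for r in records
    )
    return complete / len(records) if records else 0.0

def check_timeliness(records, max_age=timedelta(days=1)):
    """Share of records updated within the allowed freshness window."""
    now = datetime.now(timezone.utc)
    fresh = sum((now - r["updated_at"]) <= max_age for r in records)
    return fresh / len(records) if records else 0.0

def check_consistency(records, key="id"):
    """Share of records carrying a unique business key (no duplicates)."""
    keys = [r[key] for r in records]
    return len(set(keys)) / len(keys) if keys else 0.0

def readiness_report(records):
    # Security checks (access controls, masking) live outside the data
    # itself, so this sketch covers only the data-level factors.
    return {
        "accuracy": check_accuracy(records, ["id", "amount"]),
        "timeliness": check_timeliness(records),
        "consistency": check_consistency(records),
    }
```

A report like this could gate ingestion: batches scoring below an agreed threshold on any factor are quarantined rather than loaded.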
Organizations must next consider modern architectural frameworks like cloud-native data lakes, warehouses, lakehouse models, data mesh and data fabric to manage data effectively. As AI adoption scales, staying informed about the latest advancements in data engineering is essential. So, what does the latest research reveal about the state of data engineering? Let’s explore the findings.
Data Engineering Is a Boardroom Priority
Recent research underscores the growing importance of data engineering, with over 90% of organizations recognizing its significance. Notably, 80% classify it as critical or very important — a marked increase from 2023 and 2024. This upward trend highlights the expanding role of data engineering in driving business success.
The perceived criticality of data engineering is especially pronounced in industries with mature analytics capabilities, such as manufacturing, healthcare, financial services and technology. Furthermore, organizations that have been most successful in leveraging data see it as even more essential.
Within the data science function, the consensus is unanimous — 100% consider data engineering critical, reinforcing its foundational role in enabling advanced analytics and AI-driven insights. This is something CEOs should take note of.
Related Article: Data Scientists Use AI to Work Smarter — Here’s How
How Data Engineering Delivers Value
Data engineering plays a crucial role in modern organizations, supporting a wide range of data workflows essential for analytics and operations. It underpins data integration, cleansing and transformation processes for data warehouses while ensuring smooth data flows between operational systems. Additionally, data engineering enables ad hoc queries, discovery and exploratory analysis by providing robust integration and transformation services. Managing and delivering master data is another key function, ensuring consistency and accuracy across enterprise systems.
Beyond foundational data management, data engineering is critical for advanced analytics use cases, including data science, augmented analytics and predictive and prescriptive modeling. It also facilitates re-platforming or replication of existing data warehouses and supports seamless data migration to new systems — whether due to cloud adoption, system consolidation or modernization efforts. Furthermore, data engineering enables external data sharing, including the extraction and delivery of data to systems or third parties.
Organizations increasingly incorporate third-party data enrichment into their data engineering workflows, enhancing the value of their internal data for more comprehensive insights and decision-making.
According to Bill Hostmann, VP and research fellow at Dresner Advisory Services, the predominant use case driving data engineering investments remains "as part of the data integration, cleansing and transformation workflows for a data warehouse supporting dashboards and reporting," but a secondary and also important use case is "data integration and transformation services for ad hoc query, discovery and exploration analysis."
What Today’s Data Engineers Are Really Tasked With
The top jobs for data engineers revolve around efficient data aggregation, grouping and ETL/ELT workflows, alongside robust management of engineering processes. Key requirements include execution planning, job monitoring, alerting and time- or event-based scheduling to ensure reliable data pipelines. Equally important is workflow creation through no-code data transformations, graphical drag-and-drop design tools and script-based automation. In-memory engines for real-time data exploration, automated data quality rules, and AI-driven recommendations for data relationships further enhance data engineer efficiency and insight generation.
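The core of these tasks can be pictured as a minimal extract-transform-load job wrapped in monitoring and alerting hooks. This is a deliberately small sketch: the step functions, the in-memory "warehouse" target and the logging setup are illustrative assumptions, not a reference to any particular tool:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def extract():
    # A real job would read from a database, API or file drop.
    return [{"sku": "A1", "qty": "3"}, {"sku": "B2", "qty": "5"}]

def transform(rows):
    # Aggregate quantities per SKU after coercing string fields to int.
    totals = {}
    for row in rows:
        totals[row["sku"]] = totals.get(row["sku"], 0) + int(row["qty"])
    return totals

def load(totals, target):
    # A real job would write to a warehouse table, not a dict.
    target.update(totals)

def run_job(target):
    """Execute the pipeline with basic job monitoring and an alert hook."""
    try:
        rows = extract()
        log.info("extracted %d rows", len(rows))
        totals = transform(rows)
        load(totals, target)
        log.info("loaded %d aggregates", len(totals))
    except Exception:
        # An alerting hook (email, pager, incident ticket) would fire here.
        log.exception("pipeline failed")
        raise

warehouse = {}
run_job(warehouse)
```

Time- or event-based scheduling then amounts to invoking `run_job` from a scheduler or orchestrator, with the logging output feeding the monitoring and alerting requirements described above.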
Comprehensive data management is also a vital part of the role. This includes metadata capabilities, data profiling, governance tools with audit trails and lineage tracking, and mechanisms to mask or redact sensitive data.
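The masking and redaction mechanisms mentioned above can take many forms; as one hedged sketch, a record might have its email redacted and its identifier replaced with a stable, non-reversible token. The field names, salt and masking policy here are hypothetical (in practice the salt would be a managed secret, not a literal):

```python
import hashlib

def mask_email(email):
    """Redact the local part of an email, keeping the domain for analytics."""
    local, _, domain = email.partition("@")
    return f"{local[0]}***@{domain}" if local else email

def pseudonymize(value, salt="demo-salt"):
    """Replace an identifier with a stable, non-reversible token."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

def mask_record(record):
    """Apply field-level masking policies without mutating the input."""
    masked = dict(record)
    if "email" in masked:
        masked["email"] = mask_email(masked["email"])
    if "customer_id" in masked:
        masked["customer_id"] = pseudonymize(masked["customer_id"])
    return masked
```

Because the pseudonym is deterministic for a given salt, masked records can still be joined on `customer_id` downstream while the raw identifier stays out of analytical systems.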
A rich library of prebuilt components can streamline data integration, while connectors facilitate seamless access to varied data sources and event-driven architectures. Debugging tools for testing and optimizing data workflows help ensure accuracy and performance. By contrast, support for Kafka and other Apache big data services ranks lower in priority than these core engineering needs.
Data Engineers Face an Explosion of Sources
Data engineers manage a diverse range of data sources and targets, ensuring seamless integration across various platforms. By volume, relational databases like Oracle and SQL Server remain dominant, serving as critical sources and destinations for structured data. File-based formats, including Excel, CSV, log files and JSON, are also widely used, especially for data exchange and interim storage. Enterprise applications such as Salesforce, Workday, Oracle, SAP and Infor contribute substantial data flows, requiring specialized connectors and integration strategies.
Beyond traditional systems, modern data engineering encompasses object stores like Amazon S3 for scalable storage, analytical databases such as Snowflake and Exasol for high-performance querying and NoSQL platforms like MongoDB and Couchbase for flexible, schema-less data management.
Specialty data platforms, including SAP HANA and Palantir, add further complexity, as do Hadoop-based ecosystems like Cloudera. Emerging technologies such as graph databases, including Neo4J and TigerGraph, play an increasing role in relationship-driven analytics, expanding the breadth of data engineering responsibilities.
Related Article: Data Scientists Share Their AI Use Cases
AI Ambitions Outpace Team Capacity
Data engineering team sizes typically range from 0 to 4 engineers, with the most effective organizations tending to have more rather than fewer. Effectiveness generally increases with organizational size, as larger companies have greater resources and specialized talent to support data engineering efforts. However, once an organization surpasses 10,000 employees, effectiveness tends to decline due to the growing complexity of data sources, diverse use cases and the challenges of managing on-premises, cloud and hybrid environments.
Smaller organizations often rely on scripts, spreadsheets and self-service data preparation tools due to limited staff and expertise in data engineering. While these approaches may be sufficient for basic needs, they lack the scalability and robustness required for advanced analytics.
At the other extreme, very large organizations face operational inefficiencies stemming from fragmented data landscapes and the sheer scale of their data engineering requirements, making it harder to maintain peak effectiveness.
No Engineering, No AI: What the Research Makes Clear
As organizations strive to become AI-driven, data engineering emerges as a foundational discipline that ensures data is accessible, reliable and ready for advanced analytics. Research confirms its growing importance, with the most successful organizations recognizing it as critical to their digital transformation efforts. Industries with mature analytics capabilities — such as manufacturing, healthcare, financial services and technology — are leading the way in prioritizing data engineering investments. However, challenges persist, particularly around data integration, governance and scalability as organizations grow.
For data leaders, the message is clear: investing in data engineering is not optional but essential for AI success. Building scalable data workflows, leveraging modern architectural frameworks and ensuring data quality and governance will determine an organization’s ability to extract value from AI and analytics.
As team sizes remain small, optimizing data engineering efficiency through automation, reusable data products and self-service capabilities becomes increasingly important. In a world where data fuels innovation, the organizations that master data engineering will be best positioned to lead in the AI era.