In a recent podcast episode, Cody David, a solutions architect at Syniti, part of Capgemini, discussed the critical role of data quality in supporting artificial intelligence (AI) workloads. He emphasized that trust is foundational for AI applications, asserting that reliable outcomes can emerge only from datasets free of duplicate and incomplete records.
Understanding the Challenges of Data Quality in AI
David identifies trust as a significant hurdle in achieving data quality for AI. Many users perceive AI systems as opaque, often attributing incorrect insights or actions to the AI itself, leading to a lasting loss of confidence. However, the underlying problem typically stems from subpar data quality, further complicated by a general lack of understanding regarding AI mechanisms.
For instance, in a sales organization utilizing a CRM, duplicate customer records can skew AI-driven analyses, resulting in inaccurate customer rankings. The sales team may blame the AI tool without recognizing that the unsatisfactory performance is rooted in poor data management. This illustrates the necessity for quality data to enable effective AI processes.
Conversely, AI can facilitate data quality improvement by detecting and merging duplicate records. Historically, organizations have often overlooked data quality, launching AI initiatives without a foundational data-first approach. Legacy systems, especially legacy ERP systems, complicate matters further due to decades of accumulated data issues, making proactive data quality management essential before embarking on AI projects.
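As an illustration of this kind of assisted cleanup, the sketch below flags likely duplicate customer records using simple normalization and string similarity. The DataFrame columns, sample data, and similarity threshold are hypothetical assumptions for illustration, not a description of Syniti's tooling.

```python
# Minimal sketch: flag likely duplicate customer records before they feed an AI ranking.
# Assumes a pandas DataFrame with hypothetical "customer_id", "name", and "email" columns.
from difflib import SequenceMatcher

import pandas as pd

customers = pd.DataFrame(
    {
        "customer_id": [101, 102, 103, 104],
        "name": ["Acme Corp", "ACME Corporation", "Globex Inc", "Globex, Inc."],
        "email": ["sales@acme.com", "sales@acme.com", "info@globex.com", "info@globex.io"],
    }
)

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip common legal suffixes so near-identical names compare equal."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(w for w in cleaned.split() if w not in {"inc", "corp", "corporation", "llc"})

def likely_duplicates(df: pd.DataFrame, threshold: float = 0.85) -> list[tuple[int, int]]:
    """Return pairs of customer_ids whose normalized names (or emails) look like the same customer."""
    pairs = []
    rows = df.to_dict("records")
    for i, a in enumerate(rows):
        for b in rows[i + 1 :]:
            score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
            if score >= threshold or a["email"] == b["email"]:
                pairs.append((a["customer_id"], b["customer_id"]))
    return pairs

print(likely_duplicates(customers))  # e.g. [(101, 102), (103, 104)]
```

Candidate pairs like these would still need review or survivorship rules before merging, but even a crude pass surfaces the records most likely to distort downstream AI rankings.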
Steps for Ensuring Data Quality for AI
To cultivate robust data quality, David advocates for a systematic approach beginning with data governance. Establishing clear policies for data collection, storage, cleansing, and sharing is paramount, along with naming the individuals accountable for each of these processes.
Next, organizations should prioritize efforts based on potential business impact rather than attempting to resolve all issues simultaneously. Targeting the data quality issues that most affect AI outcomes is likely to yield quick gains. Budget concerns are understandable, but living with flawed data ultimately costs more than fixing it. A pragmatic starting point is to pick a crucial business process with measurable financial implications, pilot data quality improvements there, and demonstrate the return on investment.
Integrating validation rules into data management workflows can help catch errors promptly, preventing their influence on AI solutions. If real-time validation isn’t feasible, implementing systems for immediate error detection through automated reporting becomes vital. Continuous measurement of data quality metrics is essential, allowing organizations to drive iterative improvements and embed data governance throughout their operations.
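A minimal sketch of what such validation rules and an ongoing quality metric might look like is shown below. It assumes a hypothetical vendor master extract in pandas; the rules, column names, and scoring formula are illustrative only.

```python
# Minimal sketch of rule-based validation plus a quality metric for automated reporting.
import pandas as pd

vendors = pd.DataFrame(
    {
        "vendor_id": ["V001", "V002", "V002", "V004"],
        "name": ["Initech", "Hooli", "Hooli", ""],
        "payment_terms": ["NET30", None, "NET45", "NET30"],
    }
)

# Each rule returns a boolean Series marking the rows that violate it.
RULES = {
    "duplicate_vendor_id": lambda df: df["vendor_id"].duplicated(keep=False),
    "missing_name": lambda df: df["name"].fillna("").str.strip() == "",
    "missing_payment_terms": lambda df: df["payment_terms"].isna(),
}

def run_validation(df: pd.DataFrame) -> pd.DataFrame:
    """Apply each rule and return per-rule violation counts for automated reporting."""
    rows = [(rule, int(check(df).sum())) for rule, check in RULES.items()]
    return pd.DataFrame(rows, columns=["rule", "violations"])

report = run_validation(vendors)
quality_score = 1 - report["violations"].sum() / (len(vendors) * len(RULES))
print(report.to_string(index=False))
print(f"overall quality score: {quality_score:.0%}")  # trend this metric over time
```

Running such checks on every load, and trending the resulting score, turns data quality from a one-off cleanup into a continuously measured process.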
Quick Wins for Improving Data Quality in Enterprises
David highlights several data quality quick wins that directly benefit AI outcomes. In an ERP context, he mentions MRO (maintenance, repair, and operations) materials, which are essential for keeping manufacturing equipment running. Keeping an accurate inventory of these materials is crucial: unnecessary duplicate records tie up working capital that could otherwise be allocated to other initiatives.
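The sketch below shows one way duplicate MRO material records might be surfaced and the capital they tie up estimated. The sample data and the crude description normalization are assumptions for illustration, not Syniti's method.

```python
# Minimal sketch estimating working capital tied up in duplicate MRO material records.
import pandas as pd

materials = pd.DataFrame(
    {
        "material_id": ["M-100", "M-217", "M-342"],
        "description": ["Bearing, 6205 ZZ", "BEARING 6205ZZ", "Gasket 50mm"],
        "on_hand_value": [1200.0, 950.0, 300.0],
    }
)

# Normalize descriptions so trivially different spellings of the same part group together.
materials["norm_desc"] = (
    materials["description"].str.lower().str.replace(r"[^a-z0-9]", "", regex=True)
)

grouped = materials.groupby("norm_desc")["on_hand_value"]
# Everything beyond the first record per normalized description is treated as duplicate stock.
duplicate_value = (grouped.sum() - grouped.first()).sum()
print(f"estimated capital tied up in duplicate MRO stock: ${duplicate_value:,.0f}")
```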
Another example pertains to vendor discounts. Duplicate vendor records split spend across multiple entries, so volume-based rebates that depend on consolidated totals go unclaimed, a significant cost-saving opportunity once the records are merged. Addressing such data quality challenges not only enhances operational efficiency but also contributes to the financial health of the organization.
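To make the rebate point concrete, the following sketch consolidates spend across duplicate vendor records. The vendor names, spend figures, rebate threshold, and name normalization are hypothetical.

```python
# Minimal sketch: consolidating spend across duplicate vendor records to spot missed volume rebates.
import pandas as pd

spend = pd.DataFrame(
    {
        "vendor_name": ["Initech Ltd", "Initech Limited", "Hooli"],
        "annual_spend": [60_000.0, 55_000.0, 40_000.0],
    }
)
REBATE_THRESHOLD = 100_000.0  # hypothetical: rebate applies above this consolidated spend

# Crude normalization stands in for proper vendor matching and merging.
spend["canonical"] = spend["vendor_name"].str.lower().str.split().str[0]
consolidated = spend.groupby("canonical")["annual_spend"].sum()

for vendor, total in consolidated.items():
    eligible = total >= REBATE_THRESHOLD
    print(f"{vendor}: ${total:,.0f} consolidated spend, rebate eligible: {eligible}")
# Split across duplicates, neither Initech record crosses the threshold on its own,
# so the rebate would have gone unclaimed.
```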
By focusing on these pragmatic steps and leveraging AI for data quality enhancement, organizations can establish a solid foundation for reliable AI applications.