In Case You Missed It: Ample Insight’s CEO, Garry Ma, delivered a webinar in partnership with Communitech on the critical role of data quality in building trusted AI. With his extensive background as an early data scientist at Facebook (Meta), co-founder of an AI recruiting solution, and current AI business consultant, Garry is well placed to discuss this topic. Let’s dive into the details:

Why Data Quality Matters

As AI continues to grow exponentially, its foundation, data, becomes increasingly important. AI’s outputs are only as good as its inputs, which makes data quality a vital consideration for organizations. The figures below illustrate the significant room for improvement in many organizations.

Dimensions of Data Quality

Data quality is commonly described along five dimensions: validity, consistency, completeness, timeliness, and uniqueness. Let’s look at each one more closely.

Validity: Data must conform to the syntax (format, type, range) of its definition. For example, date fields should only contain valid dates, not text or numbers.
Consistency: Data should agree across datasets without contradictions. For example, addresses stored in different database tables should use the same format.
Completeness: Data should include all necessary parts, with no missing values. For example, records in a customer database should not be missing required contact information.
Timeliness: Data must be available when needed and reflect the current state. For example, sales data in a data warehouse should not be days behind.
Uniqueness: Data should not contain duplicate entries. For example, each customer should have a unique identifier to prevent duplicates in the system.
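
To make the five dimensions concrete, here is a minimal Python sketch that checks a few customer records against each one. The field names, the BILLING_COUNTRY lookup (standing in for a second table), and the one-day freshness window are all hypothetical assumptions for illustration, not material from the webinar.

```python
from datetime import date, datetime, timedelta

# Hypothetical customer records; every field name here is an illustrative assumption.
RECORDS = [
    {"customer_id": 1, "email": "ana@example.com", "signup_date": "2024-03-01",
     "country": "CA", "updated_at": datetime(2024, 5, 1, 9, 0)},
    {"customer_id": 2, "email": None, "signup_date": "2024-13-40",
     "country": "US", "updated_at": datetime(2024, 5, 1, 8, 0)},
    {"customer_id": 1, "email": "ana@example.com", "signup_date": "2024-03-01",
     "country": "Canada", "updated_at": datetime(2024, 4, 20, 9, 0)},
]

# Hypothetical second table (e.g. billing) used to test consistency across datasets.
BILLING_COUNTRY = {1: "CA", 2: "US"}

REQUIRED_FIELDS = ("customer_id", "email", "signup_date")


def is_valid_date(value):
    """Validity: the value must parse as a real ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def check_quality(records, now, max_age=timedelta(days=1)):
    """Return, for each dimension, the indexes of records that fail it."""
    issues = {"validity": [], "consistency": [], "completeness": [],
              "timeliness": [], "uniqueness": []}
    seen_ids = set()
    for i, rec in enumerate(records):
        # Validity: the date field holds a parseable date, not arbitrary text.
        if not is_valid_date(rec.get("signup_date")):
            issues["validity"].append(i)
        # Consistency: the country agrees with the same customer's billing record.
        if rec.get("country") != BILLING_COUNTRY.get(rec.get("customer_id")):
            issues["consistency"].append(i)
        # Completeness: no required field is missing or empty.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            issues["completeness"].append(i)
        # Timeliness: the record was updated within the allowed window.
        if now - rec["updated_at"] > max_age:
            issues["timeliness"].append(i)
        # Uniqueness: the identifier has not been seen earlier in the batch.
        if rec.get("customer_id") in seen_ids:
            issues["uniqueness"].append(i)
        seen_ids.add(rec.get("customer_id"))
    return issues


print(check_quality(RECORDS, now=datetime(2024, 5, 1, 12, 0)))
```

With these sample records, the second record is flagged as invalid and incomplete, and the third as inconsistent, stale, and a duplicate.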

Strategies for Data Quality

Ensuring data quality is a collective responsibility that requires alignment among all data stakeholders. Here are five strategies to enhance your data quality:

Create Data Governance Standards: Establish clear guidelines and policies for data management.
Profile and Clean Data: Regularly examine and cleanse data to maintain its quality.
Implement Data Validation and Monitoring: Continuously validate and monitor data to ensure data accuracy and integrity.
Train Staff on Latest Data Workflows & Policies: Educate stakeholders on principles, policies and their role in maintaining data quality.
Make Feedback Easy & Continuously Improve: Foster a culture of continuous improvement by regularly gathering input and updating processes to enhance data quality.
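
As an illustration of the validation and monitoring strategy, the sketch below expresses checks as named rules, computes each rule’s pass rate over a batch of rows, and flags any rule whose pass rate drops below a threshold. The Rule class, the rule names, the sample batch, and the 95% default threshold are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A named validation rule; `check` returns True when a row passes."""
    name: str
    check: Callable[[dict], bool]


def validate_batch(rows: List[dict], rules: List[Rule]) -> Dict[str, float]:
    """Validation step: return the fraction of rows that pass each rule."""
    if not rows:
        # An empty batch passes vacuously; missing data is better caught by a separate freshness check.
        return {rule.name: 1.0 for rule in rules}
    return {rule.name: sum(rule.check(row) for row in rows) / len(rows) for rule in rules}


def alerts(pass_rates: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Monitoring step: list the rules whose pass rate fell below the threshold."""
    return sorted(name for name, rate in pass_rates.items() if rate < threshold)


# Hypothetical rules and batch, for illustration only.
RULES = [
    Rule("amount_non_negative", lambda r: r.get("amount") is not None and r["amount"] >= 0),
    Rule("currency_present", lambda r: bool(r.get("currency"))),
]
BATCH = [
    {"amount": 10.0, "currency": "CAD"},
    {"amount": -5.0, "currency": "CAD"},
    {"amount": 7.5, "currency": "USD"},
    {"amount": 3.0, "currency": "USD"},
]

rates = validate_batch(BATCH, RULES)
print(rates)          # {'amount_non_negative': 0.75, 'currency_present': 1.0}
print(alerts(rates))  # ['amount_non_negative']
```

Tracking pass rates rather than failing on the first bad row lets a team watch quality trend over time and tune thresholds per rule, which fits the continuous-monitoring idea above.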

Landscape of Data Tools

Using the right tools is essential for improving data health. When evaluating tools, consider the following categories and the features each typically provides:

Data Quality

Data profiling
Data issue detection
Data issue monitoring

Data Observability

Data discovery & cataloguing
Data lineage mapping
Data pipeline monitoring

Data Cataloguing

Centralized inventory of data assets
Metadata management
Data discovery

Data Fabric

Unified data management & access
Data observability and cataloguing
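
Data profiling, the first feature listed under Data Quality, can be illustrated with a short Python sketch that summarizes each column of a small table: null count, number of distinct values, most common value, and numeric range. The sample rows are hypothetical, and the profile function assumes each row is a dict of hashable scalar values; dedicated profiling tools go considerably further.

```python
from collections import Counter


def profile(rows):
    """Summarize each column of a list-of-dicts table."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        # A missing key is counted the same as an explicit None.
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        # Exclude booleans, which Python treats as ints.
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        stats = {"nulls": len(values) - len(present), "distinct": len(set(present))}
        if present:
            stats["most_common"] = Counter(present).most_common(1)[0][0]
        if numeric:
            stats["min"], stats["max"] = min(numeric), max(numeric)
        report[col] = stats
    return report


# Hypothetical sample table.
ROWS = [
    {"id": 1, "age": 34, "city": "Waterloo"},
    {"id": 2, "age": None, "city": "Kitchener"},
    {"id": 3, "age": 212, "city": "Waterloo"},
    {"id": 4, "city": None},
]

for column, stats in profile(ROWS).items():
    print(column, stats)
```

Here the age maximum of 212 is the kind of implausible value that profiling is meant to surface before the data reaches a model.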

Building Trusted AI

“Building good AI is like preparing a nutritious meal.”

Various governments and organizations have defined principles for building responsible AI; the core principles they share are:

Transparent, Explainable, and Accountable
Safe and Sensible for Humans, Society, and the Environment
Fair and Aligned with Human-centred Values

Concerns have emerged across the industry about how to put the ethical use of generative AI into practice. While 83% of IT leaders believe companies must collaborate to ensure the ethical use of generative AI, only 30% feel ethical use guidelines are a requirement for successful implementation.

Conclusion

This webinar demonstrated that data quality is the cornerstone of trusted AI. Investing in data quality is a continuous process, and it is never too late to start.

Be mindful and create Good AI.

Additional Resources

Watch the Good AI Webinar recording here.

For additional information about data quality and building trusted AI, check out these supplementary materials:

References