Artificial intelligence (AI) governance glossary

In today's landscape, data is the fuel and AI the engine of innovation and competitive advantage, driving organizations to treat their information as a strategic asset that unlocks insights, automates processes, and accelerates innovation. Without a unified approach to governing both data and AI, however, significant risks emerge: inaccurate insights, pervasive data quality issues, inherent model bias, and legal and ethical implications that erode trust. To innovate both swiftly and safely, a shared understanding is essential. This glossary provides a comprehensive vocabulary for the interconnected worlds of data and AI, equipping you with the terms and phrases needed to cultivate a trusted, governed, and high-performing data and AI practice.

A

AI governance is the application of rules, processes, and responsibilities to drive maximum value from your automated data products by ensuring applicable, streamlined, and ethical AI practices that mitigate risk, adhere to legal requirements, and protect privacy.

Learn more about AI governance

The simulation of human intelligence in machines that are programmed to think and learn like humans. AI can take on tasks that were previously done by people, including problem solving and repetitive tasks.

How AI models will be used to solve a specific problem. Examples include chatbots, fraud detection, and personalized product recommendations.

A member-based foundation that aims to harness the collective power and contributions of the global open-source community to develop AI testing tools to enable responsible AI. The Foundation promotes best practices and standards for AI.

Learn more about AI Verify Foundation

Refers to AI systems with human-level cognitive abilities, capable of understanding, learning, and applying knowledge across various domains autonomously.

The obligation for individuals and organizations to take ownership of their data practices and AI decisions, ensuring they are answerable for the outcomes and impacts those systems create.

A method or set of instructions and rules designed to carry out a defined task or solve a specific problem using computational resources.

An official, independent examination and verification to ensure that AI and the data that drives it meet specific criteria set forth by governing or regulatory entities.

The principle of hiding the underlying technical complexity (e.g., table joins, cryptic column names) from the end-user.

Metadata that is continuously collected, analyzed, and used to automate intelligence and workflow actions across data systems to improve data discovery, quality and governance.

An autonomous AI system that can proactively perceive its environment, make decisions, and take actions to achieve specific goals (e.g., an agent that automatically detects and remediates a data quality issue).

Learn more about Agentic AI

The degradation of a model's predictive performance over time, often because the new, real-world data it sees no longer matches the data it was trained on.
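For illustration, one common way to quantify this kind of drift is the Population Stability Index (PSI), which compares the distribution of a feature in the training data against recent production data. The sketch below is a minimal example; the bucket count and any alert threshold you pair it with are illustrative choices, not a prescribed standard.

```python
import numpy as np

def population_stability_index(expected, actual, buckets=10):
    """Compare the distribution of one feature in training vs. production data."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    edges = np.linspace(expected.min(), expected.max(), buckets + 1)
    # Clip production values into the training range so nothing falls outside the bins
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Small floor avoids log(0) when a bucket is empty
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

A rising PSI on key features is often used as a trigger to investigate retraining, alongside direct monitoring of prediction quality.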

Methods such as SHAP and LIME that help explain model predictions by tracing them back to specific features and training data with documented lineage.

Policies, workflows and controls for managing the development, validation, approval, deployment, monitoring, and retirement of machine learning models.

Learn more about AI model governance

A comprehensive catalog of all AI and machine learning models in use across the organization, with metadata about data dependencies, ownership, and risk classification.

Continuous tracking of deployed model performance, accuracy, drift, fairness, and data quality to ensure ongoing reliability and regulatory compliance.

The gradual decline in a model’s accuracy or quality metrics over time due to data drift, requiring investigation through lineage and quality monitoring.

A centralized system for versioning, cataloging, storing, and governing machine learning models and their associated data lineage throughout the ML lifecycle.

Governance rules that define when, how, and under what data quality conditions machine learning models must be updated with new training data.

The controlled process of decommissioning outdated or risky models from production with proper impact analysis, communication, and governance documentation.

A categorization system for models based on potential business impact, regulatory exposure, data sensitivity, and ethical implications requiring different governance levels.

The practice of tracking and managing different versions of machine learning models, including their metadata, training data, performance metrics, and relationships to data assets.

Structured testing of AI systems by adversarial teams to identify vulnerabilities, hallucinations, bias, security weaknesses or unsafe behavior before production deployment.

Potential harm arising from AI model behavior, data quality issues, bias, misuse or regulatory non-compliance that could impact individuals, organizations or society.

The use of AI and machine learning to automate IT operations, including data pipeline anomaly detection, capacity planning, and root cause analysis for data systems.

The obligation to justify automated decisions made by AI systems and mitigate potential harm, ensuring transparency and responsibility throughout the AI lifecycle.

A mechanism that enables applications or services to exchange data and functionality through standardized requests and responses, essential for data product consumption.
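As a sketch of how a data consumer might pull a data product over such an interface, the example below uses the widely used requests HTTP client; the endpoint URL, token, and parameters are hypothetical, not a specific product's API.

```python
import requests  # third-party HTTP client

# Hypothetical data product endpoint and access token
response = requests.get(
    "https://data.example.com/api/v1/products/customer-360",
    headers={"Authorization": "Bearer <access-token>"},
    params={"region": "EMEA", "format": "json"},
    timeout=30,
)
response.raise_for_status()  # fail loudly on 4xx/5xx responses
records = response.json()
```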

The process of identifying patterns in data that are outside the normal ranges.
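A minimal illustration, assuming numeric values and a simple z-score rule; the threshold of three standard deviations is a common but arbitrary choice.

```python
import numpy as np

def find_anomalies(values, threshold=3.0):
    """Flag points that lie more than `threshold` standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    std = values.std() or 1.0  # avoid division by zero on constant data
    z_scores = (values - values.mean()) / std
    return values[np.abs(z_scores) > threshold]
```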

A chronological record of data access, modifications, policy changes and governance activities that provides evidence for compliance verification and security investigations.

B

Bias in AI is the presence of discriminatory outcomes or outputs. Bias can arise from various sources, including incomplete data, improper use cases, or flaws in the AI algorithm.

Set of principles from the Basel Committee on Banking Supervision, primarily for globally systemically important banks (G-SIBs), aiming to enhance their capabilities to identify, aggregate, and report risk data efficiently, especially during stress, to bolster risk management and decision-making.

Learn more about BCBS 239 (Basel Committee on Banking Supervision - Principle 239)

A modification to a data schema, API, quality specification or service level that violates existing data consumer expectations and breaks dependent systems.

A complete set of data-related terms explained in simple language to ensure everyone in an organization can understand and communicate effectively.

Learn more about Business glossary

A simplified, high-level view of data flow, connecting data assets to the business terms and processes they support, designed for non-technical users.

Context-rich information about data’s business meaning, ownership, business rules, usage and relationships that makes technical data understandable to business users.

C

A program designed to simulate human conversation through text or voice interactions, often in use cases involving customer service.

A field of AI where AI models utilize supervised learning to bucket data, images, or other data inputs into a predefined number of classes/categories.

A field of AI focused on enabling computers to interpret visual data from images or videos.

A vast dataset utilized by AI models to discern patterns, forecast outcomes, or generate specific results. It may comprise some combination of structured and unstructured data, addressing singular or diverse subjects.

A gradual rollout strategy for AI models or data products that releases changes to a small subset of users before full-scale deployment to minimize risk.

California state law granting consumers rights over their personal data, including the right to know what data is collected, delete it, and opt out of its sale.

Learn more about CCPA (California Consumer Privacy Act)

An A/B testing methodology where new AI models (challengers) are evaluated against existing production models (champions) to determine performance improvements before deployment.

A formal process that provides an official "stamp of approval" on a data asset, indicating it is trusted, accurate and ready for use.

Technology that identifies and captures changes made to data in real-time, enabling efficient data synchronization, lineage tracking, and data product freshness.

An executive decision-maker responsible for enterprise-wide data governance and how information is used as an asset.

Business users with analytical skills who create models and perform data analysis without formal data science training, enabled by governed self-service access to trusted data.

Granular tracking of data flow at the individual column or field level through transformations, providing detailed visibility into data dependencies and impact analysis.

Changes in the statistical relationship between input features and target outputs over time, causing AI model performance to degrade even without underlying data distribution changes.

Systems and processes for tracking, managing, and honoring user consent preferences for data collection and usage to ensure privacy compliance.

Processes and technologies for identifying and filtering unsafe, biased, harmful or policy-violating AI-generated content before it reaches end users.

A standardized set of terms and definitions used consistently across an organization to ensure clarity, prevent miscommunication, and support metadata management.

A governing body of stakeholders across the organization that provides a strategy for data governance programs, raises awareness of its importance, approves enterprise data policies and standards, prioritizes related projects and enables ongoing support.

The active management of data assets, including defining, enriching with metadata and certifying them to improve their value and trustworthiness.

D

Data quality determines whether data is fit for use to deliver high-quality AI model outputs. Data is considered high quality based on consistency, uniqueness, completeness and validity. High-quality data must have standardized entries, contain no repeated or irrelevant records, be representative of real-world conditions, and conform to the formatting required by the business.

A not-for-profit member organization that brings together leading businesses and institutions across multiple industries to learn, develop, and adopt responsible data and AI practices.

Learn more about Data & Trust Alliance

The practice of managing, organizing, and manipulating data to extract meaningful insights and support decision-making. Within AI, DataOps is used to manage and support the full lifecycle of AI development and deployment.

Synthetic media created using AI, manipulating videos or images to replace one person's likeness with another, often raising concerns about misinformation and deception.

The set of technical mechanisms and policies used to regulate who can view, use or modify specific data within a database, system or data product.

The organizational strategy and processes for managing and controlling access to data. Its goal is to protect sensitive data assets while ensuring that authorized data consumers have timely and appropriate access to the data they need.

The degree to which data is factually correct and precise.

The process of removing or altering identifying information from datasets to protect individual privacy while maintaining data utility for analytics and AI.

An unexpected deviation from normal data patterns that requires investigation, potentially indicating data quality issues, pipeline failures, or security incidents.

The process of moving inactive or historical data to long-term storage systems while maintaining accessibility, lineage, and compliance with retention policies.

A cloud-based delivery model where data products are provided on-demand to consumers via APIs or data marketplace interfaces without requiring local storage.

A comprehensive catalog of all data holdings within an organization, including metadata about location, ownership, sensitivity classification, and usage for governance.

Methods and frameworks for quantifying the business value or economic worth of data assets based on quality, usage, uniqueness, and contribution to business outcomes.

An unauthorized access to, disclosure of, or acquisition of sensitive data that compromises its confidentiality, integrity, or availability, requiring incident response.

A centralized inventory of data assets that helps organizations find and leverage data for efforts including analytics and AI initiatives. Modern data catalogs automatically scan data and metadata to help users efficiently discover valuable information at scale.

Learn more about Data catalog

Involves organizing data into relevant groups (classes). This process makes data – especially sensitive data – easier to locate and retrieve and is crucial for risk management, compliance and data protection.

Whether all required data fields are available.

The degree to which data does not conflict across sources.

An individual, team or system that uses data to gain insights, make decisions, or power applications, relying on governed and trusted data.

A formal machine-readable, enforceable agreement that defines expectations, policies and SLAs for data products, creating a shared understanding between data producers and data consumers.

Learn more about Data contract

Automated testing that verifies data continuously conforms to published data contract specifications for schema, quality thresholds, service levels, and other agreed-upon requirements.
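As a minimal sketch of what such checks might look like, assuming the contract is captured as a simple Python dictionary and the data arrives as a pandas DataFrame; the column names, types, and completeness threshold are illustrative.

```python
import pandas as pd

# Hypothetical contract: expected columns/types and a minimum completeness threshold
contract = {
    "schema": {"customer_id": "int64", "email": "object"},
    "min_completeness": 0.99,  # at most 1% nulls per column
}

def validate_against_contract(df: pd.DataFrame, contract: dict) -> list[str]:
    """Return a list of contract violations (empty list means the data passes)."""
    violations = []
    for column, expected_type in contract["schema"].items():
        if column not in df.columns:
            violations.append(f"missing column: {column}")
        elif str(df[column].dtype) != expected_type:
            violations.append(f"{column}: expected {expected_type}, got {df[column].dtype}")
    completeness = 1 - df.isna().mean()
    for column, ratio in completeness.items():
        if ratio < contract["min_completeness"]:
            violations.append(f"{column}: completeness {ratio:.2%} below contract threshold")
    return violations
```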

A user who collects and holds information on behalf of a data provider or requester. Custodians are responsible for managing the use, disclosure and protection of data.

Processes for permanently and verifiably removing personal data from systems in compliance with privacy regulations like GDPR and CCPA.

Making data accessible to all employees and stakeholders, and educating them on how to work with data, regardless of their technical background.

Centralized repository of information about data, serving as a comprehensive collection of metadata. It defines and describes the characteristics of every data element within a database, system, or organization, acting as a crucial reference for both technical and business users.

Data discovery involves locating data assets and understanding their content, structure and relationships. This is important to ensure data assets throughout an organization are properly managed and governed.

A high-level category of data based on a business subject area, such as “Customer”, “Product” or “Finance."

Changes in the statistical properties or distributions of data over time that can negatively impact analytics accuracy, AI model performance, and data quality metrics.

A framework of principles guiding responsible data collection, processing, storage, and usage aligned with ethical norms, social values, fairness, and organizational values.

An architectural approach that provides integrated, automated data management across distributed environments using active metadata, AI-powered recommendations, and knowledge graphs.

Whether the data is up-to-date and reflects the current state of affairs.

Involves defining policies, processes and roles for data handling, enhancing decision making and ensuring regulatory compliance by managing data assets and enforcing data access.

Learn more about Data governance

A cross-functional group of leaders responsible for setting data strategy, prioritizing initiatives, and resolving high-level data issues.

Establishes a basis for collaboration by providing a shared language among stakeholders.

A framework for assessing an organization’s current data governance capabilities and defining a roadmap for advancing through progressive maturity stages.

The organizational structure, roles, decision rights, and workflows that define how data governance is executed, monitored, and improved across the enterprise.

A mechanism for raising, managing and resolving data issues to the right stakeholders to improve trust in data quality.

A centralized integration point that connects multiple data sources and consumers, facilitating governed data exchange, transformation, cataloging, and distribution.

A classification system (e.g., critical, high, medium, low) for prioritizing data quality issues based on business impact, compliance risk, and urgency.

A centralized repository that stores structured and unstructured data at any scale in its raw, native format for later processing, analytics, and AI model training.

A hybrid data architecture combining the flexibility and scale of data lakes with the governance, quality management, and query performance of data warehouses.

Comprehensive policies and processes governing how data is created, cataloged, governed, used, archived and deleted throughout its existence.

The data's lifecycle path. It shows where data originates (source), how it moves, and what transformations it undergoes.

Learn more about Data lineage

The ability to understand, interpret and communicate data effectively, turning it into meaningful information and insights.

A decentralized data architecture paradigm that treats data as a product, with domain-oriented ownership, federated governance, and self-serve data infrastructure platform.

Learn more about Data mesh

The privacy principle and practice of collecting only the data strictly necessary for a specific, defined purpose, reducing compliance risk and storage costs.

An online platform that connects data providers and data consumers enabling users to find, assess and access high-quality, trustworthy data assets. Data marketplaces can be limited to internal use within an organization, and/or they can facilitate the external exchange of data.

Learn more about Data marketplace

The practice of generating measurable economic value from data assets, either through internal optimization of operations or external data product commercialization.

A proactive approach to monitoring and understanding the health of data systems, enabling the automated detection and resolution of data issues.

Learn more about Data observability

A user responsible for an information asset’s accuracy, integrity and timeliness. Owners also establish the controls for its generation, collection, processing, access, dissemination and disposal.

The sequence of steps that data moves through, from its source to a final destination (e.g., from an app to a data warehouse to a BI dashboard). Observability monitors the health of these pipelines.

Refers to an individual’s control over the collection, usage and sharing of sensitive and personal information, such as name, location, and behavior, with others. This control is an important aspect of compliance with external regulations such as the General Data Protection Regulation (GDPR).

Learn more about Data privacy

The domain team or individual responsible for creating, maintaining and ensuring the quality of a data product.

Usage analytics that track who accesses data products, how frequently, for what purposes and the value generated to inform product improvement.

A controlled, communicated process for retiring outdated or low-value data products while minimizing disruption to consumers and maintaining governance records.

Search capabilities, recommendation systems and data marketplace features that enable users to find relevant, trusted data products that meet their specific needs.

Mechanisms including ratings, reviews, feature requests and issue reporting that allow consumers to collaboratively improve data products iteratively.

The end-to-end stages of a data product from ideation and creation through deployment, adoption, monitoring, maintenance, evolution and eventual retirement.

The accountable business or domain leader responsible for defining the vision, roadmap, quality standards, adoption and ongoing success of a specific data product.

Formal commitments regarding a data product’s availability, freshness, latency, quality thresholds, support responsiveness and other agreed-upon requirements defined in data contracts.

The systematic management of changes, updates and releases to data products over time while maintaining schema compatibility and consumer trust.

A required role under GDPR responsible for monitoring compliance with privacy regulations, advising on data protection assessments, and serving as regulatory point of contact.

The comprehensive documentation of data origins, ownership, movements, and transformations throughout its lifecycle, providing verifiable lineage and audit history.

A visual interface within data governance platforms that displays data quality metrics, trends, issue counts, and alerts to provide visibility into data health.

A quantitative measurement aggregating multiple data quality dimensions into an overall assessment of dataset fitness, reliability, and trustworthiness.

The process of copying data across multiple systems or locations to improve availability, performance, disaster recovery, or geographic distribution while maintaining lineage.

Legal or policy requirements specifying where data must physically be stored, such as within specific countries, regions, or cloud availability zones for compliance.

Rules and policies defining how long different types of data must be stored before archival or deletion for compliance, legal, operational, and governance purposes.

The measurable business value and financial return generated from data governance, quality, and analytics initiatives relative to investment in people and technology.

Providing internal or external parties with access to trusted data to enhance collaboration, improve data literacy and foster a strong data-driven culture.

A formal arrangement between data owners and consumers documenting the details of the responsibilities of each party and how data should be accessed and used. This helps ensure that data is shared responsibly, in compliance with relevant policies and regulations.

Those who own the data and ensure it is accurate, trustworthy and accessible.

Data stewardship involves collaboration on data assets across departments, with role-based dashboards and interactive views.

The technical code used to measure and validate data against metrics and thresholds.

Learn more about Data quality rule
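For instance, a validity rule might check that values conform to an expected pattern and compare the pass rate against a threshold. The sketch below is illustrative only, assuming pandas, a hypothetical "email" column, and an arbitrary 95% threshold.

```python
import pandas as pd

def email_validity_rule(df: pd.DataFrame, column: str = "email", threshold: float = 0.95) -> bool:
    """Return True if the share of well-formed email values meets the threshold."""
    pattern = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"  # deliberately simple format check
    pass_rate = df[column].astype(str).str.match(pattern).mean()
    return pass_rate >= threshold
```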

Reusable data assets (such as a table or business analytics dashboard) that bundle data together with everything needed to make it independently usable by authorized data consumers.

The process of analyzing data to gather information about its structure, attributes and quality.

The process of validating source and target data match after data movement.

The principle that data is subject to the laws, regulations, and governance structures of the nation or jurisdiction where it is collected, stored, or processed.

A comprehensive long-term plan that aligns data governance capabilities, quality initiatives, and analytics investments with overall business objectives and competitive strategy.

The rights granted to individuals under privacy regulations including access, rectification, deletion, portability, and restriction of processing of their personal data.

A consumption model where users subscribe to ongoing, scheduled delivery of data products, datasets, or quality reports rather than one-time access requests.

A third-party entity or internal role that manages, protects, and governs data on behalf of individuals or business units with fiduciary responsibility and accountability.

Technology that provides a unified view and access to data across disparate sources without requiring physical movement, copying, or replication while maintaining governance.

A centralized repository optimized for analytics and reporting, where data from multiple sources is integrated, cleaned, structured, cataloged, and historicized.

An organizational mindset and set of practices that consistently value and utilize trusted data insights in decision-making at all levels.

An organizational mindset and set of practices that consistently value and utilize trusted, governed data insights in decision-making at all levels.

Standardized documentation describing dataset composition, collection methodology, intended use, limitations, quality metrics, and ethical considerations for AI training data.

Number of distinct values and duplicate data records.

Anyone who is an end-user of the data within the organization.

The degree to which data conforms to the expected format, type and range.

A mathematical technique for sharing aggregate insights about datasets while providing provable guarantees that individual records cannot be identified or re-identified.
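A minimal sketch of the classic Laplace mechanism, one standard way this guarantee is achieved for numeric queries. It assumes a counting query (sensitivity 1); the epsilon value shown is purely illustrative, and smaller epsilon means more noise and stronger privacy.

```python
import numpy as np

def noisy_count(true_count: int, epsilon: float = 0.5, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise calibrated to sensitivity / epsilon."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise
```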

A situation where an AI system disproportionately affects protected demographic groups due to biased training data or algorithmic design, requiring fairness monitoring.

A software and organizational framework for structuring data ownership and systems around business domains rather than technical components, foundational to data mesh.

The process of identifying entities in multiple records and linking them together.

E

The guidelines and principles that govern the ethical and moral implications of AI development and deployment. Ethical AI ensures AI delivers benefits while minimizing potential risks and ethical dilemmas.

Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence: An Executive order that aims to strengthen AI practices and mitigate risks across the US federal government and private industry, establish standard practices, and enhance information sharing.

Learn more about Executive Order 14110 (US White House)

The ability to explain the outputs of AI models so that they are understood, as far as possible, by technical and non-technical audiences alike.

European Union law that provides AI developers with clear requirements and obligations regarding specific uses of AI, helping them build trustworthy AI use cases while mitigating risks to EU citizens and organizations. The Act was passed on March 13, 2024.

Learn more about EU AI Act

A modern data integration pattern where raw data is first loaded into a target system before transformation, with lineage tracked through the catalog.

A classic data integration pattern where data is extracted from sources, transformed to meet requirements, then loaded into target systems with documented lineage.

A system design pattern where components respond to and process real-time data events or changes, enabling streaming data governance and observability.

F

Ensuring unbiased, equitable, and accurate outcomes across diverse groups and individuals, mitigating bias and discrimination.

Process for determining the most informative attributes of a dataset for use in a machine learning model.

Fine-tuning is the process of adjusting and optimizing a pre-trained large language model on a specific dataset, set of questions and answers, or task to improve its performance. It involves tuning the model's outputs with the new data, allowing it to adapt and specialize for the desired task or domain.

A core large language model (LLM) framework, trained on large and diverse data sets, used as the basis for developing more complex or specialized models in machine learning (ML) and artificial intelligence (AI).

Quantitative measurements assessing whether AI models treat different demographic groups equitably, monitored through model governance and observability frameworks.
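One simple example of such a metric is the demographic parity difference: the gap in positive-prediction rates between groups. The sketch below is illustrative; the group labels in the comment are hypothetical.

```python
import numpy as np

def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate across demographic groups (0 = parity)."""
    predictions = np.asarray(predictions)
    groups = np.asarray(groups)
    rates = [predictions[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

# Example: predictions=[1, 0, 1, 1], groups=["a", "a", "b", "b"] -> 0.5
```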

A centralized repository for storing, managing, versioning, and cataloging engineered machine learning features with lineage to source data and consuming models.

A governance model balancing centralized data policies and standards with domain autonomy and accountability, essential for implementing data mesh architectures.

Tracking data flow and transformations at the individual field or attribute level, providing the most granular view of data dependencies for impact analysis.

Data that meets the specific quality, freshness, completeness, accuracy, and governance requirements necessary for its intended business or analytical use case.

G

Artificial intelligence capable of producing previously unseen text, images, or other media in response to user prompts.

A data governance role responsible for maintaining business glossary terms, ensuring consistent definitions, resolving semantic conflicts, and promoting glossary adoption.

A comprehensive data protection and privacy law enacted by the European Union. It mandates how personal data of EU citizens must be collected, processed, and stored, granting individuals significant rights over their data. It applies to any organization, regardless of location, that handles the personal data of EU residents.

The single, most accurate, complete, and authoritative version of a master data entity across all systems, managed through master data management and governance.

Automated constraints, validation rules, and policy enforcement mechanisms that prevent AI systems from accessing ungoverned data or producing policy-violating outputs.

H

Misleading content created by generative AI that conflicts with source material or generates factually inaccurate output while maintaining the appearance of accuracy.

A U.S. federal law primarily designed to protect sensitive patient health information (PHI) from being disclosed without the patient's consent or knowledge.

Learn more about HIPAA (Health Insurance Portability and Accountability Act)

A system design where human judgment is explicitly integrated into an automated AI process to resolve ambiguity, ensure accuracy, or provide oversight.

I

The process of developing insights or conclusions from available information by analyzing patterns and relationships within data.

The ability to look "downstream" in a lineage diagram to see which reports, dashboards, and assets will be affected if a change is made to a data source.

Documentation detailing the downstream effects, risks, affected assets, and stakeholder impacts from proposed changes to data sources, schemas, or business terms.

Structured processes for detecting, triaging, assigning, tracking, resolving, and communicating data quality issues or governance policy violations.

A deployed API or service that exposes a machine learning model to receive input data and return predictions, with access governed through data governance platforms.

An international standard for establishing, implementing, and maintaining an information security management system applicable to data governance and protection practices.

J

A prompt intentionally designed to circumvent the ethical guidelines or restrictions built into a large language model (LLM).

K

A stream-processing-only data architecture that handles all data as streams, simplifying governance by eliminating separate batch processing layers.

A sophisticated type of database that is used to store and manage the semantic layer. It represents data as a network of entities and relationships, which is ideal for mapping complex business context.

Learn more about Knowledge graph

L

A type of AI model that has been trained on large amounts of text to interpret, understand, and generate human-like language. Examples include OpenAI ChatGPT and Google Gemini.

A data processing architecture combining batch and real-time stream processing layers to provide comprehensive analytics with unified governance and lineage.

Graphical representations and interactive diagrams that display data flows, transformations, business term relationships, and dependencies across systems.

M

A subset of AI where algorithms learn patterns and make predictions from data without explicit programming, improving performance over time.

A collection of tools, technologies, and best practices to build, deploy, monitor and manage machine learning and AI models. It is the key capability for scaling and governing AI at the enterprise level.

MLOps stands for Machine Learning Operations. MLOps is a core function of Machine Learning Engineering, focused on streamlining the process of taking machine learning and AI models from ideation to production, and then maintaining and monitoring them. End-to-end MLOps systems also typically incorporate data layer connectivity and feature curation. MLOps is a collaborative function, often comprising data scientists, devops engineers, and IT.

The governance processes that guide which data products are approved, quality-certified, published, maintained and deprecated in a data marketplace.

The discipline and technologies for creating, maintaining, and governing a single, authoritative source of truth for critical business entities like customers and products.

Data about your data. The semantic layer is a form of "business metadata" that is mapped to the "technical metadata" of your physical data sources.

Policies, processes, workflows, and controls for managing the quality, consistency, completeness, and lifecycle of technical, business, and operational metadata.

The tracking of how metadata itself – including business terms, definitions, and data classifications – changes, evolves, and flows through governance workflows over time.

A system design approach where active metadata controls data discovery, quality rules, access policies, and lineage rather than hard-coded system logic.

Processes and technologies for managing information about your data (metadata). Metadata management provides insights for effective data management and helps individual users search for, understand and access the data they need.

A hybrid data processing approach that processes small batches of data at frequent intervals, balancing near-real-time performance with governance and quality validation.

A decentralized software architecture where applications are composed of small, independent, API-driven services that exchange governed data through documented interfaces.

A standardized documentation artifact detailing an AI model’s purpose, training data lineage, performance metrics, limitations, intended use, and ethical considerations.

N

A field of AI where models understand, interpret, and generate human language.

A model inspired by the human brain, comprising interconnected nodes (neurons) that process information to perform tasks like pattern recognition, text generation, or prediction. Not included in this glossary are individual components of a neural network, such as activation functions, layer types, and cost functions.

A US government agency dedicated to promoting U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve quality of life. NIST has developed an AI Risk Management Framework to help organizations better manage AI risks.

Learn more about National Institute of Standards and Technology (NIST)

O

A formal, structured way of representing knowledge and the relationships between concepts. It is often used to define the rules and vocabulary for a semantic layer.

Outlines how an organization defines roles, responsibilities, business terms, asset types, relations, domain types and more.

Runtime information about data access patterns, transformation execution, pipeline performance, quality check results, and user activity for observability.

P

Ensuring data used in AI is handled in accordance with all applicable data privacy laws and regulations to prevent unauthorized access, use, or disclosure of sensitive information.

The input provided to a model or system to initiate a specific task or query. It can take various forms, such as a question, a statement, or a command, and is designed to elicit a desired response from the AI system.

A technique used in artificial intelligence, especially in natural language processing, where the user optimizes a prompt to a Generative AI model via commands, examples, or other techniques, to elicit a desired response
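As an illustration, a prompt might be engineered with an explicit role, constraints on the output, and a worked example; the wording and the classification task below are purely illustrative.

```python
# A hypothetical prompt template combining instructions, constraints, and one example
prompt = """You are a data steward assistant. Classify the data field described
by the user as PUBLIC, INTERNAL, or SENSITIVE. Answer with one word only.

Example:
Field: customer email address
Classification: SENSITIVE

Field: {field_description}
Classification:"""

print(prompt.format(field_description="product catalog name"))
```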

Health-related data about identifiable individuals protected under HIPAA regulations, requiring special classification, access controls, and governance in catalogs.

Any information that can uniquely identify an individual, requiring classification, protection, and governance according to privacy policies.

A set of high-level rules and principles that define how data should be managed, accessed, and used to ensure compliance and security.

A systematic method of creating, reviewing and updating data policies so that they are adopted and applied throughout your organization.

The principle of embedding privacy protections, data minimization, and consent management into data systems and governance processes from the earliest design stages.

A systematic evaluation of privacy risks associated with data processing activities, documented through governance workflows and impact analysis.

The process of replacing identifying data fields with artificial identifiers while maintaining lineage and the ability to re-identify when authorized for legitimate purposes.
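A minimal sketch of one way to do this, using a keyed hash as the artificial identifier and retaining a mapping table so that authorized re-identification remains possible. The key handling shown is illustrative, not production-grade; in practice the secret and mapping would live behind a secrets manager and strict access controls.

```python
import hmac
import hashlib

SECRET_KEY = b"replace-with-a-managed-secret"  # illustrative; store in a secrets manager
pseudonym_map: dict[str, str] = {}              # retained under strict access controls

def pseudonymize(value: str) -> str:
    """Replace an identifying value with a deterministic, keyed token."""
    token = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]
    pseudonym_map[token] = value                # enables authorized re-identification
    return token

def reidentify(token: str) -> str:
    """Recover the original value when re-identification is authorized."""
    return pseudonym_map[token]
```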

A privacy principle requiring that data be collected and used only for specific, documented purposes aligned with business glossary definitions and usage policies.

R

A field of AI where models utilize supervised learning to predict a continuous outcome from data, images, or other data inputs.

Ensuring that AI models adhere to relevant laws and regulations including data protection laws, anti-discrimination laws, and industry-specific regulations like HIPAA.

A global and member-driven non-profit dedicated to enabling successful responsible AI efforts in organizations.

Learn more about Responsible AI Institute

The combination of a retrieval-based method with an LLM, where supporting documents are embedded into a vector store, retrieved when a user queries the model, and included as additional context in the prompt to the LLM.
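In outline, a RAG flow embeds the query, retrieves the most similar documents from the vector store, and prepends them to the prompt. Everything below (embed, vector_store, llm) is a hypothetical interface used to show the shape of the flow, not a specific library's API.

```python
def answer_with_rag(question: str, vector_store, llm, embed, top_k: int = 3) -> str:
    # 1. Embed the user question into the same vector space as the documents
    query_vector = embed(question)
    # 2. Retrieve the most similar supporting documents
    documents = vector_store.search(query_vector, k=top_k)
    # 3. Include them as additional context in the prompt to the LLM
    context = "\n\n".join(doc.text for doc in documents)
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return llm.generate(prompt)
```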

The ability of AI models to consistently produce accurate and trustworthy results, without unexpected errors or biases, across various tasks and conditions.

Ability to monitor data health and pipeline performance in real-time, ensuring immediate detection and notification of issues.

A stable, controlled set of values used to categorize other data, ensuring consistency across the organization.

Learn more about Reference data

Data used to define and classify other data and reconcile data across operational systems for accurate reporting and analyses.

Ensures adherence to relevant laws and regulations including data protection laws, external mandates and industry-specific regulations like HIPAA, BCBS 239 and GDPR.

Formal submissions and documentation provided to regulatory authorities demonstrating data governance compliance with applicable laws and industry standards.

A performance metric measuring elapsed time from when a data quality issue or policy violation is detected until it is fully resolved and verified.

The process of moving transformed data from data warehouses back into operational systems, with lineage tracking to maintain governance and understanding of data flows.

An individual’s right under privacy regulations to obtain confirmation of whether their personal data is being processed and receive copies, managed through governance workflows.

An individual’s right to receive meaningful information about the logic and consequences of automated AI decision-making, supported by model and data lineage.

An individual’s right under privacy regulations to request correction of inaccurate personal data, tracked through data quality and stewardship workflows.

Limits data access based on an individual’s role within an organization, ensuring that personnel only access information required for their job duties. RBAC is critical for protecting sensitive data when managing information access for large numbers of employees, third parties and contractors.
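A minimal sketch of the idea, with roles mapped to the actions they may perform on data assets; the role and permission names are illustrative.

```python
# Hypothetical role-to-permission mapping
ROLE_PERMISSIONS = {
    "data_steward": {"read", "update_metadata", "certify"},
    "analyst": {"read"},
    "contractor": set(),  # no default access
}

def is_allowed(role: str, action: str) -> bool:
    """Check whether a role is permitted to perform an action."""
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("analyst", "read")
assert not is_allowed("contractor", "read")
```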

The ability to look "upstream" in a lineage diagram to find the original source of an error or bad data seen in a report.

S

Implementing measures to protect AI models from internal and external threats as well as ensuring the confidentiality of the data used by AI models.

A Canadian entity focused on the standardization of guidelines, processes, and characteristics in a wide range of fields, including AI. In 2023, the SCC developed the AI and Data Governance Standardization Collaborative (AIDG) to expand on data governance standardization and include new AI issues.

Learn more about Standards Council of Canada (SCC)

Artificial data that mimics real data, often used for training machine learning (ML) models when real data is insufficient or highly sensitive.

A data source that serves as a container to organize tables and columns in a relational database and fields in a file.

Columns and data types that have been added, removed or changed in data sources.

The managed, intentional process of modifying data structures over time while maintaining data contract compatibility and minimizing disruption to data product consumers.

A centralized service for managing, versioning, and validating data schemas to ensure consistency and compatibility across data contracts, products, and consuming systems.
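For illustration, a registry-style compatibility check might flag removed or retyped columns as breaking changes while treating added columns as non-breaking; the column names and types below are illustrative.

```python
def breaking_changes(old_schema: dict[str, str], new_schema: dict[str, str]) -> list[str]:
    """Compare two column->type schema versions and report backward-incompatible changes."""
    issues = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            issues.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            issues.append(f"retyped column: {column} ({old_type} -> {new_schema[column]})")
    return issues

# Adding new optional columns is typically non-breaking, so it is not flagged here.
print(breaking_changes({"id": "int", "email": "string"}, {"id": "int", "email": "text"}))
```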

The practice of enabling business users to independently discover, understand, and analyze trusted, governed data without requiring technical expertise or IT intervention.

An abstraction layer that translates technical data into business language.

Search technology powered by knowledge graphs and natural language processing that understands business meaning and context rather than relying solely on keyword matching.

Quantitative measurements such as data freshness lag, data quality score, or availability percentage used to assess data product performance against data contract commitments.

Target values for service level indicators that define acceptable performance thresholds for data products, such as “freshness < 1 hour” or “data quality score > 95%”.
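As a sketch, a freshness SLI can be computed from the last successful load time and compared with the SLO threshold; the one-hour target below is purely an example.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = timedelta(hours=1)  # illustrative target: data less than 1 hour old

def freshness_sli(last_loaded_at: datetime) -> timedelta:
    """How stale the data product currently is."""
    return datetime.now(timezone.utc) - last_loaded_at

def meets_slo(last_loaded_at: datetime) -> bool:
    """True when the current freshness is within the SLO target."""
    return freshness_sli(last_loaded_at) <= FRESHNESS_SLO
```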

Running new AI models alongside production models without affecting outputs, allowing validation against trusted data and monitoring for drift before full deployment.

Data continuously generated and processed in real-time via technologies like Kafka, requiring real-time governance, quality validation, and lineage tracking.

Highly organized data with a predefined model, typically fitting neatly into tables with rows and columns (e.g., a SQL database, an Excel file).

T

A portion of a dataset that is used to train an AI model until it can proficiently predict results or uncover patterns.

A portion of a dataset that is not used to train an AI model but rather to measure the model's performance after training is complete.

Ensuring that AI models, including algorithms and decision-making processes, are open and understandable to relevant stakeholders, such as developers, data scientists, business users, legal and privacy professionals.

Data that is high-quality, secure, understood, accurate, and easily accessible by the proper individuals to confidently create and deploy valuable, ethical, responsible, and compliant AI models.

Often used interchangeably with Ethical AI, Trustworthy AI is the development of AI models that are reliable, ethical, and can be trusted by users.

A detailed, granular view of data flow at the system, table, and column level, often automatically harvested from source system metadata.

Replacing sensitive data values with non-sensitive surrogate tokens while maintaining format and data relationships, tracked through lineage for compliance and security.

AI evaluation of harmful, abusive or discriminatory language in training data or model outputs, managed through content governance and quality monitoring.

A process or set of logic that changes data from one format or structure to another (e.g., an ETL job, a SQL query).

U

Data that lacks a predefined data model or organizational structure, such as text documents, emails, PDFs, images and audio files.

V

A portion of a dataset that is not used to train an AI model but rather to measure the model's performance during training.

A vector database is a database designed to store and organize unstructured data, like text, images, or audio, using vector embeddings (high-dimensional vectors). This structure enables rapid retrieval of similar items, streamlining search and retrieval processes.

A Canadian government-backed AI institute that empowers researchers, businesses and governments to develop and adopt AI responsibly.

Learn more about Vector Institute

W

A way to automate and streamline processes and ensure collaboration among key stakeholders in your organization.

Z

A security model assuming no implicit trust and requiring verification for every data access request, integrated with role-based access controls and policy enforcement.
