I. Executive summary
I.A. Purpose of the Pilot
The Climate and Disaster Resilience Pilot (CDRP) Phase 2 aimed to explore the integration of geospatial data and generative AI for enhancing climate resilience and disaster management. The pilot delivered AI-enabled tools, engineering reports, and sectoral applications designed to improve data analysis, communication, and decision-making.
I.B. Key Objectives and Scope
The pilot undertook a comprehensive evaluation of the readiness of generative AI (GenAI) technologies to address pressing climate-related challenges, including floods and wildfires. It also emphasized the need to enhance data standards and compliance, foster collaboration among diverse stakeholders, and develop actionable tools and workflows. These efforts are aligned with FAIR principles (Findable, Accessible, Interoperable, Reusable) to advance global resilience strategies and support informed decision-making in climate resilience and disaster management.
I.B.1. Objective 1: Integration of Generative AI Virtual Assistants for Climate Resilience
The first objective focused on integrating generative AI virtual assistants into existing geospatial data frameworks to enhance climate resilience. By leveraging AI, the goal was to improve data accessibility, usability, and decision-making by bridging the gap between complex geospatial datasets and actionable insights. The pilot assessed platforms such as Copernicus Climate Change Service (C3S) and WEkEO to determine their interoperability with GenAI tools and ensure alignment with FAIR principles. Key outcomes included the development of prototype virtual assistants capable of:
Improving Data Discoverability – Enabling users to efficiently find relevant datasets and services.
Providing Actionable Insights – Transforming raw geospatial data into comprehensible, decision-ready information.
Enhancing User Engagement – Offering plain-language responses and contextual guidance tailored to various stakeholder needs.
I.B.2. Objective 2: Development of GenAI Prototypes for Data and Service Environments
The second objective focused on developing GenAI prototypes tailored for diverse data and service environments. These prototypes demonstrated the practical capabilities of GenAI tools in enhancing data usability, accessibility, and stakeholder engagement across multiple domains, including climate resilience, health, energy, and insurance. The prototypes were designed to:
Improve Findability – Enhance the discoverability of relevant data and services, particularly from key platforms like Copernicus Climate Change Service (C3S) and WEkEO.
Facilitate Informed Decision-Making – Offer stakeholders actionable insights derived from structured geospatial data, contextual knowledge, and domain-specific expertise.
Deliver Plain-Language Responses – Provide users with clear and comprehensible answers to complex queries, including references to trusted sources, visualizations, and associated links.
I.B.3. Objective 3: Assessment of Data Maturity and Interoperability for GenAI Integration
The third objective assessed the maturity and interoperability of existing data and service platforms to support GenAI integration. This evaluation determined whether data ecosystems are sufficiently robust, accessible, and capable of supporting GenAI’s advanced capabilities in climate resilience and disaster management workflows. This objective involved:
Evaluating Data Maturity – Assessing the readiness of platforms such as Copernicus Climate Change Service (C3S), WEkEO, and NOAA datasets based on criteria like FAIR principles, AI-readiness, and cloud optimization.
Enhancing Interoperability – Identifying gaps and barriers in dataset interoperability, ontologies, and APIs across different platforms, including OGC-compliant services.
Addressing Challenges – Overcoming issues related to inconsistent metadata standards, varying data formats, and limited cross-platform compatibility to facilitate seamless GenAI integration.
Crosswalks for Ontology Alignment – Developing mappings between geospatial ontologies and data models to ensure consistent and interoperable data usage.
I.C. Summary of Findings and Recommendations
I.C.1. Key Findings
One of the key findings was the effectiveness of generative AI in climate data analysis. AI-powered virtual assistants were developed to help users explore data requirements, analyze climate impacts, and support decision-making. These assistants leveraged large language models (LLMs), such as Llama 3, to efficiently process structured and unstructured climate data.
A major focus was on data integration and interoperability. Participants assessed Copernicus Climate Services, advanced analysis-ready data (ARD), and integrated OGC geospatial standards. Notably, GeoLabs, IIT Bombay, and the University of Alabama collaborated to develop a shared ontology that enhances semantic interoperability between OGC standards and environmental datasets.
Several AI demonstrators were developed for specific sectoral applications:
Coastal Resilience – Hartis built an AI-powered demonstrator to assess the Coastal Vulnerability Index (CVI) using geospatial and environmental datasets.
Drought and Heat-Related Health Risks – Pixalytics integrated Copernicus API data to calculate drought and heat indices.
Flood Hazards and Mitigation – GIS FCU developed a virtual assistant providing insights on flood risks, mitigation strategies, and response plans for Canada.
The pilot also explored AI applications in wildfire risk assessment and insurance. Xentity and NRCan conducted a state-of-the-art review on AI in wildfire risk modeling, identifying tools to assist insurers, policymakers, and affected communities in risk assessment.
Beyond data analysis, AI-powered knowledge discovery and decision support systems were also demonstrated. Dante’s Knowledge Engine showcased multimodal AI search, integrating government, commercial, and social media data. TerraFrame introduced a graph-based AI model to analyze cumulative climate disaster impacts, offering insights into the indirect effects of climate hazards on infrastructure, education, and healthcare.
Several challenges were identified, including:
Lack of geospatial awareness in AI models, requiring additional geocoding and ontology mapping.
AI hallucinations, where generative models produced incorrect or misleading responses, especially in disaster scenarios.
Data interoperability issues, requiring improvements in metadata standards and cross-platform compatibility.
To address these challenges, participants recommended utilizing Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) reasoning — two complementary techniques designed to enhance the quality and reliability of generated responses.
RAG models improve language models by retrieving relevant information from external sources before generating a response. This retrieval process allows the model to augment its internal training data with verified, up-to-date knowledge, resulting in responses that are more accurate, contextually appropriate, and factually reliable. Rather than relying exclusively on internalized knowledge, RAG actively integrates real-world information throughout the generation process.
In contrast, CoT reasoning is a method where the model is encouraged to break down complex problems into a sequence of coherent, logical steps, enabling more structured and interpretable reasoning.
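The retrieve-then-generate pattern behind RAG can be sketched in a few lines. The following is a minimal illustration, not the pilot's implementation: the tiny corpus, the word-overlap scoring (a stand-in for vector embeddings), and the prompt template are all assumptions for demonstration. A production system would retrieve from an indexed document store and pass the assembled prompt to an LLM.

```python
# Minimal Retrieval-Augmented Generation (RAG) sketch: retrieve the most
# relevant documents for a query, then assemble them into a grounded prompt.
# Corpus and overlap-based ranking are illustrative placeholders.

CORPUS = {
    "flood-risk": "Flood risk in coastal zones is driven by storm surge and sea-level rise.",
    "wildfire": "Wildfire hazard depends on fuel load, drought conditions, and wind.",
    "drought": "Drought indices combine precipitation deficits with evapotranspiration.",
}

def retrieve(query, k=2):
    """Rank documents by word overlap with the query (stand-in for embeddings)."""
    q = set(query.lower().split())
    scored = sorted(CORPUS.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(query):
    """Augment the user query with retrieved context before generation."""
    context = "\n".join("- " + doc for doc in retrieve(query))
    return ("Answer using only the context below.\n"
            "Context:\n" + context + "\nQuestion: " + query)

prompt = build_prompt("What drives flood risk in coastal zones?")
```

Because the model answers from retrieved context rather than parametric memory alone, this pattern directly targets the hallucination problem identified above.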
I.C.2. Recommendations
To enhance AI applications in climate resilience and disaster management, participants proposed several key strategies aimed at improving geospatial awareness, aligning AI systems with industry standards, integrating AI into early warning frameworks, fostering collaboration, and enhancing financial risk modeling.
A critical recommendation was to improve AI’s geospatial awareness by developing domain-specific AI training and employing graph-based knowledge representations. This approach would strengthen spatial reasoning capabilities in AI models, allowing them to better understand and process geospatial data.
Another essential aspect involved aligning AI systems with geospatial standards. Implementing OGC-compliant APIs and expanding semantic ontologies would enhance AI-driven geospatial analysis, ensuring consistency, interoperability, and accuracy when dealing with climate-related data.
The integration of AI into early warning systems was also emphasized. By operationalizing AI-driven climate risk assessments within emergency response frameworks, AI could significantly enhance preparedness and response to events such as wildfires, flooding, and droughts.
Participants further highlighted the importance of fostering collaboration between AI developers, climate scientists, and insurers. Strengthening these partnerships could improve predictive modeling for various climate risks and support the development of more accurate and reliable forecasting tools.
Additionally, recommendations focused on enhancing financial risk modeling through AI, providing valuable tools for climate-related financial risk assessment. This would be particularly beneficial for insurance companies and government agencies seeking to quantify and mitigate potential economic impacts.
The findings from this pilot are publicly accessible through the CDRP Phase 2 engineering reports and project website, along with demonstrators illustrating AI’s potential in climate resilience. Future initiatives will concentrate on enhancing AI demonstrators, refining user interfaces, improving predictive modeling, and conducting thorough uncertainty analysis. Moreover, ongoing engagement with policymakers, industry leaders, and community organizations will be essential to facilitate real-world deployment of AI solutions in climate resilience.
II. Keywords
The following are keywords to be used by search engines and document catalogues.
Generative AI, Virtual Assistant, Geospatial Data, Climate Data, Retrieval-Augmented Generation (RAG), Natural Language Processing (NLP), FAIR Principles (Findable, Accessible, Interoperable, Reusable), OGC Standards, OGC API — Processes, Machine Learning (ML), Climate Trend Analysis, Copernicus Data Sources, ECMWF Climate Data Store (CDS), Green Deal Data Space, WEkEO Environmental Data Hub, GIS Tools, Data Search and Discovery, Multi-Modal AI, Satellite Data Processing, Prompt Engineering, Geospatial Intelligence, GPU-Powered AI Inferencing, Data Cleaning and Indexing, Training Data Markup Language (TDML), Machine Learning as a Service (MLaaS), Code Generation for Climate Data Analysis, Chain of Thought (CoT) Reasoning, AI Hallucination Mitigation, Data Visualization, Multi-LLM Model Selection, Spatial Knowledge Graphs, Environmental Monitoring
III. Contributors
All questions regarding this document should be directed either to the editor or to the contributors.
Name | Organization | Role | ORCid |
---|---|---|---|
Stelios Contarinis | HARTIS | Editor | https://orcid.org/0000-0002-5789-4098 |
Vyron Antoniou | HARTIS | Editor | https://orcid.org/0000-0002-7365-9995 |
Loukas Katikas | HARTIS | Contributor | https://orcid.org/0000-0003-1886-4125 |
Samantha Lavender | Pixalytics | Contributor | https://orcid.org/0000-0002-5181-9425 |
Gérald Fenoy | GeoLabs | Contributor | https://orcid.org/0000-0002-9617-8641 |
Chetan Mahajan | Indian Institute of Technology Bombay | Contributor | https://orcid.org/0009-0001-9632-184X |
Surya Durbha | Indian Institute of Technology Bombay | Contributor | https://orcid.org/0000-0003-1022-8378 |
Rajat Shinde | University of Alabama in Huntsville | Contributor | https://orcid.org/0000-0002-9505-6204 |
Nathan McEachen | TerraFrame | Contributor | https://orcid.org/0009-0009-7419-4905 |
Matt Tricomi | Xentity | Contributor | |
Joost van Ulden | Natural Resources Canada, Canada Centre for Mapping and Earth Observation | Contributor | |
Micah Brachman | OGC | Editor | https://orcid.org/0009-0008-6198-5145 |
Ingo Simonis | OGC | Editor | https://orcid.org/0000-0001-5304-5868 |
1. Introduction
1.1. Pilot’s Background and Motivation
The Climate and Disaster Resilience Pilot 2024.2 emerges as a response to the escalating impacts of climate change and natural disasters, driving the need for advanced solutions that leverage cutting-edge technologies and collaborative approaches. This initiative capitalizes on the rapid advancements in generative AI, Earth Observation (EO) technologies, and geospatial platforms to address gaps in disaster management and climate resilience workflows.
Figure 1 — CDRP2024.2 Pilot - Application Domains & Platforms Ecosystem
Building on established frameworks like the FAIR principles and evolving standards such as those developed by the Open Geospatial Consortium (OGC), the pilot prioritizes interoperability and accessibility across diverse datasets and services. The integration of EO technologies with generative AI offers transformative potential, enabling actionable insights derived from vast and complex geospatial data. Platforms like Copernicus and WEkEO serve as foundational resources, providing the critical data infrastructure required for meaningful analysis and innovation.
1.2. Importance of GenAI in Climate Resilience and Disaster Management
Generative Artificial Intelligence (GenAI) serves as a transformative tool in addressing the challenges posed by climate change and disaster management. Its ability to process, analyze, and generate insights from vast volumes of geospatial and Earth Observation (EO) data provides significant advantages in enhancing resilience and response capabilities.
One of the critical strengths of GenAI is its capacity to bridge data complexity and usability. The integration of spatial and textual information enables decision-makers to derive actionable insights from otherwise fragmented and complex datasets. This approach proves invaluable for predicting the impacts of climate-related hazards, including coastal flooding and wildfires, and for optimizing response strategies.
GenAI fosters stakeholder engagement and accessibility, translating complex data into plain-language insights, tailored recommendations, and intuitive visualizations. These capabilities ensure that vital information reaches not only technical experts but also policymakers, community leaders, and the general public.
Efficiency in data workflows also improves through GenAI, automating processes such as data discovery, analysis, and reporting. Integration with FAIR-compliant platforms such as Copernicus and WEkEO amplifies the potential of existing technologies, ensuring that climate and disaster resilience initiatives remain adaptive and forward-looking.
1.3. Alignment with UN SDGs and Climate Resilience Goals
The Climate and Disaster Resilience Pilot 2024.2 aligns closely with the United Nations Sustainable Development Goals (UN SDGs) and broader climate resilience objectives, reinforcing global efforts to combat the impacts of climate change and promote sustainable development.
The pilot directly supports SDG 13: Climate Action, focusing on improving preparedness and response to climate-related disasters. Emphasis on generative AI and geospatial data integration enhances capabilities for monitoring, predicting, and mitigating climate impacts, contributing to informed policy-making and community resilience.
Efforts also advance SDG 11: Sustainable Cities and Communities, fostering solutions that strengthen urban resilience against disasters. AI-driven tools empower local governments and urban planners to identify vulnerabilities, optimize resource allocation, and improve disaster management strategies.
Support for SDG 17: Partnerships for the Goals reflects the collaborative approach involving governments, private sector entities, academic institutions, and NGOs. This partnership-driven strategy ensures the development of scalable, standards-compliant solutions that address the needs of multiple sectors and regions.
1.4. OGC’s Role in Advancing AI Integration
The Open Geospatial Consortium (OGC) contributes to the integration of artificial intelligence (AI) into geospatial systems and technologies. As a global organization responsible for developing open standards for geospatial data, OGC facilitates interoperability and accessibility, establishing a framework that supports the adoption of AI-driven geospatial technologies in a wide range of applications.
OGC’s work focuses on creating standards that enable AI systems to interact with geospatial platforms effectively, addressing data formats, processing workflows, and service interoperability. These efforts support AI models in utilizing geospatial data for applications such as environmental monitoring, disaster prediction, and urban planning.
2. Challenges in Exploiting GenAI within the Climate Resilience Domain
Generative AI has immense potential for transforming geospatial applications, particularly in areas like climate resilience and disaster management. However, leveraging this potential comes with unique challenges, as GenAI systems often struggle to handle the complexities of geospatial data and domain-specific requirements. The following sections explore critical issues such as geospatial awareness, hallucinations, climate-specific contexts, and integration with geospatial standards and APIs.
2.1. Geospatial Awareness in Generative AI Systems
Generative AI systems face inherent challenges in understanding and referencing geographic locations. Unlike humans, who perceive locations through lived experiences, context, and intuitive spatial reasoning, GenAI’s grasp of geospatial information is limited to the data it is trained on or linked to. This limitation raises several issues critical to the effective application of GenAI in domains requiring geospatial awareness, such as climate resilience or disaster management.
2.1.1. Text-Based Knowledge
The static nature of GenAI’s text-based training data constrains its ability to interpret geospatial context dynamically. For instance, while GenAI might “know” that Athens is the capital of Greece, this knowledge is abstract, disconnected from physical coordinates, and often devoid of real-time or topographical details. The lack of temporal and spatial contextuality means that while GenAI can generate plausible text about a location, it cannot independently verify spatial relationships, boundaries, or proximity without additional data.
2.1.2. Ambiguity in Place Names
GenAI models also struggle with the inherent ambiguity in place names. Many locations share identical names (e.g., Springfield in the United States), and disambiguating them requires more than linguistic understanding—it demands access to hierarchical geospatial data or contextual user input. Without this, responses may be inconsistent or outright incorrect, particularly in areas with lesser-documented geographic features.
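The disambiguation step this implies amounts to a gazetteer lookup filtered by contextual hints. The sketch below is illustrative only: the miniature gazetteer, its entries, and the hint-matching rule are assumptions, not any particular service's API.

```python
# Sketch of place-name disambiguation against a hierarchical gazetteer.
# A bare name like "Springfield" is resolved via contextual hints
# (admin hierarchy, country); entries here are illustrative samples.

GAZETTEER = {
    "Springfield": [
        {"admin1": "Illinois", "country": "US", "lat": 39.80, "lon": -89.64},
        {"admin1": "Massachusetts", "country": "US", "lat": 42.10, "lon": -72.59},
        {"admin1": "Missouri", "country": "US", "lat": 37.21, "lon": -93.29},
    ],
}

def disambiguate(name, context_hints):
    """Return the candidate whose admin hierarchy matches a contextual hint,
    falling back to the first (most prominent) entry when no hint matches."""
    candidates = GAZETTEER.get(name, [])
    if not candidates:
        return None
    hints = {h.lower() for h in context_hints}
    for cand in candidates:
        if {cand["admin1"].lower(), cand["country"].lower()} & hints:
            return cand
    return candidates[0]  # prominence-ordered default

place = disambiguate("Springfield", ["Missouri"])
```

Without such hierarchical data or user-supplied context, a GenAI system can only guess among homonymous places.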
2.1.3. Fragmented Geospatial Data
Another challenge lies in the fragmented nature of geospatial data sources. While authoritative platforms such as Copernicus or OpenStreetMap exist, their integration with GenAI systems is not seamless, often requiring manual effort to reconcile differing data standards and formats. This fragmentation exacerbates inconsistencies in GenAI’s understanding of spatial relationships, reducing its utility for complex geospatial tasks.
2.1.4. Knowledge Gaps in Undocumented Areas
GenAI models exhibit clear limitations in understanding remote or less-documented regions. For example, a well-documented city like London may yield detailed and accurate responses, while a lesser-known location such as Eresos, Greece, may only elicit generic or partial information. This disparity can be attributed to uneven representation in training datasets, further complicating the equitable application of GenAI across diverse geographies.
2.1.5. Temporal Changes and Updates
Geospatial knowledge is not static; changes in urban development, climate impacts, or administrative boundaries can render training data obsolete. GenAI models cannot account for such temporal variations unless they are continually retrained with updated datasets or connected to live geospatial information systems. This presents a challenge for applications requiring real-time situational awareness, such as disaster response.
2.2. Hallucinations in the Geospatial Domain
Generative AI systems are prone to hallucinations—instances where they generate inaccurate, inconsistent, or entirely fabricated information. In the geospatial domain, such hallucinations can have particularly significant consequences, given the importance of precision and reliability in geographic and spatial data. These hallucinations often arise due to inherent limitations in the training data, lack of real-world validation mechanisms, and the complexity of spatial relationships.
2.2.1. Causes of Hallucinations
Incomplete or Biased Training Data: GenAI models trained on incomplete datasets may lack comprehensive information about certain locations, leading to fabricated details when queried about these areas. For instance, when asked about a remote village, the system might create plausible but inaccurate descriptions or relationships based on similar but unrelated data.
Ambiguity and Overgeneralization: The ambiguity of geographic names or features can cause hallucinations. For example, querying a GenAI model about “Springfield” could result in a conflation of characteristics from multiple locations with the same name. Also, overgeneralization occurs when the system extrapolates patterns from well-known locations to poorly documented ones, often producing erroneous outputs.
Disconnected Context: GenAI lacks inherent spatial reasoning, which can lead to inconsistencies in understanding geographic hierarchies. For example, it might erroneously describe a city as being within the boundaries of another city or region due to a misunderstanding of administrative divisions.
Temporal Mismatches: The static nature of training data means GenAI may hallucinate when describing locations that have undergone recent changes. For instance, a region affected by natural disasters or urban development might be described based on outdated data.
Fabrication of Nonexistent Features: In the absence of specific information, GenAI may invent geographic features, such as creating fictional landmarks, rivers, or infrastructure, to provide what it perceives as a complete answer.
2.2.2. Implications of Hallucinations
Hallucinated geospatial information can have serious consequences, particularly in critical areas like disaster management, urban planning, and climate resilience. Misleading data can result in ineffective or even harmful strategies, undermining the effectiveness of decision-making. Persistent inaccuracies further erode trust in AI tools, especially in high-stakes applications where precision is essential. Moreover, such errors can propagate across systems if flawed GenAI outputs are used to train other models or update databases, compounding the problem and spreading inaccuracies further.
2.2.3. Understanding the Dynamics of Climate Systems
Climate systems are complex and challenging for Generative AI (GenAI) to understand. They involve feedback loops, non-linear changes, and variations across different locations and time periods. These factors require specialized data and modeling to make accurate predictions, which general AI systems often cannot handle without significant adjustments.
Feedback Loops: Climate systems are replete with feedback mechanisms, where the outputs of a process loop back to influence the same process. GenAI systems trained on static or simplistic datasets may fail to capture the cascading effects of such feedback loops, leading to incomplete or misleading predictions. For example:
The melting of polar ice reduces surface albedo (reflectivity), leading to greater heat absorption and accelerated warming.
Vegetation loss increases carbon dioxide levels, exacerbating warming and further vegetation degradation.
Non-Linear Changes: Unlike many domains where relationships are linear, climate systems exhibit non-linear changes. Small variations in one variable, such as temperature, can lead to disproportionate impacts, such as sudden shifts in weather patterns or ecosystem collapses. Capturing and predicting these non-linear dynamics requires specialized modeling approaches and datasets that account for thresholds, tipping points, and chaotic behaviors.
Spatial and Temporal Variability: Climate impacts vary significantly across spatial (local to global) and temporal (short-term to long-term) scales. General-purpose AI systems may not be equipped to address this multi-scale variability effectively without customization.
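The feedback and tipping-point behavior described above can be made concrete with a toy iteration. This is purely illustrative and in no sense a climate model: the coefficients, the threshold, and the feedback rule are arbitrary assumptions chosen only to show how a positive feedback amplifies a fixed forcing and how the response becomes non-linear once a threshold is crossed.

```python
# Toy positive-feedback iteration: a constant forcing is amplified by a
# feedback term, and the feedback strengthens past a threshold (a crude
# stand-in for e.g. the ice-albedo effect). All coefficients are arbitrary.

def simulate(forcing, steps, threshold=1.0):
    temps = [0.0]
    for _ in range(steps):
        t = temps[-1]
        feedback = 0.3 if t < threshold else 0.6  # stronger past the tipping point
        temps.append(t + 0.1 * (forcing + feedback * t))
    return temps

with_feedback = simulate(forcing=1.0, steps=30)
no_feedback = [0.1 * 1.0 * n for n in range(31)]  # linear response, no feedback
```

Comparing the two trajectories shows the feedback run diverging from the linear one, and diverging faster after the threshold: exactly the kind of dynamics a model trained on linear, static relationships will misrepresent.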
Translating general AI capabilities to work effectively in climate resilience is not straightforward and comes with several challenges. AI models need significant customization to handle complex climate datasets like satellite imagery and weather models, often requiring specialized preprocessing. Collaboration with climate experts is essential to incorporate critical insights about unique variables, such as atmospheric or oceanic dynamics. Additionally, integrating AI with established climate models, like General Circulation Models (GCMs), is necessary but challenging due to their specialized assumptions. Lastly, the vast scale and complexity of climate data demand high computational resources, making effective adaptation both technically and resource-intensive.
Customization for Climate Data: General AI models must be extensively retrained or fine-tuned on climate-specific datasets, such as satellite imagery, weather models, and hydrological simulations. These datasets are often complex, multidimensional, and require domain-specific preprocessing steps, including reformatting, resolution alignment, and noise reduction.
Domain Expertise Requirements: Successful application of AI in climate resilience necessitates collaboration with climate scientists, hydrologists, ecologists, and other domain experts. These experts provide critical insights to ensure AI systems account for the unique variables and relationships in climate data, such as atmospheric circulation patterns or ocean heat dynamics.
Integration with Existing Models: Climate-specific AI applications often need to interface with established climate models, such as General Circulation Models (GCMs) or Regional Climate Models (RCMs). These models operate on specialized assumptions and parameters that general-purpose AI may not natively understand.
Data and Computational Complexity: Climate data is often vast in volume and requires intensive computational resources for analysis and simulation. Translating general AI capabilities to climate-specific contexts involves addressing these scale and resource challenges.
2.3. Aligning GenAI with Geospatial Standards
The integration of Generative AI (GenAI) with geospatial standards represents both an opportunity and a challenge in the climate resilience domain. Geospatial standards are essential for ensuring interoperability, data quality, and consistency across platforms and datasets, while GenAI relies on these structured inputs to generate meaningful outputs. Misalignment between GenAI capabilities and geospatial standards can limit the potential of AI-driven solutions in geospatial applications.
Complexity of Geospatial Data: Geospatial data often involves multiple layers, formats, and coordinate systems. GenAI systems may struggle to interpret or align these data layers without adherence to standards. For example, integrating raster data from satellite imagery with vector data from cadastral maps requires strict compliance with projection and resolution standards.
Lack of AI-Specific Standards: Existing geospatial standards are not explicitly designed for AI workflows, leading to gaps in how data is prepared, annotated, or served to GenAI systems. This lack of alignment can reduce the efficiency of AI applications in geospatial analysis and decision-making.
Variability Across Platforms: Different geospatial platforms implement standards in varying ways, creating interoperability challenges for GenAI. For example, the way the OGC Web Map Service (WMS) standard is implemented may differ slightly between providers, complicating its integration with AI workflows.
Metadata Inconsistencies: While standards like ISO 19115 exist, metadata practices are not always consistent, especially for legacy datasets. GenAI relies heavily on well-structured metadata to contextualize geospatial information.
Dynamic Data Requirements: Real-time or near-real-time geospatial data, such as weather updates or sensor networks, often operate outside traditional standards, creating integration challenges for GenAI systems that require immediate and standardized inputs.
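As an illustration of the coordinate-system alignment the first point describes, the sketch below projects WGS 84 longitude/latitude (EPSG:4326) into Web Mercator (EPSG:3857) using the standard spherical formulas. This hand-rolled version exists only to show what "agreeing on a projection" means; a real pipeline would delegate this to a library such as pyproj or GDAL.

```python
import math

# Forward Web Mercator (EPSG:3857) projection from WGS 84 lon/lat degrees.
# Aligning raster and vector layers requires every dataset to agree on such
# a projection; this illustrates the transformation for a single point.

EARTH_RADIUS = 6378137.0  # WGS 84 semi-major axis, metres

def to_web_mercator(lon_deg, lat_deg):
    x = EARTH_RADIUS * math.radians(lon_deg)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

x, y = to_web_mercator(23.7275, 37.9838)  # Athens, Greece
```

A raster tile served in EPSG:3857 and a cadastral vector layer in EPSG:4326 only overlay correctly after one side is transformed this way, which is why strict projection metadata matters for any AI workflow consuming both.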
2.4. Integration with Application Programming Interfaces (APIs)
Connecting Generative AI systems with Application Programming Interfaces (APIs) makes them much more powerful. APIs allow GenAI to get real-time data, perform analyses, and provide accurate and relevant answers. However, working with APIs also brings some challenges that need to be carefully managed.
Rate Limits and Access Restrictions: Many APIs limit how often they can be used or how much data can be requested at a time. This can slow down GenAI in situations where fast responses are needed, like tracking disasters or weather changes. Some APIs charge fees or require subscriptions, which can be too expensive for smaller teams or organizations.
Latency and Performance: Getting data from an API takes time, especially if the data is large or if there are multiple requests. This delay can affect how quickly GenAI can provide answers, which is a problem for real-time tasks like navigation or emergency planning. Improving speed requires careful system design, but it can be tricky to balance performance and complexity.
Data Privacy and Security: APIs often handle sensitive data, so it’s important to keep this information safe. If security measures like encryption or user authentication aren’t properly set up, there’s a risk of data leaks or unauthorized access. Setting up and maintaining these protections adds extra work and complexity to the system.
Dependency Management: Relying on APIs means depending on third-party services. If an API changes, limits access, or stops working, it can disrupt how GenAI functions. To avoid problems, systems need to be flexible enough to handle changes or switch to alternative APIs when needed.
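The rate-limit, latency, and dependency concerns above translate into familiar client-side patterns: retry with exponential backoff, then fall back to an alternative provider. The sketch below is a minimal illustration; the endpoint names are placeholders and the `fetch` callable is injected precisely so the pattern does not depend on any real service.

```python
import time

# Sketch of a resilient API call: retry with exponential backoff, then fall
# back to an alternative endpoint. `fetch` is injected so the pattern can be
# exercised without a live service; endpoint names are placeholders.

def call_with_fallback(fetch, primary, fallback, retries=3, base_delay=0.01):
    for endpoint in (primary, fallback):
        for attempt in range(retries):
            try:
                return fetch(endpoint)
            except (ConnectionError, TimeoutError):
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all endpoints exhausted")
```

Dependency management then reduces to keeping the fallback list current and monitoring which endpoint actually served each request, so that a deprecated or rate-limited API degrades the system gracefully rather than silently breaking it.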
3. Analysis Ready Data (ARD) Maturity Report (D010)
The Analysis Ready Data (ARD) Maturity Report (link) evaluates the maturity of crucial ARD sources, focusing on NOAA datasets for disaster risk response and climate assessments. The report reviews three major data maturity models and develops an updated Data Maturity Matrix to enhance data quality, accessibility, and interoperability. It also identifies tools for automating data maturity evaluation and advancing ARD readiness.
3.1. Data Maturity Framework
The report integrates three existing data maturity assessment models:
Data Stewardship Maturity Matrix (DSMM) – Focuses on data stewardship best practices.
CEOS Analysis Ready Data (CEOS-ARD) – Ensures satellite data is pre-processed and analysis-ready.
WGISS Data Management and Stewardship Maturity Matrix (DMSMM) – Provides a structured approach to Earth Observation (EO) data stewardship.
An updated Data Maturity Matrix is proposed, categorizing datasets into four levels:
L0 (Not Managed) – No formal management practices.
L1 (Partially Managed) – Basic metadata and accessibility with limited quality assurance.
L2 (Managed) – Standardized, well-documented, and quality-controlled data.
L3 (Fully Managed) – Optimized, validated, and FAIR-compliant data.
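A self-service assessment against these four levels can be expressed as a simple rule-based check. The boolean criteria below merely paraphrase the level descriptions and are illustrative; the report's actual matrix scores individual DSMM and CEOS-ARD criteria rather than four flags.

```python
# Rule-based sketch of the four-level Data Maturity Matrix (L0-L3).
# The flags paraphrase the level descriptions; a real assessment would
# score each stewardship criterion individually.

def maturity_level(has_metadata, is_accessible,
                   quality_controlled, fair_compliant):
    if not (has_metadata and is_accessible):
        return "L0 (Not Managed)"        # no formal management practices
    if not quality_controlled:
        return "L1 (Partially Managed)"  # basic metadata and accessibility
    if not fair_compliant:
        return "L2 (Managed)"            # standardized and quality-controlled
    return "L3 (Fully Managed)"          # optimized, validated, FAIR-compliant

level = maturity_level(has_metadata=True, is_accessible=True,
                       quality_controlled=True, fair_compliant=False)
```

Encoding the rubric this way is what makes the automated, self-service evaluations discussed in the next section possible.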
3.2. Tools for Self-Service Assessments
To support automated and self-service ARD maturity evaluations, the report highlights key tools, including:
Data Maturity Assessment Templates (e.g., DSMM Model Template, CEOS ARD Self-Assessment Guide)
Compliance Test Tools (e.g., OGC Compliance Test Suites, CF-Checker, Geospatial Metadata Validation Service)
The report emphasizes the need for a comprehensive suite of automated tools to streamline ARD maturity assessments. Future developments should focus on:
Enhancing FAIRness and AI readiness in data.
Improving metadata indexing and interoperability for disaster response applications.
Strengthening integration with cloud-native storage and high-performance computing frameworks.
4. Generative AI for Wildfire Report (D030)
This engineering report D030 (link) builds upon the findings of Phase 1 (D-123) from the OGC Disaster and Climate Resilience Pilot III. The primary focus is on advancing GenAI applications for wildfire risk analysis, social impact, and emergency response as they relate to wildland fire insurance workflows, specifically in the Canadian context. The Wildland Fire (WF) community depends on robust data insights and advanced tools to bolster planning and operational decisions—augmented rather than replaced by the experiential knowledge of stakeholders.
4.1. Use Cases and Functionalities
This report outlines key GenAI-driven use cases relevant to wildfire resilience, response, and risk assessment, centering on leveraging GenAI to strengthen wildfire insurance and preparedness efforts in Canada while addressing social impact, operational efficiency, and business resilience. The use cases, and the data needed to support them, focus on Helping People and Business Management as they relate to Wildland Fire Insurance Stakeholders.
4.2. Data Sources and FAIR Evaluation
Phase 2 includes an inventory of over 200 Canadian wildfire-related data sources, categorized into the data subject areas of Wildland Fire National Strategy & Management, National Base Data Layer Information, and Risk Indicators, Analysis, and Assessment, that would be needed as GenAI training data.
4.3. OGC Compliance and Interoperability
This report builds on the Phase 1 report, which aligns with OGC best practices to ensure cross-agency data integration and AI model transparency, including references to OGC APIs and Data Standards, Metadata and Traceability, and AI Model Governance.
4.4. Findings and Recommendations
By combining Phase 2 priorities with Phase 1 inputs, this report provides a forward-looking roadmap for GenAI adoption in wildfire resilience and risk management, including consideration of the following:
Key Wildland Fire Business Objectives for the Canadian Insurance Sector: Insurers lack granular, AI-driven wildfire risk assessment tools. Current models would benefit from leveraging high-resolution geospatial and ecosystem datasets for social impact and business management. Develop GenAI-powered wildfire risk models that integrate geospatial, fuels, topography, weather, and historical fire data with predictive analytics for improved underwriting and risk-based insurance pricing.
Data Needs: Generative AI requires domain-specific, structured, and unstructured wildfire datasets to enhance predictive accuracy. Over 200 Canadian wildfire-related datasets were identified, categorized, and assessed for AI readiness. Establish a continuous training and labeled-data improvement lifecycle to refine AI models, ensuring real-time API integrations where necessary.
Mapping Use Cases to Dataset Readiness and Priority: Data gaps exist in Canadian wildfire analytics, particularly in structure materials/fuels, fuel moisture levels, and community vulnerability metrics. Prioritize high-impact AI use cases (e.g., Community Risk & Resilience Assessment, Grant & Funding Strategy Development, and Asset Risk Reduction & Loss Prevention) by expanding integration with national datasets and real-time wildfire data sources.
GenAI WF Capabilities to Support Use Cases: Large language models (LLMs) struggle with contextual wildfire decision-making without enhanced domain adaptation, retrieval-augmented generation (RAG), and multi-modal AI. Implement RAG and knowledge graph-based AI architectures to improve wildfire intelligence extraction, risk communication, and operational decision-making.
GenAI Roadmap Recommendations for Wildfire Insurance: Regulatory frameworks and AI governance policies are not well established for generative AI in wildfire insurance and risk assessment. Align AI model development with OGC interoperability standards to ensure data provenance, auditability, and regulatory compliance. Adopt the OGC Training Data Markup Language (TrainingDML-AI) to ensure traceability, validation, and ethical AI deployment in wildfire analytics.
Findings on Stakeholder Engagement: AI adoption in wildfire management is hindered by limited organizational awareness, data silos, and in some cases cultural resistance. Establish cross-sector collaboration between wildfire agencies, insurance companies, and AI developers to accelerate GenAI adoption. AI pilot projects are essential for proving the effectiveness of wildfire AI applications. Conduct targeted AI prototype testing (e.g., Wildland Fire Customer Awareness tool, Predictive Risk Dashboard, and Claims Automation System) with measurable success metrics.
5. Information Interoperability Report (D040)
The Information Interoperability Engineering Report (link) explores methods for enhancing interoperability between different geospatial and environmental data retrieval systems. The report focuses on aligning the OGC Environmental Data Retrieval (EDR) API with the Common Core Ontology (CCO) to improve information exchange across various domains such as emergency response, environmental monitoring, and defense applications.
5.1. Objective and Approach
A key objective of the report is to resolve semantic and syntactic heterogeneities in geospatial data. While the OGC EDR API provides structured environmental data queries, CCO offers a formal ontology that standardizes domain-agnostic concepts, including entities, events, and roles. The report proposes a shared ontology framework to establish a common vocabulary that facilitates seamless data integration. This framework is developed using OWL/RDF, ensuring that equivalent, broader, or narrower concepts from both representations are mapped correctly.
5.2. Mapping Core Concepts
The report outlines a methodology for achieving interoperability by identifying and mapping core concepts between OGC EDR and CCO. For example, edr:Time is mapped to cco:TimeInterval, while edr:Geometry corresponds to cco:SpatialRegion. These mappings allow geospatial and environmental data from different sources to be queried and interpreted in a unified manner. Additionally, bridge classes are introduced where direct mappings do not exist, ensuring a flexible and scalable approach.
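The two example mappings above can be written down as machine-readable triples. The sketch below uses plain Python tuples rather than an RDF library; the `owl:equivalentClass` predicate and the `translate` helper are illustrative choices, not taken from the report:

```python
# The edr:Time -> cco:TimeInterval and edr:Geometry -> cco:SpatialRegion
# mappings come from the report; everything else here is an assumption.
OWL_EQUIVALENT = "owl:equivalentClass"

MAPPINGS = [
    ("edr:Time", OWL_EQUIVALENT, "cco:TimeInterval"),
    ("edr:Geometry", OWL_EQUIVALENT, "cco:SpatialRegion"),
]

def translate(term: str, mappings=MAPPINGS) -> str:
    """Resolve an EDR term to its CCO equivalent, or return it unchanged.

    Terms with no direct mapping would instead be handled via bridge classes.
    """
    for subject, predicate, obj in mappings:
        if subject == term and predicate == OWL_EQUIVALENT:
            return obj
    return term
```

A query rewriter could apply `translate` to each concept in an EDR request so the same question can be posed against CCO-structured data.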
5.3. Use Cases and Practical Applications
Several use cases are presented to demonstrate the real-world benefits of this interoperability framework.
Semantic Query for Flood-Prone Areas: Integrates flood-prone area data from OGC EDR API with hydrographic features in CCO.
Retrieval of Weather Data: Queries temperature at a specific location and time using mapped ontology concepts.
Average Rainfall Over a Region: Retrieves average rainfall for a given spatial extent over a specific period.
The report suggests further refinement of ontology mappings, enhanced SHACL-based validation rules, and broader adoption of this interoperability framework in various geospatial applications. By addressing interoperability challenges, the proposed approach aims to improve data discoverability, query efficiency, and cross-domain collaboration in geospatial information systems.
6. Generative AI Virtual Assistants: Design and Implementation
GenAI assistants for the geospatial domain can leverage Retrieval-Augmented Generation (RAG) to combine real-time data retrieval with generative capabilities, ensuring that insights are both accurate and contextually relevant. By combining RAG architecture with training and data strategies, the GenAI Virtual Assistants can deliver high-value, actionable insights across domains in climate resilience and disaster management. The section outlines the objectives, use cases, design architecture, and data strategies that underpin the pilot implementations.
6.1. Objectives and Use Cases for Virtual Assistants
The primary objective of GenAI Virtual Assistants is to bridge the gap between complex geospatial datasets and actionable insights for diverse stakeholders, including urban planners, disaster response teams, and policymakers. The use cases focus on enabling intuitive interactions with data and delivering decision-ready insights through plain-language explanations.
Data Discovery and Accessibility: Assisting users in locating relevant geospatial datasets across platforms like Copernicus Climate Change Service (C3S) and WEkEO.
Real-Time Disaster Monitoring: Providing up-to-date insights on hazards such as floods, wildfires, and hurricanes by integrating live data streams.
Policy Support: Offering contextual recommendations for urban planning, risk mitigation, and resilience strategies based on current geospatial data.
Community Engagement: Enhancing public awareness by providing easily understandable summaries and visualizations of complex environmental data.
6.2. Design and Architecture of Demonstrators
The demonstrators can utilize a RAG-based architecture to enhance performance, reliability, and context-awareness in AI outputs. Key components of the architecture include:
Data Retrieval Layer: Integrates APIs, live data streams, and geospatial repositories for real-time information gathering.
Generative Model Core: Utilizes Large Language Models (LLMs) fine-tuned on domain-specific datasets to generate insights and explanations.
Knowledge Graph Integration: Enriches generative outputs with structured geospatial ontologies to ensure spatial and contextual accuracy.
Validation Pipeline: Implements feedback loops, confidence scoring, and expert validation to reduce hallucinations and improve output reliability.
User Interface: Provides an interactive, user-friendly platform that supports text-based queries, visualizations, and contextual recommendations.
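The retrieval and generation layers above can be sketched end to end. This is a toy illustration of the RAG pattern under stated assumptions: real retrieval would use vector embeddings rather than word overlap, and `generate` stands in for a fine-tuned LLM call:

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    source: str

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Data retrieval layer (toy): rank documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q & set(d.text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query: str, context: list) -> str:
    """Generative model core (stub): a real system would prompt an LLM
    with the retrieved context instead of formatting a string."""
    cites = ", ".join(d.source for d in context)
    return f"Answer to '{query}' grounded in: {cites}"

corpus = [
    Document("flood risk map for the river basin", "C3S"),
    Document("wildfire perimeter polygons", "NOAA"),
]
answer = generate("flood risk", retrieve("flood risk", corpus))
```

Because the answer is assembled from retrieved, attributable sources, the pattern supports the validation pipeline's goal of traceable, low-hallucination outputs.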
6.3. Training Data: Types, Sources, and Preprocessing
The effectiveness of Virtual Assistants depends on the quality and diversity of training data. Data preprocessing and curation ensure that inputs are both representative and FAIR-compliant. Key considerations include:
Data Types: Incorporate geospatial datasets (raster and vector data), climate records, sensor network outputs, and socio-economic indicators.
Data Sources: Draw from authoritative platforms such as C3S, WEkEO, OpenStreetMap, and NOAA, along with community-contributed datasets.
Preprocessing Techniques, such as:
Data Cleaning: Remove inconsistencies, outliers, and noise to improve data quality.
Normalization: Align data formats, projections, and coordinate systems for interoperability.
Data Augmentation: Generate synthetic datasets to simulate disaster scenarios and address data gaps in underrepresented regions.
Metadata Enhancement: Ensure datasets include sufficient metadata for contextualization and adherence to FAIR principles.
Model Fine-Tuning: Tailor pre-trained LLMs with domain-specific geospatial and climate data to improve relevance and accuracy.
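Two of the preprocessing steps above, cleaning and normalization, can be sketched with standard-library tools. The z-score cutoff and min-max rescaling are common simplifications, not the pilot's actual preprocessing code:

```python
from statistics import mean, stdev

def clean_series(values: list, z_max: float = 3.0) -> list:
    """Data cleaning (toy): drop readings more than z_max standard
    deviations from the mean. With very small samples the outlier
    inflates the deviation, so a lower z_max may be needed."""
    if len(values) < 2:
        return values
    m, s = mean(values), stdev(values)
    if s == 0:
        return values
    return [v for v in values if abs(v - m) / s <= z_max]

def normalize(values: list) -> list:
    """Normalization (toy): rescale to [0, 1] so sources with
    different units can be combined."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In a real pipeline these would run per sensor or per variable, with the chosen thresholds recorded as metadata for reproducibility.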
6.4. Validation Practices Against Hallucinations
As discussed, GenAI systems can sometimes produce hallucinations—outputs that are inaccurate, inconsistent, or entirely fabricated. This is especially critical in geospatial and climate-related applications, where precision and reliability are paramount. The following practices can help validate AI outputs and minimize the risk of hallucinations.
Cross-Referencing with Authoritative Data: Compare AI-generated outputs with verified datasets or authoritative sources, such as government geospatial databases, OpenStreetMap (OSM), or satellite imagery. For example, you can compare a bounding box generated by GenAI against trusted geographic platforms like Google Maps or Copernicus.
Ground-Truth Datasets: Use curated and validated datasets as training and testing benchmarks to minimize inaccuracies during model development. Train models with datasets from established organizations like NOAA or ESA for accurate climate predictions.
Spatial Consistency Checks: Verify spatial relationships in AI outputs to ensure they align with known geographic rules and structures. For instance, check that cities are not placed in oceans and that bounding boxes do not overlap invalid regions.
Confidence Scoring: Require AI models to assign confidence scores to their outputs, indicating the certainty of their predictions. Low-confidence results can then be reviewed more carefully before use.
Human-in-the-Loop Validation: Incorporate domain experts or users into the validation process to review and correct AI outputs. For example, emergency planners can verify AI-generated evacuation routes against real-world conditions during disaster management.
Feedback Loops: Implement mechanisms for users to provide feedback on errors or inaccuracies in outputs, enabling iterative improvement. Allow corrections to AI-generated maps or bounding boxes, and retrain the model with updated data.
Multimodal Validation: Use multiple data types (e.g., textual descriptions, satellite images, GIS layers) to cross-validate AI outputs. For example, flood predictions can be verified using elevation data and past flood records.
Consistency with Domain Knowledge: Ensure that outputs align with known domain-specific principles and constraints. Validate climate-related predictions against meteorological principles, such as seasonal trends or geographic climate zones.
Synthetic Data Testing: Test the model with synthetic or simulated scenarios to assess its performance in edge cases. For instance, create hypothetical locations and evaluate whether the AI can distinguish plausible outputs from impossible ones.
Regular Model Updates: Periodically retrain models with updated data to account for temporal changes, such as new developments or natural disasters. Reflect recent urban expansions or environmental changes by incorporating updated satellite imagery.
Chain-of-Thought Reasoning: Integrate Chain-of-Thought (CoT) Reasoning to guide AI through step-by-step logical processes, improving its ability to reason accurately in geospatial and climate-related queries. This method involves breaking down complex problems into a sequence of intermediate reasoning steps, allowing the AI to explicitly justify each decision before producing an output.
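Several of the practices above (spatial consistency checks, confidence scoring, and human-in-the-loop routing) can be combined into one gate. The thresholds and function names below are illustrative assumptions:

```python
def bbox_is_valid(bbox: tuple) -> bool:
    """Spatial consistency check: a (min_lon, min_lat, max_lon, max_lat)
    box must be properly ordered and fall within WGS84 bounds."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (-180 <= min_lon < max_lon <= 180) and (-90 <= min_lat < max_lat <= 90)

def accept_output(bbox: tuple, confidence: float, threshold: float = 0.8) -> str:
    """Gate an AI-generated bounding box: reject geometric violations
    outright, and route low-confidence results to human review."""
    if not bbox_is_valid(bbox):
        return "reject"
    if confidence < threshold:
        return "needs_human_review"
    return "accept"
```

Outputs routed to review can then feed the feedback loop described above, so corrections improve the model over time.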
7. Case Studies and Prototypes
7.1. Demonstrator for Virtual AI Assistants (D100 — GeoLabs)
The D100 demonstrator chatbot can be accessed online at https://cdrp.geolabs.fr:8503, and the presentation portal is available at https://geolabs.github.io/CDRP/D100/. We present three demo illustrations in Annex B.
7.1.1. Use Cases and Functionalities
The D100 virtual assistant enhances the accessibility and usability of various geospatial and climate-related data sources. It serves as an interactive tool that enables users to efficiently search, retrieve, and explore relevant datasets. The assistant’s core functionalities include:
Data Discovery & Search: Users can query the assistant to find relevant datasets from multiple data sources, improving data findability.
Retrieval-Augmented Generation (RAG): The chatbot provides responses by fetching and indexing relevant documents, ensuring context-aware answers.
Web Search-Based Responses: The assistant is capable of conducting a web search to fetch information from online sources.
LLM Model Selection: Users can choose from various large language models to generate responses, optimizing results based on specific use cases.
FAIR Data Principles Compliance: Developed in alignment with the findable, accessible, interoperable, and reusable (FAIR) principles.
OGC Standard Integration: Designed with an aim to ensure compliance and interoperability with geospatial data processing workflows and existing OGC API implementations.
7.1.2. Data Sources and FAIR Evaluation
The virtual assistant is developed using multiple data sources and evaluated against the FAIR principles, ensuring that data is Findable, Accessible, Interoperable, and Reusable.
7.1.2.1. Data Sources
The assistant leverages the following key data sources:
Copernicus Data Sources
ECMWF Climate Data Store (CDS)
Green Deal Data Space
Copernicus Atmosphere Monitoring Service (CAMS)
WEkEO Environmental Data Hub
7.1.2.2. Evaluation
The assistant is developed using publicly available data sources indexed in a structured database. It is built using free and open-source tools and powered by Large Language Models (LLMs). The system prioritizes findability, accessibility, interoperability, and reusability to ensure a robust and ethical AI framework.
7.1.3. OGC Compliance and Interoperability
The assistant builds upon previous efforts in OGC initiatives, including:
OGC CDRP.1 (Landslide Demonstrator)
OGC Testbed 19 — Machine Learning Activity (Inferencing-as-a-Service)
These efforts explored integrating machine learning (ML) inferencing as OGC API Processes, particularly for landslide detection. The workflows were developed using the OGC Training Data Markup Language (TrainingDML) and OGC API Processes (Parts 1 and 2). The virtual assistant acts as a user-interaction layer for these implementations, providing an end-to-end Generative AI stack for scientific applications while ensuring seamless interoperability.
7.1.4. Findings and Recommendations
The D100 virtual assistant improves data search and provides an interactive way to explore the datasets mentioned above.
The figure below illustrates the high-level workflow of the assistant:
Figure 2 — D100 (GeoLabs) - High Level Workflow
Users interact with the assistant through a chatbot interface, where queries are processed using one of the following approaches:
Approach 1: Retrieval-Augmented Generation (RAG) using a Local Database
Fetch data from various sources.
Index the data in a Chroma vector database.
Match stored documents with the user query.
Retrieve the top 5 matching documents.
Generate a response using an LLM based on the retrieved data.
Approach 2: Web Search-Based Responses
Conduct a web search for the user’s query.
Fetch relevant data and index it in a temporary vector database.
Match the query with indexed documents.
Retrieve the top 5 documents.
Generate a response using an LLM based on the retrieved data.
Users can select from the following LLM models for response generation:
Key Learnings:
Data indexed in markdown format improves LLM processing and comprehension. In the future, standardization of LLM-friendly input and output data formats could be researched.
High-quality data cleaning and indexing enhance search efficiency and response accuracy.
The BM25 algorithm efficiently re-ranks search results for improved document retrieval.
GPU-powered LLM inferencing significantly reduces response generation time.
Effective prompt engineering on user queries enhances query relevance and response robustness.
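One of the learnings above, BM25 re-ranking, can be sketched in pure Python. This is a compact illustration of the standard Okapi BM25 formula, not the pilot's retrieval code:

```python
import math
from collections import Counter

def bm25_rank(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Rank documents against a query with BM25; returns document indices,
    best match first. k1 and b are the usual tuning parameters."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter()  # document frequency of each term
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for word in query.lower().split():
            if word not in tf:
                continue
            idf = math.log(1 + (n - df[word] + 0.5) / (df[word] + 0.5))
            score += idf * tf[word] * (k1 + 1) / (
                tf[word] + k1 * (1 - b + b * len(tokens) / avgdl)
            )
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

In a RAG pipeline, a scorer like this can re-order the candidates returned by a vector search before they are passed to the LLM.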
7.1.5. Future Directions
To further enhance the assistant, we plan to:
Evaluate Chain of Thought (CoT) reasoning for improved query understanding.
Assess and mitigate hallucinations in generated responses.
Implement dynamic validation mechanisms to ensure response reliability.
Refine guardrails for LLM responses for safe and reliable output generation.
Extend support for additional OGC standards, including:
Training Data Markup Language (TDML)
OGC API — Processes — Part 1: Core
OGC API — Processes — Part 2: Deploy, Replace, Undeploy
Investigate the feasibility of integrating multi-modal capabilities, such as image and satellite data processing.
Enhance documentation and user guidance to improve adoption and usability of the virtual assistant.
7.2. Demonstrator for Virtual AI Assistants (D110 — CCMEO):
The proposed use case for embedding a generative AI chatbot on GEO.ca focuses on enhancing the search and discovery of FAIR and Open geospatial data through conversational AI. Although CCMEO was assigned to support this deliverable, we committed only to providing in-kind contributions rather than developing a full solution. Since then, however, prototype development has been initiated to explore the feasibility of this solution, including efforts to experiment with OGC GeoPackage visualization on a map client. The chatbot aims to support diverse user groups such as researchers, policymakers, educators, and citizens by providing intuitive natural language query capabilities and personalized data discovery features. The plan includes functionalities like contextual search, integration with GEO.ca’s APIs, and enhanced user interaction through visualization tools. These activities align with Canada’s commitments to accessibility, reconciliation, and compliance with geospatial standards.
7.2.1. Use Cases and Functionalities
The generative AI chatbot use case on GEO.ca is designed to enhance the search and discovery of FAIR and Open geospatial data through a conversational interface. It supports various user scenarios, including researchers seeking climate change impact data for analysis, policymakers needing land use data to inform urban planning, educators accessing open data for classroom demonstrations, and citizens viewing historical temperature trends to understand and advocate for climate action.
The chatbot’s primary functionalities include natural language query processing, contextual, geospatial data-driven search, personalized responses based on user needs, and integration with existing GEO.ca APIs for seamless access to data.
Additionally, outside of the provided use case, we explored data visualization capabilities in a subsequent ongoing proof-of-concept development. This involved investigating how the LLM could be augmented with tools to support visualizing data resources (e.g., GeoPackage) associated with the generated results.
7.2.2. Data Sources and FAIR Evaluation
The AI chatbot leverages data repositories from GEO.ca, adhering to FAIR principles. The data is made findable through clear metadata and search functionalities that enhance discoverability. Accessibility is ensured by providing open access to data via CCMEO’s APIs and services. The data is interoperable, formatted to integrate with various GIS tools following OGC standards. Additionally, datasets are reusable as they comply with licensing and metadata standards such as ISO-19115.
The chatbot intends to further promote these principles by offering guided assistance on accessing and interpreting data. When search results are returned within the chatbot, more information about each dataset is displayed, including links to associated data and metadata resources. Contextual record searching ensures that when users request more details about a specific result, the LLM focuses solely on the metadata for that record. This focused approach helps to streamline user interactions by presenting only relevant information for a specific record, reducing confusion caused by unrelated search results and reinforcing the authoritativeness of the information and data returned.
7.2.3. OGC Compliance and Interoperability
The planned integration of the chatbot aims to align with OGC (Open Geospatial Consortium) standards to support key objectives. First, interoperability is prioritized by ensuring that the chatbot interfaces with GEO.ca’s systems using OGC-compliant APIs, thereby adhering to the Canadian Geospatial Data Infrastructure (CGDI) and Canada’s Standard on Geospatial. Additionally, the chatbot is designed to leverage OGC standards, including OGC API — Features and OGC API — Records, to handle geospatial queries effectively. As part of the development, we have also begun experimenting with the GeoPackage standard to improve data handling and visualization capabilities. Although the solution remains in the prototyping stage, ongoing efforts are focused on refining these functionalities. Finally, accessibility and inclusivity are central considerations, with efforts to train the chatbot’s NLP on multilingual and region-specific geospatial terms, respecting Canada’s official languages, commitments to reconciliation with Indigenous Peoples, and the goals outlined in the Accessibility Action Plan.
7.2.4. Findings and Recommendations
Findings: The chatbot was not fully implemented, so there are no direct findings from its deployment on GEO.ca. However, experimentation with a generative AI chatbot currently under development revealed several key challenges. These include hallucinations, where responses contain information not based on authoritative sources, and low confidence in results, particularly when data is not retrieved from trusted APIs like those provided by CCMEO. Additionally, infrastructure costs for training, fine-tuning, and operating such models pose a significant barrier, particularly during initial deployment and whenever new data is added to the catalogue or data repositories.
Recommendations: To address these challenges, responses generated by the chatbot prototype should clearly indicate when information is not sourced from authoritative APIs, particularly those not provided by CCMEO. Ongoing research and experimentation will focus on mitigation of hallucinations and improvements to model reliability. Optimizing the infrastructure for scalability and cost-efficiency is necessary to ensure the sustainability of the system as usage grows, especially with periodic updates to data catalogs and repositories. Additionally, pilot testing will help validate the prototype’s performance and provide opportunities to gather user feedback before moving toward full implementation.
7.3. Demonstrator for Virtual AI Assistants (D110 – Danti)
Danti’s demonstrator (https://gov.danti.ai/log-in; contact gov-support@danti.ai for account approval) focuses on leveraging generative AI and spatial intelligence to improve data discovery, accessibility, and analysis. The objective is to develop a knowledge engine that enables users of all skill levels to interact with multimodal geospatial data. Danti integrates large language models (LLMs), geospatial analytics, and real-time data sources to provide actionable insights for decision-makers. The system is designed to support both government and commercial sectors, offering enhanced capabilities for geospatial intelligence, disaster response, and climate risk assessment.
Figure 3 — D110 (Danti) - Logical Architecture Diagram
7.3.1. Use Cases and Functionalities
The demonstrator showcases its capabilities through real-world geospatial analysis scenarios. Key functionalities include:
Intelligent Data Discovery: Users can search for locations (e.g., wildfire-prone areas in California) and receive AI-generated summaries, relevant datasets, and geospatial insights.
Multi-Source Integration: The system aggregates data from government, commercial, and social media sources, including satellite imagery, climate data, and news feeds.
Geospatial Awareness & Querying: AI models enable context-aware retrieval of geospatial data, refining search results based on location, time, and data relevance.
Decision-Support Analytics: Provides visual representations, mapping insights, and predictive analytics to support policy-making and emergency response.
Automated Alerts & Monitoring: Users can save searches and receive notifications when new data is available for their areas of interest.
7.3.2. Data Sources and FAIR Evaluation
The demonstrator integrates various data sources, including:
Government Imagery: Optical, SAR, and fire event data (e.g., NOAA, NASA, NGA).
Commercial Earth Observation Data: Providers such as Planet Labs.
Social Media & News Data: Real-time event tracking from open sources.
7.3.3. OGC Compliance and Interoperability
The Danti demonstrator currently indexes multiple OGC- and NSG-compliant data formats, including GeoTIFF and NTF. Future plans include investigating the implementation of OGC API — Records to streamline and standardize metadata management, as well as OGC API — Processes to enable cross-platform analytical capabilities.
7.3.4. Findings and Recommendations
The demonstrator evaluated different methods for enhancing AI-driven geospatial awareness in LLMs, including:
Retrieval-Augmented Generation (RAG): AI-enhanced information retrieval from multiple sources.
Knowledge Graph Integration: AI-driven connections between related datasets and geospatial queries.
AI-Powered Query Generation: Allowing LLMs to generate structured queries for geospatial data retrieval.
Key findings indicate that integrating AI-driven search with knowledge graphs significantly improves geospatial data interpretation and usability.
7.4. Demonstrator for Virtual AI Assistants (D120 — CRIM):
CRIM’s demonstrator (https://ogc-demo.crim.ca/) leverages generative AI to enable a virtual assistant capable of interacting with geospatial data, using maps to drive conversations and provide insights, moving away from structured APIs to more intuitive map-based queries. A prototype was developed to test this concept, focusing on flood risk maps as a use case.
7.4.1. Use Cases and Functionalities
The demonstrator explores scenarios where AI assists users in interpreting flood risk maps and other geospatial overlays. Users can interact with the system by entering prompts such as specific addresses to determine flood risks. Functionalities include:
Querying geospatial data overlays to answer location-specific risk questions.
Adjusting map opacity to enhance model explanations.
Intercepting queries for custom tools, such as geocoding locations and retrieving related map images.
Challenges include improving model reliability and minimizing hallucinations, such as misinterpreting map features or geographic contexts.
7.4.2. Data collection tool
The Canadian flooding imagery data collector allows extraction of data by coordinates and zoom level. These data are extracted from the Flood Susceptibility Index 2015 map layer hosted on geo.ca and then overlaid on the street map layer from ArcGIS Online with a specified opacity level.
Figure 4 — D120 (CRIM) - Street Map
Figure 5 — D120 (CRIM) - Flooding Map
Figure 6 — D120 (CRIM) - Combined Map
Street map: This is the base image accessed from the provided coordinates. Because the street map is made from indexed tiles, the tool must determine which tiles are needed to construct a landscape, user-friendly view. Based on the tile in which the coordinate falls, it forms a 2:1 ratio image. Since each tile is 256×256 pixels, the result is a 1024×512-pixel image; this format is important for the flood image. Finally, a pre-determined zoom level is set so that the image contains enough information.
Flooding susceptibility image: This image is extracted from the geo.ca flood-susceptibility map based on a coordinate, a zoom level, and the image width and height in pixels, which match the values of the previous street map.
Combined overlapped image: In this last step, the flooding susceptibility image is overlaid on the street map with a specified opacity, balancing the saturation of the image against the readability of street and neighborhood names.
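The tile arithmetic described above can be sketched using the standard Web-Mercator ("slippy map") tile formulas. The helper names are illustrative, not taken from the pilot code:

```python
import math

TILE_SIZE = 256  # pixels per tile, as used by the street map layer

def latlon_to_tile(lat: float, lon: float, zoom: int) -> tuple:
    """Standard Web-Mercator tile indices (x, y) for a coordinate."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def view_dimensions(tiles_wide: int = 4, tiles_high: int = 2) -> tuple:
    """A 4x2 mosaic of 256-px tiles gives the 1024x512, 2:1 landscape view."""
    return tiles_wide * TILE_SIZE, tiles_high * TILE_SIZE
```

Given the tile containing the coordinate, the tool can pick the neighboring tiles needed to fill a 4×2 mosaic around it.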
Central to this tool is the Processing Pipeline. This pipeline consists of a series of operations that transform user input into actionable data. Within this pipeline, four key tasks are performed:
Identifying the location from the user message.
Geocoding the identified location.
Fetching flood imagery corresponding to the location.
Analyzing the flood risk based on the gathered data.
These steps are illustrated in the following architecture diagram.
Figure 7 — D120 (CRIM) - Architecture Diagram
The Location Processing begins by finding the location within the user message using a first LLM call. Once identified, the address is fed into the Geocoding Service using the Nominatim Geocoder, which converts the address into precise geographic coordinates. Following geocoding, the Flood Image Retrieval step leverages a Geo Flooding Data Processor. This component accesses flood susceptibility imagery based on the coordinates obtained, providing visual data that is crucial for the analysis. The retrieved flood image is then added as context to the original user prompt. Grounding the response generation on this image helps limit hallucinations and allows the user to verify the claims made. This analysis yields insights into flood risks, considering the specific conditions of the user’s location. The result is communicated back to the user, enhancing their understanding of potential flood threats in their area.
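The four-stage pipeline just described can be sketched as a chain of functions. Every stage below is a stub standing in for the real component named in the text (LLM location extraction, the Nominatim geocoder, the Geo Flooding Data Processor, and the grounded LLM analysis); none of them calls the actual services, and all names are hypothetical:

```python
def extract_location(message: str) -> str:
    """First LLM call (stubbed): pull the address out of the user message."""
    return message.rsplit(" at ", 1)[-1] if " at " in message else message

def geocode(address: str) -> tuple:
    """Nominatim stand-in: convert the address to (lat, lon) coordinates."""
    known = {"100 Main St, Ottawa": (45.42, -75.69)}
    return known.get(address, (0.0, 0.0))

def fetch_flood_image(coords: tuple) -> str:
    """Geo Flooding Data Processor stand-in: return a flood image reference."""
    return f"flood_susceptibility_{coords[0]}_{coords[1]}.png"

def analyze_risk(image_ref: str, message: str) -> str:
    """Second LLM call (stubbed), grounded on the retrieved image so the
    user can verify the claims made."""
    return f"Risk assessment for '{message}' grounded on {image_ref}"

def pipeline(message: str) -> str:
    # Location extraction -> geocoding -> image retrieval -> grounded analysis.
    coords = geocode(extract_location(message))
    return analyze_risk(fetch_flood_image(coords), message)
```

The key design point carried over from the text is that the final LLM call receives the retrieved flood image as context, which both limits hallucinations and makes the answer verifiable.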
An example of the use of this agent is shown in the figure below.