Cloud Storage, Data Loaders & Systems
Azure Data Lake
The Azure Data Lake is the primary location where all data is stored before it is ingested into the ODGA’s data warehouses or data marts. The Azure Data Lake will also serve as the storage location for unstructured data (such as images, videos, and social media data) to be used in research, analytics, or reporting. Each Data Trust Member's data will be separated into its own storage blob with the appropriate data governance policies applied.
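As a rough illustration of per-member separation, the sketch below derives a distinct storage location for each Data Trust Member. The `member-` prefix, the function names, and the `dataset/filename` layout are hypothetical conventions chosen for this example, not the ODGA's actual scheme. The only external fact relied on is that Azure Blob Storage container names are 3 to 63 characters of lowercase letters, numbers, and hyphens.

```python
import re


def member_container_name(member_name):
    """Derive a container name for a member (hypothetical naming convention)."""
    # Lowercase the name and collapse every run of other characters into a hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", member_name.lower()).strip("-")
    if not slug:
        raise ValueError("member name must contain at least one letter or digit")
    name = f"member-{slug}"
    # Azure container names must be between 3 and 63 characters long.
    if len(name) > 63:
        raise ValueError(f"container name too long: {name}")
    return name


def blob_path(member_name, dataset, filename):
    """Build a path that keeps each member's files under its own container."""
    return f"{member_container_name(member_name)}/{dataset}/{filename}"
```

Keeping the member identifier at the top of the path means governance policies could be attached once per member rather than per file, assuming policies are scoped at that level.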
Azure Data Catalog (Metadata)
The Azure Data Catalog is the system in which metadata is stored for all data sets ingested by the ODGA. The catalog is kept up to date so that it accurately represents the data stored in the ODGA’s systems. It can be shared with potential users to help them identify the data sets and fields relevant to their work.
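To make the idea of catalog metadata concrete, here is a minimal sketch of what a catalog entry and a field lookup might look like. The `DatasetEntry` and `FieldEntry` classes and the `find_datasets_with_field` helper are hypothetical and exist only for illustration; they do not represent the Azure Data Catalog API or the ODGA's actual metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class FieldEntry:
    """Metadata for one column of a data set (hypothetical schema)."""
    name: str
    data_type: str
    description: str = ""


@dataclass
class DatasetEntry:
    """Metadata for one ingested data set (hypothetical schema)."""
    name: str
    owner: str
    fields: list = field(default_factory=list)


def find_datasets_with_field(catalog, keyword):
    """Return names of data sets that have a field whose name contains keyword."""
    keyword = keyword.lower()
    return [
        entry.name
        for entry in catalog
        if any(keyword in f.name.lower() for f in entry.fields)
    ]
```

A prospective user could run a lookup like this against a shared copy of the catalog to find which data sets carry a field of interest, without needing access to the underlying data.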
Data Loaders
The ODGA uses a range of applications and techniques to load data from the data lake into the data warehouses or Azure SQL servers. A custom pipeline has been developed for each data set received in order to cleanse, curate, format, and ingest the data. The technologies used to build these pipelines include Python, ADF (Azure Data Factory), C#, SSIS, and PowerShell.
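The cleanse and format stages of such a pipeline can be sketched in Python as follows. This is a simplified illustration using only the standard library and assumes a CSV extract as input. The function names and the specific rules (trimming whitespace, dropping empty rows, snake_case column names) are assumptions made for the example, not the ODGA's actual pipeline logic. The ingest step is omitted because it depends on the target system.

```python
import csv
import io


def cleanse(rows):
    """Trim whitespace from keys and values and drop rows with no data."""
    for row in rows:
        cleaned = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore surplus cells beyond the header
        }
        if any(cleaned.values()):
            yield cleaned


def format_row(row):
    """Normalize column names to lowercase with underscores."""
    return {key.lower().replace(" ", "_"): value for key, value in row.items()}


def run_pipeline(raw_csv_text):
    """Cleanse and format a raw CSV extract, returning rows ready to ingest."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    return [format_row(row) for row in cleanse(reader)]
```

Splitting the work into small stage functions lets each data set's pipeline reuse the common stages and swap in its own rules where the source data requires it.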
Internal ODGA Data Systems
Azure Data Warehouse / Azure SQL Databases
The ODGA’s data warehouses and Azure SQL servers are internal database systems where the source data from Data Trust Members is stored and managed. Only the ODGA will have access to these database systems; all other users will access data through data marts. These systems will store the raw data (including PII/PHI data) that will serve as the single source of truth. Each agency will have its own Azure Data Warehouse, which will allow the appropriate data governance policies to be applied. Another benefit of separate data warehouses is that each agency’s instance(s) can be scaled appropriately and the cost of that resource can be tracked effectively.