
Upgrade Guide for converting Metadata from GBL 1.0 to OGM Aardvark


Table of contents

  1. What are the differences between GBL 1.0 and OGM Aardvark?
    1. New elements for rights
    2. New elements for item relations
    3. Consistent namespaces for all metadata element URIs
    4. Multivalued elements whenever possible
  2. CROSSWALKS
    1. Full Aardvark and GBL 1.0 crosswalk table
    2. Elements without a crosswalk
  3. Tools and techniques for upgrading
    1. Option 1: Re-run the metadata pipeline with a new crosswalk
    2. Option 2: Programmatically convert the JSON files
    3. Option 3: Conversion with manual remediation

Terminology used on this page

  • GBL 1.0: The legacy metadata schema designed for GeoBlacklight versions 2.0-3.7. The schema is documented on this Legacy page.
  • OGM Aardvark: The new metadata schema that is compatible with GeoBlacklight version 4.0.
  • GeoBlacklight: When spelled out, GeoBlacklight refers to the application itself, not its namesake legacy metadata schema, GBL 1.0.
  • URI: This is the name we give to the metadata element itself. For example, the URI for the Subject field is dct_subject_sm.
  • Namespace: This is how we signify which family of standards or schemas an element belongs to. For the GeoBlacklight schema, this takes the form of the URI’s prefix. For the URI dct_subject_sm, dct_ is the prefix and signifies that this element is from Dublin Core.
  • Solr field type: This is the suffix appended to the URI and indicates what kind of Solr field should be indexed. For dct_subject_sm, the _sm stands for String Multiple. It indicates that the field type is a string and that it can have multiple values.
  • Value: This is the information that is entered in a field. It may be free text (literal value) or a URI/code (nonliteral value).
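To make the naming convention concrete, here is a minimal Python sketch that splits an Aardvark-style URI into its namespace prefix, element name, and Solr field-type suffix. The set of suffixes is an assumption drawn from the fields listed on this page, and the helper name `parse_uri` is hypothetical.

```python
from typing import Optional, Tuple

# Solr field-type suffixes seen in the Aardvark fields on this page (assumed, not exhaustive).
KNOWN_SUFFIXES = {"s", "sm", "b", "dt", "im", "drsim"}


def parse_uri(uri: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split an Aardvark-style URI into (namespace prefix, element name, Solr suffix)."""
    parts = uri.split("_")
    if len(parts) == 1:
        return None, uri, None  # e.g. "id" has neither prefix nor suffix
    if len(parts) > 2 and parts[-1] in KNOWN_SUFFIXES:
        return parts[0], "_".join(parts[1:-1]), parts[-1]
    return parts[0], "_".join(parts[1:]), None  # e.g. "dcat_bbox" has no suffix


print(parse_uri("dct_subject_sm"))  # ('dct', 'subject', 'sm')
```

This sketch targets Aardvark names; legacy GBL 1.0 names without a prefix, such as suppressed_b, are not parsed correctly by it.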

What are the differences between GBL 1.0 and OGM Aardvark?

New elements for rights

The new rights elements are:

| Label | URI | Description and Entry Guidelines |
|---|---|---|
| Access Rights | dct_accessRights_s | One of two possible values, “Public” or “Restricted”; controls whether a user can preview or download an item. This element replaces dc_rights_s. |
| Rights | dct_rights_sm | Free-text field for generic, catch-all access and usage rights. Can include clickable links. |
| License | dct_license_sm | Field for one or more URIs. Recommended sources are Creative Commons or Open Data Commons. |
| Rights Holder | dct_rightsHolder_sm | Free-text field for the person or organization owning or managing rights over the resource. |
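As an illustration, the rights elements might look like this in a single Aardvark record, followed by a minimal check of the rules in the table. The record values and the `check_rights` helper are hypothetical; treat this as a sketch, not an official validator.

```python
# Hypothetical example record showing only the Aardvark rights fields.
record = {
    "dct_accessRights_s": "Public",
    "dct_rights_sm": ["Available for non-commercial use."],
    "dct_license_sm": ["https://creativecommons.org/licenses/by/4.0/"],
    "dct_rightsHolder_sm": ["Example University Library"],
}

ACCESS_VALUES = {"Public", "Restricted"}
MULTIVALUED_RIGHTS = ("dct_rights_sm", "dct_license_sm", "dct_rightsHolder_sm")


def check_rights(rec: dict) -> list:
    """Return a list of problems found in the rights fields of one record."""
    problems = []
    if rec.get("dct_accessRights_s") not in ACCESS_VALUES:
        problems.append("dct_accessRights_s must be 'Public' or 'Restricted'")
    for key in MULTIVALUED_RIGHTS:
        if key in rec and not isinstance(rec[key], list):
            problems.append(f"{key} should be a list")
    licenses = rec.get("dct_license_sm", [])
    if isinstance(licenses, list):
        for uri in licenses:
            # License values are expected to be URIs, not license names.
            if not str(uri).startswith(("http://", "https://")):
                problems.append(f"dct_license_sm value is not a URI: {uri}")
    return problems


print(check_rights(record))  # prints []
```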

New elements for item relations

The new schema includes seven relationship fields. The value for each field should be the ID (slug) of the related item.

GeoBlacklight versions 3.4 and earlier have an Item Relations widget that displays items identified in the Source field. Beginning with version 4, the same widget is used for each of these relationship fields.

The new relationship elements are:

| Label | URI | Description and Entry Guidelines |
|---|---|---|
| Source | dct_source_sm | For items that have been derived from another item (e.g. a digitized shapefile from a historical map). |
| Is Part Of | dct_isPartOf_sm | For items that are a subset of another item (e.g. a page in a book). This value type is changing from free text in GBL 1.0 to an ID (slug) in the new schema. |
| Member Of | pcdm_memberOf_sm | For items in a collection. |
| Replaces | dct_replaces_sm | To refer to an item that has been deprecated. |
| Is Replaced By | dct_isReplacedBy_sm | To point the user to a new item. |
| Version | dct_isVersionOf_sm | To indicate that an item is part of a series of resources that are updated or altered. |
| Relation | dct_relation_sm | For a general-purpose relation. |
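Because relationship values are IDs rather than free text, a converted batch can be checked for relations that point at records that do not exist. The sketch below assumes each record is a Python dict with an `id` field; the `dangling_relations` helper and the sample IDs are hypothetical.

```python
RELATION_FIELDS = (
    "dct_source_sm", "dct_isPartOf_sm", "pcdm_memberOf_sm", "dct_replaces_sm",
    "dct_isReplacedBy_sm", "dct_isVersionOf_sm", "dct_relation_sm",
)


def dangling_relations(records: list) -> list:
    """Return (record id, field, target id) for each relation whose target is not in the batch."""
    known_ids = {rec["id"] for rec in records}
    missing = []
    for rec in records:
        for field in RELATION_FIELDS:
            for target in rec.get(field, []):
                if target not in known_ids:
                    missing.append((rec["id"], field, target))
    return missing


# Hypothetical IDs: one collection record and one member item.
batch = [
    {"id": "example-collection"},
    {"id": "example-map-1", "pcdm_memberOf_sm": ["example-collection"],
     "dct_source_sm": ["example-scan-1"]},
]
print(dangling_relations(batch))  # [('example-map-1', 'dct_source_sm', 'example-scan-1')]
```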

Consistent namespaces for all metadata element URIs

OGM Aardvark gives preference to elements found in established schemas over custom fields.

  • dct_: This signifies that the field is part of the Dublin Core Metadata Initiative (DCMI) Metadata Terms. Any Dublin Core fields from GBL 1.0 were updated to use the dct_ namespace, instead of dc_.
  • dcat_: This signifies that the field is from the Data Catalog Vocabulary (DCAT) Version 2.
  • pcdm_: This refers to the Portland Common Data Model, which is a framework for many digital repository systems. We drew from it to establish one of the item relationship fields.
  • gbl_: This stands for GeoBlacklight and is used for any field that is application-specific or has no analogous term in other schemas.

Multivalued elements whenever possible

The original schema features several descriptive metadata fields that accept only one value. The new schema expands many of these to accept multiple values, which changes the URI suffix from _s to _sm. Although this does not affect GeoBlacklight functionality, it may cause indexing conflicts, because Solr treats dct_publisher_s and dct_publisher_sm as different fields.
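A minimal sketch of the single-to-multiple change, assuming records are loaded as Python dicts: the old key is removed so that Solr does not receive both the _s and _sm versions. The helper names are hypothetical.

```python
def to_multivalued(value) -> list:
    """Wrap a single value in a list; lists pass through and None becomes []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def promote_field(record: dict, old_key: str, new_key: str) -> dict:
    """Return a copy of record with old_key renamed to the multivalued new_key."""
    out = dict(record)
    if old_key in out:
        # pop() removes the single-valued key so only the _sm field is indexed.
        out[new_key] = to_multivalued(out.pop(old_key))
    return out


print(promote_field({"dc_publisher_s": "Example Press"}, "dc_publisher_s", "dct_publisher_sm"))
# {'dct_publisher_sm': ['Example Press']}
```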


CROSSWALKS

Full Aardvark and GBL 1.0 crosswalk table

The following table shows the full Aardvark schema and which GBL 1.0 fields can be mapped directly to it.

| Aardvark Label | OGM Aardvark | GBL 1.0 | Note |
|---|---|---|---|
| Access Rights | dct_accessRights_s | dc_rights_s | new URI name |
| Alternative Title | dct_alternative_sm | | new field |
| Bounding Box | dcat_bbox | solr_geom | new field |
| Centroid | dcat_centroid | | new field |
| Creator | dct_creator_sm | dc_creator_sm | new namespace |
| Date Issued | dct_issued_s | dct_issued_s | no change |
| Date Range | gbl_dateRange_drsim | | new field |
| Description | dct_description_sm | dc_description_s | new namespace; single to multi-valued |
| File Size | gbl_fileSize_s | | new field |
| Format | dct_format_s | dc_format_s | new namespace |
| Geometry | locn_geometry | solr_geom | new field |
| Georeferenced | gbl_georeferenced_b | | new field |
| ID | id | layer_slug_s | new URI name |
| Identifier | dct_identifier_sm | dc_identifier_s | new namespace; single to multi-valued |
| Index Year | gbl_indexYear_im | solr_year_i | new URI name; single to multi-valued |
| Is Part Of | dct_isPartOf_sm | | new value type (see Elements without a crosswalk) |
| Is Replaced By | dct_isReplacedBy_sm | | new field |
| Keyword | dcat_keyword_sm | | new field |
| Language | dct_language_sm | dc_language_s or dc_language_sm | new namespace; single to multi-valued |
| License | dct_license_sm | | new field |
| Member Of | pcdm_memberOf_sm | | new field |
| Metadata Version | gbl_mdVersion_s | geoblacklight_version | new URI name |
| Modified | gbl_mdModified_dt | layer_modified_dt | new URI name |
| Provider | schema_provider_s | dct_provenance_s | new URI name |
| Publisher | dct_publisher_sm | dc_publisher_s | new namespace; single to multi-valued |
| References | dct_references_s | dct_references_s | no change |
| Relation | dct_relation_sm | | new field |
| Replaces | dct_replaces_sm | | new field |
| Resource Class | gbl_resourceClass_sm | | new field |
| Resource Type | gbl_resourceType_sm | | new field |
| Rights | dct_rights_sm | | new field |
| Rights Holder | dct_rightsHolder_sm | | new field |
| Source | dct_source_sm | dc_source_sm | new namespace |
| Spatial Coverage | dct_spatial_sm | dct_spatial_sm | no change |
| Subject | dct_subject_sm | dc_subject_sm | new namespace |
| Suppressed | gbl_suppressed_b | suppressed_b | new namespace |
| Temporal Coverage | dct_temporal_sm | dct_temporal_sm | no change |
| Theme | dcat_theme_sm | | new field |
| Title | dct_title_s | dc_title_s | new namespace |
| Version | dct_isVersionOf_sm | | new field |
| WxS Identifier | gbl_wxsIdentifier_s | layer_id_s | new URI name |
| | | dc_type_s | deprecated field |
| | | layer_geom_type_s | deprecated field |
| | | dct_isPartOf_sm | deprecated literal field; replaced by nonliteral dct_isPartOf_sm, which takes an ID |
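The direct mappings in the crosswalk table can be applied programmatically. The following Python sketch covers a subset of the rows and skips the fields listed under Elements without a crosswalk, along with anything unmapped. The function name is hypothetical, and setting gbl_mdVersion_s to "Aardvark" is an assumption about the expected value that should be confirmed against the schema documentation.

```python
# A subset of the direct GBL 1.0 -> Aardvark mappings from the crosswalk table.
CROSSWALK = {
    "layer_slug_s": "id",
    "dc_title_s": "dct_title_s",
    "dc_creator_sm": "dct_creator_sm",
    "dc_description_s": "dct_description_sm",
    "dc_publisher_s": "dct_publisher_sm",
    "dc_identifier_s": "dct_identifier_sm",
    "dc_language_s": "dct_language_sm",
    "dc_language_sm": "dct_language_sm",
    "dc_subject_sm": "dct_subject_sm",
    "dc_source_sm": "dct_source_sm",
    "dc_format_s": "dct_format_s",
    "dc_rights_s": "dct_accessRights_s",
    "solr_year_i": "gbl_indexYear_im",
    "layer_modified_dt": "gbl_mdModified_dt",
    "dct_provenance_s": "schema_provider_s",
    "layer_id_s": "gbl_wxsIdentifier_s",
    "suppressed_b": "gbl_suppressed_b",
}
# Fields whose name and value are unchanged between the two schemas.
UNCHANGED = {"dct_issued_s", "dct_references_s", "dct_spatial_sm", "dct_temporal_sm"}


def crosswalk(gbl: dict) -> dict:
    """Apply the direct mappings; fields without a direct crosswalk are skipped."""
    out = {}
    for key, value in gbl.items():
        if key in UNCHANGED:
            out[key] = value
        elif key in CROSSWALK:
            new_key = CROSSWALK[key]
            # Multivalued targets (_sm, _im) need a list even if the source was single-valued.
            if new_key.endswith(("_sm", "_im")) and not isinstance(value, list):
                value = [value]
            out[new_key] = value
        # dc_type_s, layer_geom_type_s, dct_isPartOf_sm, solr_geom, and
        # geoblacklight_version fall through here and need separate handling.
    out["gbl_mdVersion_s"] = "Aardvark"  # assumed required value for Aardvark records
    return out
```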

Elements without a crosswalk

Most of the elements from GBL 1.0 can be crosswalked directly into OGM Aardvark. The values for these elements are the same; only the URI name has changed or the field has been converted to an array.

However, there are three elements in GBL 1.0 that do not directly translate into OGM Aardvark. While they have been replaced with similar fields in OGM Aardvark, the values themselves would need to be altered during crosswalking.

Type (dc_type_s)

  • GBL 1.0 Description: This single-valued GBL 1.0 field observes the Dublin Core controlled vocabulary for Type, including Dataset, Image, Collection, Interactive Resource, or Physical Object.
  • Similar Aardvark element: This has been replaced in Aardvark with the multi-valued Resource Class, which uses a custom controlled vocabulary of Collections, Datasets, Imagery, Maps, Web services, and/or Other.
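One possible approach is a lookup table from dc_type_s values to Resource Class values, with anything unmatched falling back to Other. The mapping below is a hypothetical starting point, not an official crosswalk; for example, scanned maps typed as Image in GBL 1.0 may belong in Maps rather than Imagery.

```python
# Hypothetical mapping from GBL 1.0 dc_type_s values to Aardvark Resource Class values.
TYPE_TO_CLASS = {
    "Dataset": "Datasets",
    "Image": "Imagery",
    "Collection": "Collections",
    "Interactive Resource": "Web services",
}


def resource_class(gbl: dict) -> list:
    """Return a gbl_resourceClass_sm list for one GBL 1.0 record."""
    # Missing or unmapped types (e.g. "Physical Object") fall back to "Other".
    return [TYPE_TO_CLASS.get(gbl.get("dc_type_s"), "Other")]
```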

Geometry Type (layer_geom_type_s)

  • GBL 1.0 Description: This single-valued GBL 1.0 field differentiates between vector (Point, Line, Polygon), raster (Raster, Image), non-spatial formats (Table), or a combination (Mixed).
  • Similar Aardvark element: This has been replaced in Aardvark with the multi-valued Resource Type, which uses a controlled vocabulary drawn from Library of Congress cartographic genres and GIS geometries.
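Geometry Type can be handled the same way. The Resource Type values below (Point data, Line data, and so on) are assumed from the Aardvark vocabulary, and values such as Image or Mixed are flagged for manual review with a hypothetical "EDIT ME" placeholder rather than guessed.

```python
# Assumed Aardvark Resource Type values for the GBL 1.0 geometry types that map cleanly.
GEOM_TO_TYPE = {
    "Point": "Point data",
    "Line": "Line data",
    "Polygon": "Polygon data",
    "Raster": "Raster data",
    "Table": "Table data",
}


def resource_type(gbl: dict) -> list:
    """Return a gbl_resourceType_sm list for one GBL 1.0 record."""
    geom = gbl.get("layer_geom_type_s")
    if geom is None:
        return []
    if geom in GEOM_TO_TYPE:
        return [GEOM_TO_TYPE[geom]]
    # Image, Mixed, and unexpected values are flagged for a person to decide.
    return [f"EDIT ME - {geom}"]
```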

Is Part Of (dct_isPartOf_sm)

  • GBL 1.0 Description: This multi-valued GBL 1.0 plain text field is for writing out the name of a collection. Example: dct_isPartOf_sm:"Village Maps of India"
  • Similar Aardvark element: The URI is the same in Aardvark, but it is now a non-literal field. The value must be one or more IDs that reference another record within the system. Example: dct_isPartOf_sm:"princeton-z603r079s"
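Converting Is Part Of requires a lookup from collection names to record IDs, which you would need to build locally (for example, after creating a collection-level record for each name). This sketch assumes such a dictionary exists; the helper and the sample ID are hypothetical. Unresolved names are returned so they can be reviewed rather than silently dropped.

```python
def resolve_is_part_of(gbl: dict, name_to_id: dict) -> tuple:
    """Map literal GBL 1.0 collection names to record IDs.

    Returns (ids, unresolved_names) so that unmatched names can be reviewed.
    """
    ids, unresolved = [], []
    for name in gbl.get("dct_isPartOf_sm", []):
        if name in name_to_id:
            ids.append(name_to_id[name])
        else:
            unresolved.append(name)
    return ids, unresolved


# Hypothetical lookup built from your own collection-level records.
lookup = {"Village Maps of India": "example-collection-id"}
print(resolve_is_part_of({"dct_isPartOf_sm": ["Village Maps of India"]}, lookup))
# (['example-collection-id'], [])
```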

Tools and techniques for upgrading

The following are three options for upgrading GBL 1.0 metadata to OGM Aardvark. The figures include references to Solr, the search index that powers a GeoBlacklight instance.

Option 1: Re-run the metadata pipeline with a new crosswalk

Fig.1 - Metadata pipeline showing a conversion from standards metadata

Scenario

  • you have geospatial resources with structured metadata files in an official standard, such as ISO 19139, FGDC, MODS, or MARC
  • your organization already has a metadata pipeline process that converts these structured files to GBL 1.0

How does it work?

This option involves updating your local transformation workflow that extracts values from standards-based metadata files.

  • For institutions that use XSLT for the transformation, the GeoCombine repository provides XSL files.

  • For institutions that use custom tools or Python scripts, refer to the OGM Aardvark - FGDC - ISO 19139 crosswalk document to update the code. It shows crosswalks between OGM Aardvark, GBL 1.0, FGDC, and ISO 19139.

Considerations for Option #1

  • may require extra institutional support, particularly if the transformation process is part of a larger framework or connected to a repository.
  • community-developed XSLs are still a work in progress

Crosswalk between OGM Aardvark, FGDC CSDGM, and ISO 19139:

| OGM Aardvark | FGDC CSDGM | ISO 19139 |
|---|---|---|
| dct_accessRights_s | idinfo/acconst | MD_DataIdentification/resourceConstraints/MD_LegalConstraints/accessConstraints |
| dct_alternative_sm | | |
| dcat_bbox | idinfo/spdom/bounding/westbc AND eastbc AND northbc AND southbc | MD_DataIdentification/extent/EX_Extent/geographicElement/EX_GeographicBoundingBox/west AND south AND east AND north |
| dcat_centroid | | |
| dct_creator_sm | idinfo/citation/citeinfo/originator | MD_DataIdentification/citation/CI_Citation/citation/citedResponsibleParty/CI_ResponsibleParty/individualName OR organisationName (@type = originator) |
| dct_issued_s | idinfo/citation/citeinfo/pubdate | MD_DataIdentification/citation/CI_Citation/citation/CI_Date/date |
| gbl_dateRange_drsim | idinfo/timeperd/timeinfo/rngdates/begdate and idinfo/timeperd/timeinfo/rngdates/enddate | |
| dct_description_sm | idinfo/descript/abstract | MD_DataIdentification/abstract |
| gbl_fileSize_s | distinfo/stdorder/digform/digtinfo/transize | |
| dct_format_s | spdoinfo/direct | MD_Distribution/distributionFormat/MD_Format/name |
| locn_geometry | idinfo/spdom/bounding/westbc AND eastbc AND northbc AND southbc | |
| gbl_georeferenced_b | | |
| id | not applicable | not applicable |
| dct_identifier_sm | | MD_DataIdentification/citation/CI_Citation/citation/MD_Identifier/code |
| gbl_indexYear_im | idinfo/timeperd/timeinfo/sngdate/caldate and idinfo/timeperd/timeinfo/mdattim/sngdate/caldate and idinfo/timeperd/timeinfo/rngdates/begdate and idinfo/keywords/temporal/tempkey | MD_DataIdentification/extent/EX_Extent/EX_TemporalExtent/extent/TimePeriod OR TimeInstant [YYYY] |
| dct_isPartOf_sm | | |
| | idinfo/citation/citeinfo/lworkcit/citeinfo/title | MD_AggregateInformation/aggregateDataSetName/CI_Citation/title |
| dct_isReplacedBy_sm | | |
| dcat_keyword_sm | | |
| dct_language_sm | | MD_DataIdentification/language |
| dct_license_sm | | |
| pcdm_memberOf_sm | | |
| gbl_mdVersion_s | | |
| gbl_mdModified_dt | metainfo/metd | |
| schema_provider_s | distinfo/distrib/cntinfo/cntorgp/cntorg | MD_Distributor/CI_ResponsibleParty/organisationName |
| dct_publisher_sm | idinfo/citation/citeinfo/pubinfo/publish | MD_DataIdentification/citation/CI_Citation/citation/citedResponsibleParty/CI_ResponsibleParty/individualName OR organisationName (@type = publisher) |
| dct_references_s | not applicable | not applicable |
| dct_relation_sm | | |
| dct_replaces_sm | | |
| gbl_resourceClass_sm | idinfo/citation/citeinfo/geoform | |
| gbl_resourceType_sm | spdoinfo/ptvctinf/sdtsterm/sdtstype, idinfo/keywords/theme/themekey | MD_GeometricObjectTypeCode |
| dct_rights_sm | idinfo/useconst, idinfo/acconst | |
| dct_rightsHolder_sm | | |
| dct_source_sm | dataqual/lineage/srcinfo/srccite/citeinfo/title | DQ_DataQuality/lineage/LI_Lineage/source/LI_Source/sourceCitation/CI_Citation/title |
| dct_spatial_sm | idinfo/keywords/place | MD_DataIdentification/descriptiveKeywords/MD_Keywords/keyword (@type='place') |
| dct_subject_sm | idinfo/keywords/theme/themekey | MD_DataIdentification/descriptiveKeywords/MD_Keywords/keyword (@type='theme') AND MD_DataIdentification/topicCategory/MD_TopicCategoryCode |
| gbl_suppressed_b | not applicable | not applicable |
| dct_temporal_sm | idinfo/timeperd/timeinfo/sngdate/caldate and idinfo/timeperd/timeinfo/mdattim/sngdate and idinfo/timeperd/timeinfo/rngdates and idinfo/keywords/temporal/tempkey | MD_DataIdentification/extent/EX_Extent/EX_TemporalExtent/extent/TimePeriod OR TimeInstant |
| dcat_theme_sm | | |
| dct_title_s | idinfo/citation/citeinfo/title | MD_DataIdentification/citation/CI_Citation/title |
| | | hierarchyLevelName |
| dct_isVersionOf_sm | | |
| gbl_wxsIdentifier_s | not applicable | not applicable |
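For Option 1 pipelines written in Python, the FGDC paths in the crosswalk can be read with the standard library's xml.etree.ElementTree. This minimal sketch extracts three fields from an FGDC CSDGM record whose root element is <metadata>. The function name is hypothetical, and the ENVELOPE(W,E,N,S) string for dcat_bbox is an assumption based on the Solr envelope syntax used by GeoBlacklight.

```python
import xml.etree.ElementTree as ET


def fgdc_to_aardvark(xml_text: str) -> dict:
    """Extract a few Aardvark fields from an FGDC CSDGM record (root element <metadata>)."""
    root = ET.fromstring(xml_text)
    out = {}

    title = root.findtext("idinfo/citation/citeinfo/title")
    if title:
        out["dct_title_s"] = title.strip()

    bounding = root.find("idinfo/spdom/bounding")
    if bounding is not None:
        coords = [bounding.findtext(tag) for tag in ("westbc", "eastbc", "northbc", "southbc")]
        if all(coords):  # skip the bbox if any coordinate is missing or empty
            # Assumed Solr envelope order: west, east, north, south.
            w, e, n, s = (c.strip() for c in coords)
            out["dcat_bbox"] = f"ENVELOPE({w},{e},{n},{s})"

    themes = [k.text.strip() for k in root.findall("idinfo/keywords/theme/themekey") if k.text]
    if themes:
        out["dct_subject_sm"] = themes
    return out
```

The same pattern extends to the other FGDC paths in the table; fields with no FGDC source still need values from another workflow.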

Option 2: Programmatically convert the JSON files

Fig.2 - Programmatic transformation process using GBL 1.0 metadata JSONs

Scenario

  • you only have GBL 1.0 metadata (no structured metadata files in an official standard)
  • you want to test your environment with the new Aardvark schema

How does it work?

  1. Gather GBL 1.0 metadata JSON files on your desktop
  2. Use a script or tool to batch convert the GBL 1.0 JSON files to OGM Aardvark
  3. Re-index the resulting Aardvark JSON files into your application (GeoBlacklight)
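Steps 2 and 3 can be scripted. The sketch below is hypothetical and is not one of the community tools listed in this section: it renames fields with a small crosswalk dictionary, wraps scalar values for multivalued targets, copies unmapped fields as-is, and writes one Aardvark JSON per input file. A file that holds a list of records is converted record by record.

```python
import json
from pathlib import Path

# Small illustrative crosswalk; extend it with the rest of the crosswalk table.
RENAMES = {"layer_slug_s": "id", "dc_title_s": "dct_title_s", "dc_publisher_s": "dct_publisher_sm"}
MULTIVALUED_SUFFIXES = ("_sm", "_im")


def convert_record(gbl: dict) -> dict:
    out = {}
    for key, value in gbl.items():
        new_key = RENAMES.get(key, key)  # unmapped fields are copied as-is
        if new_key.endswith(MULTIVALUED_SUFFIXES) and not isinstance(value, list):
            value = [value]
        out[new_key] = value
    return out


def convert_directory(src: Path, dest: Path) -> int:
    """Convert every *.json file in src into dest; return the number of files written."""
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # A file may hold one record (dict) or several (list).
        if isinstance(data, list):
            converted = [convert_record(rec) for rec in data]
        else:
            converted = convert_record(data)
        (dest / path.name).write_text(json.dumps(converted, indent=2), encoding="utf-8")
        count += 1
    return count


# Hypothetical usage: convert_directory(Path("gbl1_json"), Path("aardvark_json"))
```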

Currently, the OpenGeoMetadata community has two tools that can do batch conversions:

  • gbl2aardvark: A web-hosted interface (recommended tool).

    • Users can upload GBL 1.0 metadata files to this tool and it will return a downloadable JSON in the OGM Aardvark schema.
    • In addition to direct crosswalks, this tool also populates Resource Class and Resource Type based on the GBL 1.0 Type and Geometry Type fields. It also generates new collection-level records based on the values in the Is Part Of field.
    • Any fields that do not properly convert will be flagged with the phrase “EDIT ME –”
    • When reindexing Solr with a single JSON file that contains multiple records, use the Solr Admin UI with “Document Type” set to “File Upload”.
    • See the GitHub documentation for more information
  • a standalone Python script: https://github.com/OpenGeoMetadata/gbl-1_to_aardvark.

    • This command line script will perform a straight conversion of field names.
    • It features an editable crosswalk file to customize the transformation.
    • The elements without a direct crosswalk listed above (Type, Geometry Type, and Is Part Of) are copied as-is into the new Aardvark JSONs.
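The idea of an editable crosswalk file can be sketched with Python's csv module. The two-column layout and the header names gbl_field and aardvark_field are assumptions for illustration and may not match the file format used by the gbl-1_to_aardvark script.

```python
import csv
import io

# Hypothetical editable crosswalk file contents (two columns with a header row).
CROSSWALK_CSV = """gbl_field,aardvark_field
layer_slug_s,id
dc_title_s,dct_title_s
dc_creator_sm,dct_creator_sm
"""


def load_crosswalk(csv_text: str) -> dict:
    """Read gbl_field -> aardvark_field pairs, skipping rows with an empty cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["gbl_field"].strip(): row["aardvark_field"].strip()
        for row in reader
        if row["gbl_field"] and row["aardvark_field"]
    }


def rename_fields(record: dict, mapping: dict) -> dict:
    # Fields missing from the mapping (e.g. dc_type_s) keep their GBL 1.0 names.
    return {mapping.get(key, key): value for key, value in record.items()}


mapping = load_crosswalk(CROSSWALK_CSV)
print(rename_fields({"layer_slug_s": "example-1", "dc_type_s": "Dataset"}, mapping))
# {'id': 'example-1', 'dc_type_s': 'Dataset'}
```

Keeping the mapping in a file means the crosswalk can be adjusted without editing the script itself.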

Considerations for Option #2

  • the fastest method, but not a long-term solution, and may result in incomplete metadata

  • will not include some fields that are new in OGM Aardvark, such as Rights or License. To take advantage of those fields, use Option 1 or 3.


Option 3: Conversion with manual remediation

Fig.3 - Transformation process that includes manual remediation

Scenario

  • you have GBL 1.0 metadata files
  • you want to minimize data loss and corrupted output values
  • you have staff time available for manual work

How does it work?

This technique combines automatic conversions and manual edits:

  1. Convert your GBL 1.0 metadata files to a CSV. This Python script will convert a batch of JSONs to a CSV file.

  2. Manually augment and adjust column names and values using spreadsheet functions.

  3. Convert your spreadsheet to OGM Aardvark JSONs. This Python script will convert CSVs to Aardvark JSONs.
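Steps 1 and 3 amount to flattening records into spreadsheet rows and rebuilding them. This is a minimal sketch, not the scripts mentioned in those steps: it joins multivalued fields with a pipe character (an assumed convention; choose a separator that never appears in your values), and on the way back it splits _sm and _im columns, converts _im values to integers and _b values to booleans, and omits empty cells. The function names are hypothetical.

```python
import csv
import io

SEP = "|"  # assumed separator for multivalued cells


def records_to_csv(records: list) -> str:
    """Flatten records into CSV text, one column per field, joining lists with SEP."""
    columns = sorted({key for rec in records for key in rec})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    for rec in records:
        writer.writerow(
            {k: SEP.join(map(str, v)) if isinstance(v, list) else v for k, v in rec.items()}
        )
    return buf.getvalue()


def csv_to_records(csv_text: str) -> list:
    """Rebuild Aardvark-style dicts from CSV text, restoring lists and basic types."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = {}
        for key, cell in row.items():
            if not cell:
                continue  # empty cell: omit the field
            if key.endswith("_im"):
                rec[key] = [int(x) for x in cell.split(SEP)]
            elif key.endswith("_sm"):
                rec[key] = cell.split(SEP)
            elif key.endswith("_b"):
                rec[key] = cell.lower() == "true"
            else:
                rec[key] = cell
        records.append(rec)
    return records
```

Between the two conversions, the CSV can be opened in a spreadsheet for the manual edits described in step 2.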

Considerations for Option #3

  • more likely to produce normalized, consistent values than Options 1 or 2.

  • may require extra dedicated staff time

  • could be adapted to work with metadata extracted from standards-based files