mex.extractors.open_data package¶
Subpackages¶
- mex.extractors.open_data.models package
- Submodules
- mex.extractors.open_data.models.source module
Links
OpenDataCreatorsOrContributors
OpenDataLicenseOrFile
OpenDataMetadata
OpenDataMetadata.contributors
OpenDataMetadata.creators
OpenDataMetadata.description
OpenDataMetadata.keywords
OpenDataMetadata.language
OpenDataMetadata.license
OpenDataMetadata.model_computed_fields
OpenDataMetadata.model_config
OpenDataMetadata.model_fields
OpenDataMetadata.publication_date
OpenDataMetadata.related_identifiers
OpenDataParentResource
OpenDataParentResource.conceptdoi
OpenDataParentResource.conceptrecid
OpenDataParentResource.files
OpenDataParentResource.id
OpenDataParentResource.metadata
OpenDataParentResource.model_computed_fields
OpenDataParentResource.model_config
OpenDataParentResource.model_fields
OpenDataParentResource.modified
OpenDataParentResource.title
OpenDataRelatedIdentifiers
OpenDataResourceVersion
OpenDataVersionFiles
- Module contents
Submodules¶
mex.extractors.open_data.connector module¶
- class mex.extractors.open_data.connector.OpenDataConnector¶
Bases: HTTPConnector
Connector class to handle requesting the Zenodo API.
- _set_url() None ¶
Set url of the host.
- get_files_for_resource_version(version_id: int) list[OpenDataVersionFiles] ¶
Load the files of a resource version by querying the Zenodo API.
- Parameters:
version_id – id of a resource version
- Returns:
Zenodo resource version files
- get_oldest_resource_version_creation_date(resource_id: int) str | None ¶
Load the creation date of the oldest (first) version of a resource by querying the Zenodo API.
- Parameters:
resource_id – id of any resource version
- Returns:
creation date of the oldest resource version as a string, or None
- get_parent_resources() list[OpenDataParentResource] ¶
Load parent resources by querying the Zenodo API.
Gets the parent resources (roughly the latest version) of all resources in the configured Zenodo community.
- Returns:
list of parent resources
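The three connector methods above are naturally combined per parent resource. The following is a minimal sketch of that call pattern, assuming a hypothetical `StubConnector` with canned data. It mirrors only the documented method names and return shapes, does not contact Zenodo, and uses invented ids, titles, file names, and dates. Passing the parent id as the version id is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StubFile:
    """Hypothetical stand-in for OpenDataVersionFiles."""

    key: str


@dataclass
class StubParent:
    """Hypothetical stand-in for OpenDataParentResource."""

    id: int
    title: str


class StubConnector:
    """Offline stand-in that mirrors the documented method names only."""

    def get_parent_resources(self) -> list[StubParent]:
        return [StubParent(id=101, title="Dataset A"), StubParent(id=202, title="Dataset B")]

    def get_files_for_resource_version(self, version_id: int) -> list[StubFile]:
        files = {101: [StubFile("a.csv"), StubFile("readme.md")]}
        return files.get(version_id, [])

    def get_oldest_resource_version_creation_date(self, resource_id: int) -> str | None:
        dates = {101: "2021-03-01T00:00:00+00:00"}
        return dates.get(resource_id)


def summarize(connector: StubConnector) -> list[tuple[str, int, str | None]]:
    """Collect title, file count, and first creation date per parent resource."""
    summary = []
    for parent in connector.get_parent_resources():
        files = connector.get_files_for_resource_version(parent.id)
        created = connector.get_oldest_resource_version_creation_date(parent.id)
        summary.append((parent.title, len(files), created))
    return summary


print(summarize(StubConnector()))
```

A resource without a known creation date yields `None` in the last tuple position, matching the documented `str | None` return type.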
mex.extractors.open_data.extract module¶
- mex.extractors.open_data.extract.extract_files_for_parent_resource(version_id: int) list[OpenDataVersionFiles] ¶
Fetch all files of a resource version.
- Parameters:
version_id – id of record version as integer
- Returns:
list of OpenDataVersionFiles
- mex.extractors.open_data.extract.extract_oldest_record_version_creationdate(record_id: int) str | None ¶
Fetch the creation date of the oldest version of a parent resource.
- Parameters:
record_id – id of record version as integer
- Returns:
creation date of the oldest record version as a string, or None
- mex.extractors.open_data.extract.extract_open_data_persons_from_open_data_parent_resources(open_data_parent_resource: list[OpenDataParentResource]) list[OpenDataCreatorsOrContributors] ¶
Extract unique Open Data persons from Open Data parent resources.
- Parameters:
open_data_parent_resource – list of Open Data parent resources
- Returns:
list of extracted Open Data persons (creators or contributors)
- mex.extractors.open_data.extract.extract_parent_resources() list[OpenDataParentResource] ¶
Load Open Data resources by querying the Zenodo API.
Get all resources of the configured Zenodo community. These are called ‘parent resources’.
- Returns:
list of parent resources
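To illustrate what extracting unique persons from parent resources could look like, here is a minimal sketch using plain dataclasses. The classes `Person`, `Metadata`, and `ParentResource` are hypothetical stand-ins for the package's models, the `name` and `orcid` fields are assumed, and deduplicating by name is an assumption; the real function may use a different key.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    """Hypothetical stand-in for OpenDataCreatorsOrContributors."""

    name: str
    orcid: str | None = None


@dataclass
class Metadata:
    """Hypothetical stand-in for OpenDataMetadata (persons only)."""

    creators: list[Person] = field(default_factory=list)
    contributors: list[Person] = field(default_factory=list)


@dataclass
class ParentResource:
    """Hypothetical stand-in for OpenDataParentResource."""

    metadata: Metadata


def unique_persons(resources: list[ParentResource]) -> list[Person]:
    """Return each creator or contributor once, in first-seen order."""
    seen: dict[str, Person] = {}
    for resource in resources:
        for person in [*resource.metadata.creators, *resource.metadata.contributors]:
            # Assumption: the name identifies a person; the real key may differ.
            seen.setdefault(person.name, person)
    return list(seen.values())
```

Using a dictionary keyed by name keeps the first occurrence of each person and preserves insertion order, so the result is deterministic for a given input.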
mex.extractors.open_data.main module¶
mex.extractors.open_data.settings module¶
- class mex.extractors.open_data.settings.OpenDataSettings(*, url: str = 'https://zenodo', community_rki: str = 'robertkochinstitut', mapping_path: AssetsPath = AssetsPath('mappings/open-data'))¶
Bases: BaseModel
Zenodo settings submodel definition for the Open Data extractor.
- community_rki: str¶
- mapping_path: AssetsPath¶
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'community_rki': FieldInfo(annotation=str, required=False, default='robertkochinstitut', description='Zenodo communitiy of rki'), 'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/open-data"), description='Path to the directory with the open data mapping files containing the default values, absolute path or relative to `assets_dir`.'), 'url': FieldInfo(annotation=str, required=False, default='https://zenodo', description='Zenodo instance URL')}¶
Metadata about the fields defined on the model, mapping of field names to FieldInfo (pydantic.fields.FieldInfo) objects.
This replaces Model.__fields__ from Pydantic V1.
- url: str¶
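The defaults and two of the string rules from `model_config` (`str_strip_whitespace` and `str_min_length`) can be approximated without pydantic. The sketch below is a plain-dataclass stand-in named `OpenDataSettingsSketch` (a hypothetical name); it only imitates the documented defaults and is not the real settings class, which is a pydantic `BaseModel` and types `mapping_path` as `AssetsPath`.

```python
from dataclasses import dataclass


@dataclass
class OpenDataSettingsSketch:
    """Plain-dataclass stand-in for the documented settings defaults."""

    url: str = "https://zenodo"
    community_rki: str = "robertkochinstitut"
    mapping_path: str = "mappings/open-data"  # AssetsPath in the real model

    def __post_init__(self) -> None:
        for name in ("url", "community_rki"):
            value = getattr(self, name).strip()  # imitates str_strip_whitespace
            if not value:  # imitates str_min_length = 1
                raise ValueError(f"{name} must not be empty")
            setattr(self, name, value)


defaults = OpenDataSettingsSketch()
custom = OpenDataSettingsSketch(community_rki="  example-community ")
print(defaults.community_rki, custom.community_rki)
```

Overriding a single field leaves the other defaults untouched, which is the behaviour the keyword-only signature above suggests.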
mex.extractors.open_data.transform module¶
- mex.extractors.open_data.transform.lookup_person_in_ldap_and_transfom(person: OpenDataCreatorsOrContributors, extracted_primary_source_ldap: ExtractedPrimarySource, units_by_identifier_in_primary_source: dict[str, ExtractedOrganizationalUnit]) ExtractedPerson | None ¶
Look up a person in LDAP and transform it to an ExtractedPerson.
- Parameters:
person – Open Data person (creator or contributor)
extracted_primary_source_ldap – primary source for LDAP
units_by_identifier_in_primary_source – dict of organizational units by identifier in primary source
- Returns:
ExtractedPerson if matched or None if match fails
- mex.extractors.open_data.transform.transform_open_data_distributions(open_data_parent_resources: list[OpenDataParentResource], extracted_primary_source_open_data: ExtractedPrimarySource, distribution_mapping: DistributionMapping) list[ExtractedDistribution] ¶
Transform open data resource versions to extracted distributions.
- Parameters:
open_data_parent_resources – list of open data parent resources
extracted_primary_source_open_data – extracted primary source for open data
distribution_mapping – distribution mapping model with default values
- Returns:
List of ExtractedDistribution instances
- mex.extractors.open_data.transform.transform_open_data_parent_resource_to_mex_resource(open_data_parent_resource: list[OpenDataParentResource], extracted_primary_source_open_data: ExtractedPrimarySource, extracted_open_data_persons: list[ExtractedPerson], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], extracted_open_data_distribution: list[ExtractedDistribution], resource_mapping: ResourceMapping, extracted_organization_rki: ExtractedOrganization, open_data_contact_point: list[ExtractedContactPoint]) list[ExtractedResource] ¶
Transform Open Data parent resources to extracted resources.
- Parameters:
open_data_parent_resource – list of Open Data parent resources
extracted_primary_source_open_data – extracted primary source for open data
extracted_open_data_persons – list of ExtractedPerson
unit_stable_target_ids_by_synonym – unit stable target ids by synonym
extracted_open_data_distribution – list of extracted Open Data distributions
resource_mapping – resource mapping model with default values
extracted_organization_rki – ExtractedOrganization of RKI
open_data_contact_point – list of ExtractedContactPoint
- Returns:
list of ExtractedResource instances
- mex.extractors.open_data.transform.transform_open_data_person_affiliations_to_organisations(extracted_open_data_creators_contributors: list[OpenDataCreatorsOrContributors], extracted_primary_source_open_data: ExtractedPrimarySource) dict[str, MergedOrganizationIdentifier] ¶
Search Wikidata for organisations or create own ones, load them to the sink, and build a dictionary.
- Parameters:
extracted_open_data_creators_contributors – list of creators and contributors
extracted_primary_source_open_data – extracted primary source for Open Data
- Returns:
dictionary of merged organization identifiers by affiliation name
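The look-up-or-create pattern behind this function can be sketched with plain dictionaries. Everything here is an assumption for illustration: `map_affiliations` is a hypothetical helper, the `known` dictionary stands in for the Wikidata search, the `created` list stands in for what would be loaded to the sink, and the `local-N` identifiers are invented placeholders rather than real merged identifiers.

```python
from __future__ import annotations


def map_affiliations(
    affiliations: list[str], known: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Return ids by affiliation name plus the names that had to be created."""
    ids_by_name: dict[str, str] = {}
    created: list[str] = []
    for name in affiliations:
        if name in ids_by_name:
            continue  # each affiliation is resolved only once
        if name in known:
            # Stand-in for a successful external lookup.
            ids_by_name[name] = known[name]
        else:
            # Stand-in for creating an own organisation record.
            ids_by_name[name] = f"local-{len(created) + 1}"
            created.append(name)
    return ids_by_name, created


ids, created = map_affiliations(
    ["Robert Koch-Institut", "Example University", "Robert Koch-Institut"],
    {"Robert Koch-Institut": "org-from-lookup"},
)
print(ids, created)
```

The returned dictionary has the same shape as the documented return type (identifier by affiliation name), so later steps can resolve a person's affiliation with a single key lookup.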
- mex.extractors.open_data.transform.transform_open_data_persons(extracted_open_data_creators_contributors: list[OpenDataCreatorsOrContributors], extracted_primary_source_ldap: ExtractedPrimarySource, extracted_primary_source_open_data: ExtractedPrimarySource, extracted_organizational_units: list[ExtractedOrganizationalUnit], extracted_organization_rki: ExtractedOrganization, extracted_open_data_organizations: dict[str, MergedOrganizationIdentifier]) list[ExtractedPerson] ¶
Look up persons in LDAP or create an ExtractedPerson if the match fails.
- Parameters:
extracted_open_data_creators_contributors – list of creators or contributors
extracted_primary_source_ldap – extracted primary source for LDAP
extracted_primary_source_open_data – extracted primary source for Open Data
extracted_organizational_units – list of extracted organizational units
extracted_organization_rki – ExtractedOrganization of RKI
extracted_open_data_organizations – dictionary with ID by affiliation name
- Returns:
list of Extracted Persons
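The match-or-fallback behaviour described here can be sketched as follows. The names `SketchPerson`, `lookup_in_directory`, and `transform_persons` are hypothetical, a plain dictionary stands in for the LDAP lookup, and the `source` field is an assumed way of recording which primary source a person is attributed to.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchPerson:
    """Hypothetical stand-in for ExtractedPerson."""

    full_name: str
    source: str  # primary source the record is attributed to


def lookup_in_directory(name: str, directory: dict[str, str]) -> SketchPerson | None:
    """Return a directory-backed person, or None when there is no match."""
    full_name = directory.get(name)
    if full_name is None:
        return None
    return SketchPerson(full_name, "ldap")


def transform_persons(names: list[str], directory: dict[str, str]) -> list[SketchPerson]:
    """Prefer the directory match; otherwise build the person from the record."""
    persons = []
    for name in names:
        matched = lookup_in_directory(name, directory)
        if matched is None:
            # Fallback: create the person from the Open Data record itself.
            matched = SketchPerson(name, "open-data")
        persons.append(matched)
    return persons
```

Every input name produces exactly one output person, so the result list has the same length and order as the input.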
- mex.extractors.open_data.transform.transform_open_data_persons_not_in_ldap(person: OpenDataCreatorsOrContributors, extracted_primary_source_open_data: ExtractedPrimarySource, extracted_organization_rki: ExtractedOrganization, extracted_open_data_organizations: dict[str, MergedOrganizationIdentifier]) ExtractedPerson ¶
Create an ExtractedPerson for a person not matched in LDAP.
- Parameters:
person – Open Data person (creator or contributor)
extracted_primary_source_open_data – Open Data primary source
extracted_organization_rki – ExtractedOrganization of RKI
extracted_open_data_organizations – dictionary with ID by affiliation name
- Returns:
ExtractedPerson