mex.extractors.open_data package

Subpackages

Submodules

mex.extractors.open_data.connector module

class mex.extractors.open_data.connector.OpenDataConnector

Bases: HTTPConnector

Connector class to handle requesting the Zenodo API.

_set_url() None

Set the URL of the host.

get_files_for_resource_version(version_id: int) list[OpenDataVersionFiles]

Load the files of a resource version by querying the Zenodo API.

Parameters:

version_id – id of a resource version

Returns:

Zenodo resource version files

get_oldest_resource_version_creation_date(resource_id: int) str | None

Load the creation date of the oldest (first) version of a resource by querying the Zenodo API.

Parameters:

resource_id – id of any resource version

Returns:

creation date of the oldest Zenodo resource version, or None

get_parent_resources() list[OpenDataParentResource]

Load parent resources by querying the Zenodo API.

Gets the parent resources (~ latest version) of all the resources of the configured Zenodo community.

Returns:

list of parent resources

mex.extractors.open_data.extract module

mex.extractors.open_data.extract.extract_files_for_parent_resource(version_id: int) list[OpenDataVersionFiles]

Fetch all files of a resource version.

Parameters:

version_id – id of record version as integer

Returns:

list of OpenDataVersionFiles

mex.extractors.open_data.extract.extract_oldest_record_version_creationdate(record_id: int) str | None

Fetch the creation date of the oldest version of a parent resource.

Parameters:

record_id – id of record version as integer

Returns:

creation date of the oldest record version as a string, or None
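
The selection of the oldest version can be sketched as follows. This is a minimal illustration, not the extractor's implementation: the helper name `oldest_creation_date`, the plain-dict records and the `created` key holding an ISO 8601 timestamp are all assumptions made for the example.

```python
from __future__ import annotations

from datetime import datetime


def oldest_creation_date(versions: list[dict[str, str]]) -> str | None:
    """Return the earliest `created` value among version records, or None.

    Hypothetical helper: the record layout is assumed, not taken from
    the Open Data extractor.
    """
    created = [v["created"] for v in versions if v.get("created")]
    if not created:
        return None
    # Parse before comparing so that differing UTC offsets sort correctly.
    return min(created, key=datetime.fromisoformat)


versions = [
    {"id": "3", "created": "2024-03-01T09:30:00+00:00"},
    {"id": "1", "created": "2022-11-15T12:00:00+00:00"},
    {"id": "2", "created": "2023-06-20T08:15:00+00:00"},
]
print(oldest_creation_date(versions))
```

An empty list yields None, which mirrors the `str | None` return type in the signature above.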

mex.extractors.open_data.extract.extract_open_data_persons_from_open_data_parent_resources(open_data_parent_resource: list[OpenDataParentResource]) list[OpenDataCreatorsOrContributors]

Extract unique open data persons from open data parent resources.

Parameters:

open_data_parent_resource – list of open data parent resources

Returns:

list of extracted open data persons (creators or contributors)
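
Collecting unique persons across several parent resources might look like the sketch below. The `Person` and `ParentResource` dataclasses are hypothetical stand-ins for OpenDataCreatorsOrContributors and OpenDataParentResource; their fields (`name`, `orcid`, `creators`, `contributors`) are assumed for illustration and may not match the real models.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    """Hypothetical stand-in for OpenDataCreatorsOrContributors."""

    name: str
    orcid: str | None = None


@dataclass
class ParentResource:
    """Hypothetical stand-in for OpenDataParentResource."""

    creators: list[Person] = field(default_factory=list)
    contributors: list[Person] = field(default_factory=list)


def unique_persons(resources: list[ParentResource]) -> list[Person]:
    """Collect creators and contributors once each, in first-seen order."""
    seen: dict[Person, None] = {}  # dicts preserve insertion order
    for resource in resources:
        for person in [*resource.creators, *resource.contributors]:
            seen.setdefault(person, None)
    return list(seen)


resources = [
    ParentResource(
        creators=[Person("Muster, Maxi")],
        contributors=[Person("Beispiel, Bo")],
    ),
    ParentResource(creators=[Person("Muster, Maxi")]),
]
print([p.name for p in unique_persons(resources)])
```

Using a frozen dataclass makes persons hashable, so two entries count as duplicates only when all of their fields are equal.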

mex.extractors.open_data.extract.extract_parent_resources() list[OpenDataParentResource]

Load Open Data resources by querying the Zenodo API.

Get all resources of the configured Zenodo community. These are called ‘parent resources’.

Returns:

list of parent resources

mex.extractors.open_data.main module

mex.extractors.open_data.settings module

class mex.extractors.open_data.settings.OpenDataSettings(*, url: str = 'https://zenodo', community_rki: str = 'robertkochinstitut', mapping_path: AssetsPath = AssetsPath('mappings/open-data'))

Bases: BaseModel

Zenodo settings submodel definition for the Open Data extractor.

community_rki: str
mapping_path: AssetsPath
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'community_rki': FieldInfo(annotation=str, required=False, default='robertkochinstitut', description='Zenodo communitiy of rki'), 'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/open-data"), description='Path to the directory with the open data mapping files containing the default values, absolute path or relative to `assets_dir`.'), 'url': FieldInfo(annotation=str, required=False, default='https://zenodo', description='Zenodo instance URL')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

url: str

mex.extractors.open_data.transform module

mex.extractors.open_data.transform.lookup_person_in_ldap_and_transfom(person: OpenDataCreatorsOrContributors, extracted_primary_source_ldap: ExtractedPrimarySource, units_by_identifier_in_primary_source: dict[str, ExtractedOrganizationalUnit]) ExtractedPerson | None

Look up a person in LDAP and transform to ExtractedPerson.

Parameters:
  • person – Open Data person (creator or contributor)

  • extracted_primary_source_ldap – primary source for LDAP

  • units_by_identifier_in_primary_source – dict of organizational units by identifier in primary source

Returns:

ExtractedPerson if matched or None if match fails

mex.extractors.open_data.transform.transform_open_data_distributions(open_data_parent_resources: list[OpenDataParentResource], extracted_primary_source_open_data: ExtractedPrimarySource, distribution_mapping: DistributionMapping) list[ExtractedDistribution]

Transform open data resource versions to extracted distributions.

Parameters:
  • open_data_parent_resources – list of open data parent resources

  • extracted_primary_source_open_data – extracted primary source for open data

  • distribution_mapping – distribution mapping model with default values

Returns:

List of ExtractedDistribution instances

mex.extractors.open_data.transform.transform_open_data_parent_resource_to_mex_resource(open_data_parent_resource: list[OpenDataParentResource], extracted_primary_source_open_data: ExtractedPrimarySource, extracted_open_data_persons: list[ExtractedPerson], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], extracted_open_data_distribution: list[ExtractedDistribution], resource_mapping: ResourceMapping, extracted_organization_rki: ExtractedOrganization, open_data_contact_point: list[ExtractedContactPoint]) list[ExtractedResource]

Transform open_data parent resources to extracted resources.

Parameters:
  • open_data_parent_resource – open data parent resources

  • extracted_primary_source_open_data – extracted primary source for open data

  • extracted_open_data_persons – list of ExtractedPerson

  • unit_stable_target_ids_by_synonym – Unit stable target ids by synonym

  • extracted_open_data_distribution – list of Extracted open data Distributions

  • resource_mapping – resource mapping model with default values

  • extracted_organization_rki – ExtractedOrganization

  • open_data_contact_point – list[ExtractedContactPoint]

Returns:

list of ExtractedResource instances

mex.extractors.open_data.transform.transform_open_data_person_affiliations_to_organisations(extracted_open_data_creators_contributors: list[OpenDataCreatorsOrContributors], extracted_primary_source_open_data: ExtractedPrimarySource) dict[str, MergedOrganizationIdentifier]

Search Wikidata for organisations or create own ones, load them to the sink, and build a dictionary.

Parameters:
  • extracted_open_data_creators_contributors – list of creators and contributors

  • extracted_primary_source_open_data – extracted Primary source for Open Data

Returns:

dictionary of extracted organization IDs by affiliation name
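
The search-or-create pattern behind this dictionary can be sketched roughly as below. The function name and the `search` and `create` callables are hypothetical placeholders for the Wikidata lookup and the creation of an own organisation; identifiers are plain strings here, and loading to the sink is left out.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


def organization_ids_by_affiliation(
    affiliations: Iterable[str | None],
    search: Callable[[str], str | None],
    create: Callable[[str], str],
) -> dict[str, str]:
    """Map each distinct affiliation name to an organization identifier.

    `search` and `create` are assumed callables, not part of the
    documented module.
    """
    ids: dict[str, str] = {}
    for name in affiliations:
        if not name or name in ids:
            continue  # skip missing names and names already resolved
        # Prefer a search hit; otherwise create an own organisation.
        ids[name] = search(name) or create(name)
    return ids


known = {"Robert Koch-Institut": "org-rki"}
result = organization_ids_by_affiliation(
    ["Robert Koch-Institut", "Example University", None, "Robert Koch-Institut"],
    search=known.get,
    create=lambda name: "new:" + name.lower().replace(" ", "-"),
)
print(result)
```

Resolving each name only once keeps repeated affiliations from triggering repeated lookups.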

mex.extractors.open_data.transform.transform_open_data_persons(extracted_open_data_creators_contributors: list[OpenDataCreatorsOrContributors], extracted_primary_source_ldap: ExtractedPrimarySource, extracted_primary_source_open_data: ExtractedPrimarySource, extracted_organizational_units: list[ExtractedOrganizationalUnit], extracted_organization_rki: ExtractedOrganization, extracted_open_data_organizations: dict[str, MergedOrganizationIdentifier]) list[ExtractedPerson]

Look up persons in LDAP or create an ExtractedPerson if the match fails.

Parameters:
  • extracted_open_data_creators_contributors – list of Creators Or Contributors

  • extracted_primary_source_ldap – Extracted Primary Source for LDAP

  • extracted_primary_source_open_data – Extracted Primary Source for open-data

  • extracted_organizational_units – list of Extracted Organizational Units

  • extracted_organization_rki – ExtractedOrganization of RKI

  • extracted_open_data_organizations – dictionary with ID by affiliation name

Returns:

list of Extracted Persons
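
The lookup-with-fallback flow described above can be illustrated with a small generic sketch. The name `transform_with_fallback` and both callables are hypothetical: `lookup` stands in for the LDAP match that may return None, `fallback` for creating a person from the open data record alone. Persons are plain strings to keep the example self-contained.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import TypeVar

P = TypeVar("P")  # input person type
E = TypeVar("E")  # extracted person type


def transform_with_fallback(
    persons: list[P],
    lookup: Callable[[P], E | None],
    fallback: Callable[[P], E],
) -> list[E]:
    """Use the lookup result when there is one, else the fallback result."""
    extracted: list[E] = []
    for person in persons:
        match = lookup(person)
        extracted.append(match if match is not None else fallback(person))
    return extracted


directory = {"Muster, Maxi": "ldap:Muster, Maxi"}
result = transform_with_fallback(
    ["Muster, Maxi", "Beispiel, Bo"],
    lookup=directory.get,
    fallback=lambda name: f"open-data:{name}",
)
print(result)
```

Every input person produces exactly one output entry, so the result list has the same length and order as the input.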

mex.extractors.open_data.transform.transform_open_data_persons_not_in_ldap(person: OpenDataCreatorsOrContributors, extracted_primary_source_open_data: ExtractedPrimarySource, extracted_organization_rki: ExtractedOrganization, extracted_open_data_organizations: dict[str, MergedOrganizationIdentifier]) ExtractedPerson

Create ExtractedPerson for a person not matched with ldap.

Parameters:
  • person – OpenDataCreatorsOrContributors

  • extracted_primary_source_open_data – open data primary source

  • extracted_organization_rki – ExtractedOrganization of RKI

  • extracted_open_data_organizations – dictionary with ID by affiliation name

Returns:

ExtractedPerson

Module contents