mex.extractors.open_data package¶
Subpackages¶
- mex.extractors.open_data.models package
- Submodules
- mex.extractors.open_data.models.source module
- Module contents
Submodules¶
mex.extractors.open_data.connector module¶
- class mex.extractors.open_data.connector.OpenDataConnector¶
Bases: HTTPConnector
Connector class to handle requesting the Zenodo API.
- _send_request(method: str, url: str, params: Mapping[str, list[str] | str | None] | None, **kwargs: Any) Response¶
Override HTTPConnector._send_request to allow more waiting time.
- _set_url() None¶
Set url of the host.
- get_files_for_resource_version(version_id: int) list[OpenDataVersionFiles]¶
Load the files of a resource version by querying the Zenodo API.
- Parameters:
version_id – id of a resource version
- Returns:
Zenodo resource version files
- get_oldest_resource_version_creation_date(resource_id: int) str | None¶
Load the creation date of the oldest (first) version of a resource by querying the Zenodo API.
- Parameters:
resource_id – id of any resource version
- Returns:
creation date of the oldest Zenodo resource version as a string, or None
- get_parent_resources() list[OpenDataParentResource]¶
Load parent resources by querying the Zenodo API.
Gets the parent resources (~ latest version) of all the resources of the configured Zenodo community.
- Returns:
list of parent resources
- get_schema_zipfile(version_id: int) Response¶
Get the Zip file for a certain resource version.
The resource versions were checked to have a valid metadata zip file. The zip file can be named “Metadata” or “Metadaten”.
- Parameters:
version_id – id of a resource version
- Returns:
Response of query
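The fallback between the two possible zip file names can be pictured with the sketch below. It is not the connector's implementation: `first_available` and `fake_server` are hypothetical names introduced for illustration, `fetch` stands in for the HTTP request, and the two paths only mirror the shape of the zip_path_de and zip_path_en settings defaults.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence


def first_available(
    candidates: Sequence[str],
    fetch: Callable[[str], bytes | None],
) -> tuple[str, bytes] | None:
    """Return the first candidate path for which fetch yields content."""
    for path in candidates:
        content = fetch(path)
        if content is not None:
            return path, content
    return None


# Hypothetical in-memory "server": only the English file name exists here.
fake_server = {"/files/Metadata.zip/content": b"zip-bytes"}

result = first_available(
    ["/files/Metadaten.zip/content", "/files/Metadata.zip/content"],
    fake_server.get,
)
```

In this sketch the German name is tried first and the English name is used only when the first lookup finds nothing; the order the real connector uses is not stated in this reference.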
mex.extractors.open_data.extract module¶
- mex.extractors.open_data.extract.extract_files_for_parent_resource(version_id: int) list[OpenDataVersionFiles]¶
Fetch all files of a resource version.
- Parameters:
version_id – id of record version as integer
- Returns:
list of OpenDataVersionFiles
- mex.extractors.open_data.extract.extract_oldest_record_version_creationdate(record_id: int) str | None¶
Fetch the creation date of the oldest version of a parent resource.
- Parameters:
record_id – id of record version as integer
- Returns:
creation date of the oldest record version as a string, or None
- mex.extractors.open_data.extract.extract_open_data_persons_from_open_data_parent_resources(open_data_parent_resource: list[OpenDataParentResource]) list[OpenDataCreatorsOrContributors]¶
Extract unique Open Data persons from open data parent resources.
- Parameters:
open_data_parent_resource – list of open data parent resources
- Returns:
list of extracted open data persons (creators or contributors)
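One way to collect unique persons across several resources while keeping first-seen order is sketched below. The `Person` and `ParentResource` classes are hypothetical stand-ins, not the package's models, and treating two persons as identical when all their fields match is an assumption; the criterion the extractor actually uses is not stated here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical stand-in for OpenDataCreatorsOrContributors."""

    name: str
    affiliation: str | None = None


@dataclass(frozen=True)
class ParentResource:
    """Hypothetical stand-in for OpenDataParentResource."""

    creators: tuple[Person, ...] = ()
    contributors: tuple[Person, ...] = ()


def unique_persons(resources: list[ParentResource]) -> list[Person]:
    """Collect creators and contributors once each, in first-seen order."""
    seen: dict[Person, None] = {}  # dicts preserve insertion order
    for resource in resources:
        for person in (*resource.creators, *resource.contributors):
            seen.setdefault(person, None)
    return list(seen)


resources = [
    ParentResource(creators=(Person("A"),), contributors=(Person("B"),)),
    ParentResource(creators=(Person("A"),)),
]
persons = unique_persons(resources)
```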
- mex.extractors.open_data.extract.extract_parent_resources() list[OpenDataParentResource]¶
Load Open Data resources by querying the Zenodo API.
Get all resources of the configured Zenodo community. These are called ‘parent resources’.
- Returns:
list of parent resources
- mex.extractors.open_data.extract.extract_tableschema(version_id: int) dict[str, list[OpenDataTableSchema]]¶
Extract the metadata zip tableschemas.
- Parameters:
version_id – id of record version as integer
- Returns:
tableschema by name of tableschema json
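Reading JSON table schemas out of a zip archive and keying them by file name can be sketched with the standard library alone. The archive layout, the member names, and the helper `read_json_members` are assumptions made for illustration; the actual structure of the Zenodo metadata zip is not described in this reference.

```python
from __future__ import annotations

import io
import json
import zipfile


def read_json_members(zip_bytes: bytes) -> dict[str, object]:
    """Parse every .json member of a zip archive, keyed by member name."""
    schemas: dict[str, object] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                with archive.open(name) as member:
                    schemas[name] = json.load(member)
    return schemas


# Build a tiny archive in memory so the sketch runs without network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "tableschema_cases.json",
        json.dumps([{"name": "age", "type": "integer"}]),
    )
    archive.writestr("README.txt", "not a schema")

schemas = read_json_members(buffer.getvalue())
```

In the package, the bytes would presumably come from the response returned by get_schema_zipfile; here they are produced locally.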
mex.extractors.open_data.main module¶
mex.extractors.open_data.settings module¶
- class mex.extractors.open_data.settings.OpenDataSettings(*, url: str = 'https://zenodo', community_rki: str = 'robertkochinstitut', mapping_path: AssetsPath = AssetsPath('mappings/open-data'), zip_path_de: str = '/files/Metadaten.zip/content', zip_path_en: str = '/files/Metadata.zip/content')¶
Bases: BaseModel
Zenodo settings submodel definition for the Open Data extractor.
- community_rki: str¶
- mapping_path: AssetsPath¶
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- url: str¶
- zip_path_de: str¶
- zip_path_en: str¶
mex.extractors.open_data.transform module¶
- mex.extractors.open_data.transform.get_only_child_units(selected_merged_organizational_unit_ids: list[MergedOrganizationalUnitIdentifier], extracted_organizational_units: list[ExtractedOrganizationalUnit]) list[MergedOrganizationalUnitIdentifier]¶
Return only those units that are not parents of other units within the list.
- Parameters:
selected_merged_organizational_unit_ids – list of unit ids to filter
extracted_organizational_units – list of all units, used to determine which units are parents
- Returns:
list of merged unit ids that are not parents of other units in the list
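The filtering idea can be illustrated with plain data classes. This is a sketch, not the package's implementation: `Unit` is a hypothetical stand-in for ExtractedOrganizationalUnit, its field names are assumed, and "parent of another unit in the list" is read here as "parent of another selected unit".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical stand-in for ExtractedOrganizationalUnit."""

    stable_target_id: str
    parent_unit: str | None = None


def only_child_units(selected_ids: list[str], units: list[Unit]) -> list[str]:
    """Keep the selected ids that are not the parent of another selected id."""
    selected = set(selected_ids)
    parents_of_selected = {
        unit.parent_unit
        for unit in units
        if unit.stable_target_id in selected and unit.parent_unit in selected
    }
    return [uid for uid in selected_ids if uid not in parents_of_selected]


units = [
    Unit("dept"),
    Unit("team-a", parent_unit="dept"),
    Unit("team-b", parent_unit="dept"),
    Unit("other"),
]
result = only_child_units(["dept", "team-a", "other"], units)
```

Here "dept" is dropped because its child "team-a" is also selected, while "other" is kept because none of the selected units names it as parent.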
- mex.extractors.open_data.transform.lookup_person_in_ldap_and_transform(person: OpenDataCreatorsOrContributors, units_by_identifier_in_primary_source: dict[str, ExtractedOrganizationalUnit], extracted_organization_rki: ExtractedOrganization) ExtractedPerson | None¶
Look up person in LDAP and transform to ExtractedPerson.
- Parameters:
person – Open Data person (creator or contributor)
units_by_identifier_in_primary_source – dict of organizational units by identifier in primary source
extracted_organization_rki – ExtractedOrganization of RKI
- Returns:
ExtractedPerson if matched or None if match fails
- mex.extractors.open_data.transform.transform_open_data_distributions(open_data_parent_resources: list[OpenDataParentResource], distribution_mapping: DistributionMapping) list[ExtractedDistribution]¶
Transform open data resource versions to extracted distributions.
- Parameters:
open_data_parent_resources – list of open data parent resources
distribution_mapping – distribution mapping model with default values
- Returns:
List of ExtractedDistribution instances
- mex.extractors.open_data.transform.transform_open_data_parent_resource_to_mex_resource(open_data_parent_resource: list[OpenDataParentResource], open_data_persons: list[ExtractedPerson], unit_stable_target_ids_by_synonym: dict[str, list[MergedOrganizationalUnitIdentifier]], extracted_organizational_units: list[ExtractedOrganizationalUnit], open_data_distribution: list[ExtractedDistribution], resource_mapping: ResourceMapping, extracted_organization_rki: ExtractedOrganization, open_data_extracted_contact_points: list[ExtractedContactPoint]) list[ExtractedResource]¶
Transform open_data parent resources to extracted resources.
- Parameters:
open_data_parent_resource – open data parent resources
open_data_persons – list of ExtractedPerson
unit_stable_target_ids_by_synonym – Unit stable target ids by synonym
extracted_organizational_units – list of Extracted Organizational Units
open_data_distribution – list of Extracted open data Distributions
resource_mapping – resource mapping model with default values
extracted_organization_rki – ExtractedOrganization
open_data_extracted_contact_points – list of ExtractedContactPoints
- Returns:
list of ExtractedResource instances
- mex.extractors.open_data.transform.transform_open_data_person_affiliations_to_organizations(open_data_creators_contributors: list[OpenDataCreatorsOrContributors]) dict[str, MergedOrganizationIdentifier]¶
Search Wikidata or create own organizations, load them to the sink, and create a dictionary.
- Parameters:
open_data_creators_contributors – list of creators and contributors
- Returns:
dictionary of merged organization ids by affiliation name
- mex.extractors.open_data.transform.transform_open_data_persons(open_data_creators_contributors: list[OpenDataCreatorsOrContributors], extracted_organizational_units: list[ExtractedOrganizationalUnit], extracted_organization_rki: ExtractedOrganization, open_data_organization_ids_by_str: dict[str, MergedOrganizationIdentifier]) list[ExtractedPerson]¶
Look up persons in LDAP, or create an ExtractedPerson if the match fails.
- Parameters:
open_data_creators_contributors – list of Creators Or Contributors
extracted_organizational_units – list of Extracted Organizational Units
extracted_organization_rki – ExtractedOrganization of RKI
open_data_organization_ids_by_str – dictionary with ID by affiliation name
- Returns:
list of Extracted Persons
- mex.extractors.open_data.transform.transform_open_data_persons_not_in_ldap(person: OpenDataCreatorsOrContributors, extracted_organization_rki: ExtractedOrganization, open_data_organization_ids_by_str: dict[str, MergedOrganizationIdentifier]) ExtractedPerson¶
Create an ExtractedPerson for a person not matched in LDAP.
- Parameters:
person – Open Data person (creator or contributor)
extracted_organization_rki – ExtractedOrganization of RKI
open_data_organization_ids_by_str – dictionary with ID by affiliation name
- Returns:
ExtractedPerson
- mex.extractors.open_data.transform.transform_open_data_variable_groups(open_data_tableschemas_by_resource_id: dict[MergedResourceIdentifier, dict[str, list[OpenDataTableSchema]]]) list[ExtractedVariableGroup]¶
Transform zip table schema names to variable groups.
- Parameters:
open_data_tableschemas_by_resource_id – dictionary of table schemas by name, by resource id
- Returns:
extracted variable groups
- mex.extractors.open_data.transform.transform_open_data_variables(open_data_tableschemas_by_resource_id: dict[MergedResourceIdentifier, dict[str, list[OpenDataTableSchema]]], merged_variable_group_id_by_filename: dict[str, MergedVariableGroupIdentifier]) list[ExtractedVariable]¶
Transform table schema content to variables.
- Parameters:
open_data_tableschemas_by_resource_id – dictionary of table schemas by name, by resource id
merged_variable_group_id_by_filename – variable group stableTargetId by filename
- Returns:
extracted variables