mex.extractors.datenkompass package
Subpackages
- mex.extractors.datenkompass.models package
- Submodules
- mex.extractors.datenkompass.models.item module
- DatenkompassActivity
- DatenkompassActivity.beschreibung
- DatenkompassActivity.datenbank
- DatenkompassActivity.datenerhalt
- DatenkompassActivity.datenhalter
- DatenkompassActivity.datennutzungszweck
- DatenkompassActivity.dk_format
- DatenkompassActivity.enddatum
- DatenkompassActivity.frequenz
- DatenkompassActivity.hauptkategorie
- DatenkompassActivity.herausgeber
- DatenkompassActivity.identifier
- DatenkompassActivity.kommentar
- DatenkompassActivity.kontakt
- DatenkompassActivity.model_config
- DatenkompassActivity.organisationseinheit
- DatenkompassActivity.rechtsgrundlage
- DatenkompassActivity.schlagwort
- DatenkompassActivity.startdatum
- DatenkompassActivity.status
- DatenkompassActivity.titel
- DatenkompassActivity.unterkategorie
- DatenkompassActivity.voraussetzungen
- DatenkompassBibliographicResource
- DatenkompassBibliographicResource.beschreibung
- DatenkompassBibliographicResource.datenbank
- DatenkompassBibliographicResource.datenerhalt
- DatenkompassBibliographicResource.datenhalter
- DatenkompassBibliographicResource.datennutzungszweck
- DatenkompassBibliographicResource.datennutzungszweck_erweitert
- DatenkompassBibliographicResource.dk_format
- DatenkompassBibliographicResource.frequenz
- DatenkompassBibliographicResource.hauptkategorie
- DatenkompassBibliographicResource.herausgeber
- DatenkompassBibliographicResource.identifier
- DatenkompassBibliographicResource.kommentar
- DatenkompassBibliographicResource.kontakt
- DatenkompassBibliographicResource.model_config
- DatenkompassBibliographicResource.organisationseinheit
- DatenkompassBibliographicResource.rechtsgrundlage
- DatenkompassBibliographicResource.rechtsgrundlagen_benennung
- DatenkompassBibliographicResource.schlagwort
- DatenkompassBibliographicResource.status
- DatenkompassBibliographicResource.titel
- DatenkompassBibliographicResource.unterkategorie
- DatenkompassBibliographicResource.voraussetzungen
- DatenkompassResource
- DatenkompassResource.beschreibung
- DatenkompassResource.datenbank
- DatenkompassResource.datenerhalt
- DatenkompassResource.datenhalter
- DatenkompassResource.datennutzungszweck
- DatenkompassResource.datennutzungszweck_erweitert
- DatenkompassResource.dk_format
- DatenkompassResource.frequenz
- DatenkompassResource.hauptkategorie
- DatenkompassResource.herausgeber
- DatenkompassResource.identifier
- DatenkompassResource.kommentar
- DatenkompassResource.kontakt
- DatenkompassResource.model_config
- DatenkompassResource.organisationseinheit
- DatenkompassResource.rechtsgrundlage
- DatenkompassResource.rechtsgrundlagen_benennung
- DatenkompassResource.schlagwort
- DatenkompassResource.startdatum
- DatenkompassResource.status
- DatenkompassResource.titel
- DatenkompassResource.unterkategorie
- DatenkompassResource.voraussetzungen
- mex.extractors.datenkompass.models.mapping module
- Module contents
Submodules
mex.extractors.datenkompass.extract module
- mex.extractors.datenkompass.extract.get_extracted_item_stable_target_ids(entity_type: list[str], referenced_identifier: list[str] | None) → list[MergedIdentifier]
Fetch extracted items from backend and return their stableTargetId.
- Parameters:
entity_type – List of entity types.
referenced_identifier – list of MergedIdentifiers to filter for
- Returns:
List of stableTargetIds of extracted items of the given entity type(s).
- mex.extractors.datenkompass.extract.get_filtered_primary_source_ids(filtered_primary_sources: list[str] | str | None) → list[str]
Get a list of MergedIdentifier of filtered primary sources.
- Parameters:
filtered_primary_sources – List of primary sources.
- Returns:
List of IDs of the filtered relevant primary sources.
- mex.extractors.datenkompass.extract.get_merged_items(*, query_string: str | None = None, entity_type: list[str] | None = None, referenced_identifier: list[str] | None = None, reference_field: str | None = None) → list[AnyMergedModel]
Fetch merged items from backend.
- Parameters:
query_string – Query string.
entity_type – List of entity types.
referenced_identifier – List of Identifier.
reference_field – Field that accepts the referenced identifiers.
- Returns:
List of merged items.
mex.extractors.datenkompass.filter module
- mex.extractors.datenkompass.filter.filter_activities_by_organization(datenkompass_merged_activities_by_primary_source: list[MergedActivity]) → list[MergedActivity]
Filter the merged activities based on the mapping specifications.
- Parameters:
datenkompass_merged_activities_by_primary_source – merged activities by unit.
- Returns:
filtered list of merged activities by unit.
- mex.extractors.datenkompass.filter.filter_merged_items_for_primary_source(merged_items_by_primary_source: dict[str, list[MergedResource]], entity_type: str) → dict[str, list[MergedResource]]
- mex.extractors.datenkompass.filter.filter_merged_items_for_primary_source(merged_items_by_primary_source: dict[str, list[MergedActivity]], entity_type: str) → dict[str, list[MergedActivity]]
Filter the merged items for primary source as defined in settings.
Special treatment for items that were created or edited in the editor: merged items that are referenced via stableTargetId by an extracted item are filtered out, so that only merged items consisting solely of rules remain.
- Parameters:
merged_items_by_primary_source – merged items dictionary by primary source.
entity_type – entity type of merged items
Settings: primary source that needs to be filtered
- Returns:
dictionary with list of filtered merged items
- mex.extractors.datenkompass.filter.filter_merged_resources_by_unit(merged_resources_by_primary_source: dict[str, list[MergedResource]], resource_filter_mapping: DatenkompassFilterMapping, merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit]) → dict[str, dict[str, list[MergedResource]]]
Filter the merged resources by unit (and its child units) in field unitInCharge.
- Parameters:
merged_resources_by_primary_source – merged resources by primary source.
resource_filter_mapping – Datenkompass resource filter mapping
merged_organizational_units_by_id – all merged units by their id
- Returns:
dictionary of filtered merged resources by primary source by unit.
- mex.extractors.datenkompass.filter.find_descendant_units(merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], parent_unit_name: str) → list[str]
Based on filter settings find descendant unit ids.
- Parameters:
merged_organizational_units_by_id – merged organizational units by identifier.
parent_unit_name – name of the parent unit for which to find all descendants
- Returns:
identifier of units which are descendants of the unit filter setting.
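A descendant lookup of this kind can be sketched as a breadth-first walk over parent references. The sketch below is illustrative only: the `Unit` dataclass, its `parent_unit` and `name` fields, and the function name `find_descendant_units_sketch` are assumptions made for the example, not the package's actual model or implementation. It also assumes the parent unit itself is excluded from the result.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical stand-in for MergedOrganizationalUnit."""

    identifier: str
    name: str
    parent_unit: str | None = None  # assumed field: id of the parent unit


def find_descendant_units_sketch(
    units_by_id: dict[str, Unit], parent_unit_name: str
) -> list[str]:
    """Return ids of all units below the named unit (children, grandchildren, ...)."""
    # Index children by parent id once, so the walk does not rescan all units.
    children_by_parent: dict[str, list[str]] = {}
    for unit in units_by_id.values():
        if unit.parent_unit is not None:
            children_by_parent.setdefault(unit.parent_unit, []).append(unit.identifier)

    # Start from every unit that carries the requested name.
    queue = [u.identifier for u in units_by_id.values() if u.name == parent_unit_name]
    seen = set(queue)
    descendants: list[str] = []
    while queue:
        current = queue.pop(0)
        for child in children_by_parent.get(current, []):
            if child not in seen:  # guards against cycles in the hierarchy
                seen.add(child)
                descendants.append(child)
                queue.append(child)
    return descendants


units = {
    "u1": Unit("u1", "Dept 1"),
    "u2": Unit("u2", "Unit 1.1", parent_unit="u1"),
    "u3": Unit("u3", "Team 1.1.a", parent_unit="u2"),
    "u4": Unit("u4", "Dept 2"),
}
print(find_descendant_units_sketch(units, "Dept 1"))
```

Whether the real function also returns the parent unit's own identifier is not stated in the docstring, so check the source before relying on either behaviour.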
mex.extractors.datenkompass.main module
mex.extractors.datenkompass.settings module
- class mex.extractors.datenkompass.settings.DatenkompassSettings(*, schedule: str | None = None, organization_filter: str = 'Organization', cutoff_number_authors: int = 3, list_delimiter: str = '; ', min_keyword_item_length: int = 2, max_keyword_str_length: int = 50, mapping_path: AssetsPath = AssetsPath('mappings/mapping-to-external-schema/datenkompass'))
Bases: BaseModel
Settings submodel for the datenkompass extractor.
- cutoff_number_authors: int
- list_delimiter: str
- mapping_path: AssetsPath
- max_keyword_str_length: int
- min_keyword_item_length: int
- model_config = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': False, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- organization_filter: str
- schedule: str | None
mex.extractors.datenkompass.transform module
- mex.extractors.datenkompass.transform.filter_schlagworte(words: list[str | None], delim: str, min_word_length: int, max_string_length: int) → str
Filter out certain words and limit final string to maximum length.
- Parameters:
words – list of entries.
delim – list delimiter for joining the strings in list
min_word_length – minimal length of each word.
max_string_length – maximal length of final string of joined words.
- Returns:
combined string.
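The docstring does not say which words are dropped or how the length limit is enforced. The sketch below shows one plausible reading, under two assumptions that are not confirmed by the source: `None` entries and words shorter than `min_word_length` are discarded, and the joined string is cut at a whole-word boundary rather than mid-word. The name `filter_schlagworte_sketch` is hypothetical.

```python
from __future__ import annotations


def filter_schlagworte_sketch(
    words: list[str | None],
    delim: str,
    min_word_length: int,
    max_string_length: int,
) -> str:
    """Join keywords, skipping short or missing ones, up to a maximum length."""
    # Assumption: "certain words" means None entries and too-short words.
    kept = [w for w in words if w is not None and len(w) >= min_word_length]

    result = ""
    for word in kept:
        candidate = word if not result else result + delim + word
        # Assumption: stop before the word that would exceed the limit.
        if len(candidate) > max_string_length:
            break
        result = candidate
    return result


print(
    filter_schlagworte_sketch(
        ["RKI", None, "a", "Surveillance", "Epidemiologie"], "; ", 2, 20
    )
)
```

With the defaults listed in `DatenkompassSettings` (`list_delimiter='; '`, `min_keyword_item_length=2`, `max_keyword_str_length=50`), the same call pattern would presumably apply, though how the extractor wires those settings in is not shown here.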
- mex.extractors.datenkompass.transform.fix_quotes(string: str) → str
Fix quote characters in titles or descriptions.
Removes surrounding (leading and trailing) double quotes and replaces in-string double quotes with single quotes.
- Parameters:
string – The string to fix quotes for.
- Returns:
The fixed string.
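The described behaviour maps onto two standard string operations. This is a minimal sketch of that description, not the package's code; the name `fix_quotes_sketch` is hypothetical, and it assumes that all leading and trailing double quotes are removed, not just one pair.

```python
def fix_quotes_sketch(string: str) -> str:
    """Strip surrounding double quotes and turn inner ones into single quotes."""
    # str.strip('"') removes every leading and trailing double quote character.
    stripped = string.strip('"')
    # Any double quote still present is inside the string.
    return stripped.replace('"', "'")


print(fix_quotes_sketch('"The "new" study"'))
```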
- mex.extractors.datenkompass.transform.get_abstract_or_description(abstracts: list[Text], delim: str) → str
Get German list entries, join them and reformat HTML-formatted links.
- Parameters:
abstracts – list of mixed language strings with possible HTML-formatted links
delim – list delimiter for joining the strings in list
- Returns:
joined German strings with links reformatted as plain-text URLs.
- mex.extractors.datenkompass.transform.get_datenbank(item: MergedBibliographicResource) → str | None
Get first DOI URL or first repository URL.
- Parameters:
item – MergedBibliographicResource item.
- Returns:
url as string.
- mex.extractors.datenkompass.transform.get_email(responsible_unit_ids: list[MergedOrganizationalUnitIdentifier], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit]) → str | None
Get the first email address of referenced responsible units.
- Parameters:
responsible_unit_ids – List of responsible unit identifiers
merged_organizational_units_by_id – dict of all merged organizational units by id
- Returns:
first found email of a responsible unit as string, or None if no email is found.
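The "first email found" lookup can be sketched as a loop with an early return. The `Unit` dataclass and its `email` list field below are hypothetical stand-ins for `MergedOrganizationalUnit`, and `get_email_sketch` is not the package's function. The sketch also assumes that identifiers missing from the dictionary are skipped silently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for MergedOrganizationalUnit."""

    email: list[str] = field(default_factory=list)


def get_email_sketch(
    responsible_unit_ids: list[str], units_by_id: dict[str, Unit]
) -> str | None:
    """Return the first email of the first referenced unit that has one."""
    for unit_id in responsible_unit_ids:
        unit = units_by_id.get(unit_id)
        # Skip unknown ids and units without any email address.
        if unit is not None and unit.email:
            return unit.email[0]
    return None


units = {
    "a": Unit(),
    "b": Unit(["b@example.org", "b2@example.org"]),
}
print(get_email_sketch(["a", "b"], units))
```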
- mex.extractors.datenkompass.transform.get_german_text(text_entries: list[Text]) → list[str]
Get German entries of list as strings, if any exist.
If no German entry exists, return original list entries as strings. Always fix quotes in entries.
- Parameters:
text_entries – list of text entries
- Returns:
list of entries as strings
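The fallback rule (German entries if any, otherwise everything) can be sketched as follows. The `Text` dataclass with `value` and `language` fields, the `"de"` language code, and the inline quote fix are assumptions chosen for the example; the real `Text` type and language representation may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Text:
    """Hypothetical stand-in for the Text type used in the signature."""

    value: str
    language: str | None = None


def _fix_quotes(string: str) -> str:
    # Same idea as fix_quotes: drop surrounding quotes, soften inner ones.
    return string.strip('"').replace('"', "'")


def get_german_text_sketch(text_entries: list[Text]) -> list[str]:
    """Prefer German entries; fall back to all entries if none are German."""
    german = [t.value for t in text_entries if t.language == "de"]
    # An empty list is falsy, so `or` selects the fallback.
    selected = german or [t.value for t in text_entries]
    return [_fix_quotes(value) for value in selected]


print(get_german_text_sketch([Text('"Titel"', "de"), Text("Title", "en")]))
print(get_german_text_sketch([Text('An "open" study', "en")]))
```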
- mex.extractors.datenkompass.transform.get_german_vocabulary(entries: list[VocabularyT] | None) → list[str | None]
Get German prefLabel for vocabularies.
- Parameters:
entries – list of vocabulary type entries.
- Returns:
list of German vocabulary entries as strings.
- mex.extractors.datenkompass.transform.get_resource_email(responsible_reference_ids: list[MergedOrganizationalUnitIdentifier | MergedPersonIdentifier | MergedContactPointIdentifier], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], merged_contact_points_by_id: dict[MergedContactPointIdentifier, MergedContactPoint]) → str | None
Get the first email address of referenced responsible units or contact points.
Ignore referenced Persons.
- Parameters:
responsible_reference_ids – List of referenced unit, contact point or person ids
merged_organizational_units_by_id – dict of all merged organizational units by id
merged_contact_points_by_id – Dict of all merged contact points by id
- Returns:
first found email of a unit or contact as string, or None if no email is found.
- mex.extractors.datenkompass.transform.get_title(item: MergedActivity) → list[str]
Get shortName and title from merged activity item.
- Parameters:
item – MergedActivity item.
- Returns:
List of short names and titles of the activity as strings.
- mex.extractors.datenkompass.transform.get_unit_shortname(responsible_unit_ids: list[MergedOrganizationalUnitIdentifier], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], delim: str) → str | None
Get shortName of merged units.
- Parameters:
responsible_unit_ids – List of responsible unit identifiers
merged_organizational_units_by_id – dict of all merged organizational units by id
delim – delimiter for joining short name entries
- Returns:
Short names of the responsible units, joined with the delimiter into one string.
- mex.extractors.datenkompass.transform.handle_setval(set_value: list[str] | str | None) → str
Return value of mapping setValues as string, even if setValues is a list.
- Parameters:
set_value – setValues value of mapping
- Returns:
stringified value of setValues.
- mex.extractors.datenkompass.transform.mapping_lookup_default(model: type[BaseModel], mapping: DatenkompassMapping) → dict[str, DatenkompassMappingField]
Create a dictionary of fields by field name of Datenkompass mappings.
For this the alias name needs to be used as intermediate step, because the alias (not the field name) is the identifier in the mapping.
- Parameters:
model – Datenkompass model.
mapping – Datenkompass mapping.
- Returns:
dictionary of mapping field names to values.
- mex.extractors.datenkompass.transform.transform_activities(filtered_merged_activities: list[MergedActivity], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit]) → list[DatenkompassActivity]
Transform merged to datenkompass activities.
- Parameters:
filtered_merged_activities – List of merged activities
merged_organizational_units_by_id – dict of merged organizational units by id
- Returns:
list of DatenkompassActivity instances.
- mex.extractors.datenkompass.transform.transform_bibliographic_resources(merged_bibliographic_resources: list[MergedBibliographicResource], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], datenkompass_person_str_by_id: dict[MergedPersonIdentifier, str]) → list[DatenkompassBibliographicResource]
Transform merged to datenkompass bibliographic resources.
- Parameters:
merged_bibliographic_resources – List of merged bibliographic resources
merged_organizational_units_by_id – dict of merged organizational units by id
datenkompass_person_str_by_id – dictionary of merged person names by id
bibliographic_resource_mapping – Datenkompass mapping.
- Returns:
list of DatenkompassBibliographicResource instances.
- mex.extractors.datenkompass.transform.transform_resources(merged_resources_by_primary_source_by_unit: dict[str, dict[str, list[MergedResource]]], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], merged_contact_points_by_id: dict[MergedContactPointIdentifier, MergedContactPoint]) → dict[str, dict[str, list[DatenkompassResource]]]
Transform merged to datenkompass resources.
- Parameters:
merged_resources_by_primary_source_by_unit – dictionary of merged resources
merged_organizational_units_by_id – dict of merged organizational units by id
merged_contact_points_by_id – dict of merged contact points
- Returns:
dictionary of lists of DatenkompassResource instances by primary source by unit.