mex.extractors.datenkompass package
Subpackages
- mex.extractors.datenkompass.models package
- Submodules
- mex.extractors.datenkompass.models.item module
- DatenkompassActivity: beschreibung, datenbank, datenerhalt, datenhalter, datennutzungszweck, dk_format, entityType, frequenz, hauptkategorie, herausgeber, identifier, kommentar, kontakt, model_computed_fields, model_config, model_fields, organisationseinheit, rechtsgrundlage, schlagwort, status, titel, unterkategorie, voraussetzungen
- DatenkompassBibliographicResource: beschreibung, datenbank, datenerhalt, datenhalter, datennutzungszweck, datennutzungszweck_erweitert, dk_format, entityType, frequenz, hauptkategorie, herausgeber, identifier, kommentar, kontakt, model_computed_fields, model_config, model_fields, organisationseinheit, rechtsgrundlage, rechtsgrundlagen_benennung, schlagwort, status, titel, unterkategorie, voraussetzungen
- DatenkompassResource: beschreibung, datenbank, datenerhalt, datenhalter, datennutzungszweck, datennutzungszweck_erweitert, dk_format, entityType, frequenz, hauptkategorie, herausgeber, identifier, kommentar, kontakt, model_computed_fields, model_config, model_fields, organisationseinheit, rechtsgrundlage, rechtsgrundlagen_benennung, schlagwort, status, titel, unterkategorie, voraussetzungen
- mex.extractors.datenkompass.models.mapping module
- Module contents
Submodules
mex.extractors.datenkompass.extract module
- mex.extractors.datenkompass.extract.get_filtered_primary_source_ids(filtered_primary_sources: list[str] | None) → list[str]
Get the IDs of the relevant primary sources.
- Parameters:
filtered_primary_sources – List of primary sources.
- Returns:
List of IDs of the filtered relevant primary sources.
- mex.extractors.datenkompass.extract.get_merged_items(*, query_string: str | None = None, entity_type: list[str] | None = None, referenced_identifier: list[str] | None = None, reference_field: str | None = None) → list[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup]
Fetch merged items from backend.
- Parameters:
query_string – Query string.
entity_type – List of entity types.
referenced_identifier – List of identifiers.
reference_field – Name of the field accepting identifiers.
- Returns:
List of merged items.
mex.extractors.datenkompass.filter module
- mex.extractors.datenkompass.filter.filter_for_organization(fetched_merged_activities: Sequence[MergedActivity], filtered_merged_organization_ids: set[MergedOrganizationIdentifier]) → list[MergedActivity]
Filter the merged activities based on the mapping specifications.
- Parameters:
fetched_merged_activities – merged activities as sequence.
filtered_merged_organization_ids – relevant merged organization ids.
- Returns:
filtered list of merged activities.
- mex.extractors.datenkompass.filter.find_descendant_units(merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit]) → list[str]
Based on filter settings find descendant unit ids.
- Parameters:
merged_organizational_units_by_id – merged organizational units by identifier.
- Returns:
identifier of units which are descendants of the unit filter setting.
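The descendant lookup can be illustrated with a small sketch. It is not the package's implementation: it assumes each unit points to at most one parent, represents units as a plain `parent_by_id` dictionary instead of `MergedOrganizationalUnit` objects, takes the filtered unit as an explicit `root_id` argument, and leaves the root itself out of the result. All of these names and choices are hypothetical.

```python
from __future__ import annotations

from collections import deque


def find_descendants_sketch(
    parent_by_id: dict[str, str | None], root_id: str
) -> list[str]:
    """Return ids of all units below root_id (the root itself is excluded)."""
    # Invert the child -> parent relation into parent -> children.
    children_by_parent: dict[str, list[str]] = {}
    for unit_id, parent_id in parent_by_id.items():
        if parent_id is not None:
            children_by_parent.setdefault(parent_id, []).append(unit_id)

    # Breadth-first walk starting at the filtered unit.
    descendants: list[str] = []
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children_by_parent.get(current, []):
            if child not in seen:  # guards against cyclic parent references
                seen.add(child)
                descendants.append(child)
                queue.append(child)
    return descendants
```

Calling `find_descendants_sketch({"root": None, "a": "root", "b": "a", "c": None}, "root")` walks from `root` to `a` and then to `b`, while the unrelated unit `c` is never reached.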
mex.extractors.datenkompass.main module
mex.extractors.datenkompass.settings module
- class mex.extractors.datenkompass.settings.DatenkompassSettings(*, unit_filter: str = 'e.g. unit', organization_filter: str = 'Organization', cutoff_number_authors: int = 3, list_delimiter: str = '; ', mapping_path: AssetsPath = AssetsPath('mappings/mapping-to-external-schema/datenkompass'))
Bases: BaseModel
Settings submodel for the datenkompass extractor.
- cutoff_number_authors: int
- list_delimiter: str
- mapping_path: AssetsPath
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': False, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'cutoff_number_authors': FieldInfo(annotation=int, required=False, default=3, description='Maximum number of extracted authors for Bibliographic resources'), 'list_delimiter': FieldInfo(annotation=str, required=False, default='; ', description='Seperator for different entries in a datenkompass model field.'), 'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/mapping-to-external-schema/datenkompass"), description='Path to the directory with the datenkompass mapping files containing the default values, absolute path or relative to `assets_dir`.'), 'organization_filter': FieldInfo(annotation=str, required=False, default='Organization', description='Filter for organization'), 'unit_filter': FieldInfo(annotation=str, required=False, default='e.g. unit', description='Filter for unit')}
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- organization_filter: str
- unit_filter: str
mex.extractors.datenkompass.transform module
- mex.extractors.datenkompass.transform.fix_quotes(string: str) → str
Fix quote characters in titles or descriptions.
Removes surrounding (leading and trailing) double quotes and replaces in-string double quotes with single quotes.
- Parameters:
string – The string to fix quotes for.
- Returns:
The fixed string.
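The quote handling described above can be sketched in a couple of lines. This is an illustration rather than the package's code: the name `fix_quotes_sketch` is hypothetical, and the sketch assumes that every leading and trailing double quote is dropped, not just one pair.

```python
def fix_quotes_sketch(string: str) -> str:
    """Drop surrounding double quotes, then turn inner double quotes into single quotes."""
    # Assumption: all leading/trailing double quotes are removed, however many there are.
    without_outer = string.strip('"')
    # Any double quote still present sits inside the text and becomes a single quote.
    return without_outer.replace('"', "'")
```

For example, `fix_quotes_sketch('"A "quoted" title"')` first strips the outer quotes and then rewrites the inner ones, giving `A 'quoted' title`.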
- mex.extractors.datenkompass.transform.get_abstract_or_description(abstracts: list[Text], delim: str) → str
Get German list entries, join them and reformat HTML-formatted links.
- Parameters:
abstracts – list of mixed language strings with possible HTML-formatted links
delim – list delimiter for joining the strings in list
- Returns:
joined German strings with reformatted plain text URLs.
- mex.extractors.datenkompass.transform.get_datenbank(item: MergedBibliographicResource) → str | None
Get the first DOI URL or the first repository URL.
- Parameters:
item – MergedBibliographicResource item.
- Returns:
URL as string, or None if no URL is found.
- mex.extractors.datenkompass.transform.get_email(responsible_unit_ids: list[MergedOrganizationalUnitIdentifier], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit]) → str | None
Get the first email address of referenced responsible units.
- Parameters:
responsible_unit_ids – List of responsible unit identifiers
merged_organizational_units_by_id – dict of all merged organizational units by id
- Returns:
first found email of a responsible unit as string, or None if no email is found.
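A minimal sketch of the "first email wins" lookup follows. It is only an illustration: `UnitStandIn` is a hypothetical stand-in for `MergedOrganizationalUnit` that carries nothing but an `email` list, identifiers are plain strings, and skipping identifiers that are missing from the dictionary is an assumption of the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UnitStandIn:
    """Hypothetical stand-in for a merged organizational unit."""

    email: list[str] = field(default_factory=list)


def get_first_email_sketch(
    responsible_unit_ids: list[str],
    units_by_id: dict[str, UnitStandIn],
) -> str | None:
    """Return the first email of the first referenced unit that has one."""
    for unit_id in responsible_unit_ids:
        unit = units_by_id.get(unit_id)  # assumption: unknown ids are skipped
        if unit is not None and unit.email:
            return unit.email[0]
    return None
```

The order of `responsible_unit_ids` decides which address is returned, so a unit without any email simply passes the lookup on to the next referenced unit.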
- mex.extractors.datenkompass.transform.get_german_text(text_entries: list[Text]) → list[str]
Get German entries of list as strings, if any exist.
If no German entry exists, return original list entries as strings. Always fix quotes in entries.
- Parameters:
text_entries – list of text entries
- Returns:
list of entries as strings
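The fallback rule (German entries if there are any, otherwise all entries) might look like the sketch below. `TextStandIn` is a hypothetical stand-in for the `Text` type with a `value` and an optional `language` code, the language code `"de"` is an assumption, and the quote fixing is inlined in the simplified form described for `fix_quotes`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextStandIn:
    """Hypothetical stand-in for a Text entry: a value plus an optional language code."""

    value: str
    language: str | None = None


def get_german_text_sketch(text_entries: list[TextStandIn]) -> list[str]:
    """Prefer German entries; fall back to all entries; fix quotes either way."""
    german = [entry.value for entry in text_entries if entry.language == "de"]
    # Fall back to every entry when no German entry exists.
    chosen = german or [entry.value for entry in text_entries]
    # Simplified quote fixing: strip outer double quotes, replace inner ones.
    return [value.strip('"').replace('"', "'") for value in chosen]
```

With one German and one English entry only the German value is returned; with English entries alone, all of them come back with their quotes fixed.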
- mex.extractors.datenkompass.transform.get_german_vocabulary(entries: list[_VocabularyT] | None) → list[str | None]
Get German prefLabel for vocabularies.
- Parameters:
entries – list of vocabulary type entries.
- Returns:
list of German vocabulary entries as strings.
- mex.extractors.datenkompass.transform.get_resource_email(responsible_reference_ids: list[MergedOrganizationalUnitIdentifier | MergedPersonIdentifier | MergedContactPointIdentifier], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], merged_contact_points_by_id: dict[MergedContactPointIdentifier, MergedContactPoint]) → str | None
Get the first email address of referenced responsible units or contact points.
Ignore referenced Persons.
- Parameters:
responsible_reference_ids – List of referenced unit, contact point or person ids
merged_organizational_units_by_id – dict of all merged organizational units by id
merged_contact_points_by_id – Dict of all merged contact points by id
- Returns:
first found email of a unit or contact as string, or None if no email is found.
- mex.extractors.datenkompass.transform.get_title(item: MergedActivity) → list[str]
Get shortName and title from merged activity item.
- Parameters:
item – MergedActivity item.
- Returns:
List of short names and titles of the activity as strings.
- mex.extractors.datenkompass.transform.get_unit_shortname(responsible_unit_ids: list[MergedOrganizationalUnitIdentifier], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], delim: str) → str | None
Get shortName of merged units.
- Parameters:
responsible_unit_ids – List of responsible unit identifiers
merged_organizational_units_by_id – dict of all merged organizational units by id
delim – delimiter for joining short name entries
- Returns:
Short names of the responsible units, joined with delim into a single string.
- mex.extractors.datenkompass.transform.handle_setval(set_value: list[str] | str | None) → str
Return value of mapping setValues as string, even if setValues is a list.
- Parameters:
set_value – setValues value of mapping
- Returns:
stringified value of setValues.
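One way to picture the stringification is the sketch below. The source does not say how list entries are combined or what happens for `None`, so both choices here are assumptions of the sketch: lists are joined with a `delim` argument (defaulting to `"; "`, the default of `list_delimiter` in the settings) and `None` becomes an empty string.

```python
from __future__ import annotations


def handle_setval_sketch(set_value: list[str] | str | None, delim: str = "; ") -> str:
    """Return a setValues entry as one string, whatever shape it arrived in."""
    if set_value is None:
        return ""  # assumption: a missing value maps to an empty string
    if isinstance(set_value, list):
        return delim.join(set_value)  # assumption: list entries are joined with delim
    return set_value
```

A caller can then treat `handle_setval_sketch(["a", "b"])` and `handle_setval_sketch("a; b")` the same way, since both yield a single string.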
- mex.extractors.datenkompass.transform.mapping_lookup_default(model: type[BaseModel], mapping: DatenkompassMapping) → dict[str, DatenkompassMappingField]
Create a dictionary of fields by field name of Datenkompass mappings.
For this the alias name needs to be used as intermediate step, because the alias (not the field name) is the identifier in the mapping.
- Parameters:
model – Datenkompass model.
mapping – Datenkompass mapping.
- Returns:
dictionary of mapping field names to values.
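The alias detour can be shown with plain dictionaries. This sketch does not touch Pydantic or the real `DatenkompassMapping` type: `alias_by_field_name` stands in for the aliases declared on the model, `mapping_field_by_alias` stands in for the mapping entries keyed by alias, and silently skipping fields without a mapping entry is an assumption. The function name and the example aliases are hypothetical.

```python
from __future__ import annotations

from typing import Any


def lookup_by_field_name_sketch(
    alias_by_field_name: dict[str, str],
    mapping_field_by_alias: dict[str, Any],
) -> dict[str, Any]:
    """Re-key mapping entries from alias to model field name."""
    return {
        field_name: mapping_field_by_alias[alias]
        for field_name, alias in alias_by_field_name.items()
        if alias in mapping_field_by_alias  # assumption: unmapped fields are skipped
    }
```

Given `{"titel": "Titel"}` as aliases and `{"Titel": {"setValues": "x"}}` as mapping entries, the result is keyed by the field name `titel`, which is what code working with model attributes needs.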
- mex.extractors.datenkompass.transform.transform_activities(filtered_merged_activities: list[MergedActivity], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], activity_mapping: DatenkompassMapping) → list[DatenkompassActivity]
Transform merged activities to datenkompass activities.
- Parameters:
filtered_merged_activities – List of merged activities
merged_organizational_units_by_id – dict of merged organizational units by id
activity_mapping – Datenkompass mapping.
- Returns:
list of DatenkompassActivity instances.
- mex.extractors.datenkompass.transform.transform_bibliographic_resources(merged_bibliographic_resources: list[MergedBibliographicResource], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], person_name_by_id: dict[MergedPersonIdentifier, str], bibliographic_resource_mapping: DatenkompassMapping) → list[DatenkompassBibliographicResource]
Transform merged bibliographic resources to datenkompass bibliographic resources.
- Parameters:
merged_bibliographic_resources – List of merged bibliographic resources
merged_organizational_units_by_id – dict of merged organizational units by id
person_name_by_id – dictionary of merged person names by id
bibliographic_resource_mapping – Datenkompass mapping.
- Returns:
list of DatenkompassBibliographicResource instances.
- mex.extractors.datenkompass.transform.transform_resources(merged_resources_by_primary_source: dict[str, list[MergedResource]], merged_organizational_units_by_id: dict[MergedOrganizationalUnitIdentifier, MergedOrganizationalUnit], merged_contact_points_by_id: dict[MergedContactPointIdentifier, MergedContactPoint], resource_mapping: DatenkompassMapping) → list[DatenkompassResource]
Transform merged resources to datenkompass resources.
- Parameters:
merged_resources_by_primary_source – dictionary of merged resources
merged_organizational_units_by_id – dict of merged organizational units by id
merged_contact_points_by_id – dict of merged contact points
resource_mapping – Datenkompass mapping.
- Returns:
list of DatenkompassResource instances.