mex.extractors.publisher package
Submodules
mex.extractors.publisher.extract module
- mex.extractors.publisher.extract.get_publishable_merged_items(*, entity_type: list[str] | None = None, referenced_identifier: list[str] | None = None, reference_field: str | None = None) → list[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup]
Read publishable merged items from backend.
mex.extractors.publisher.fields module
mex.extractors.publisher.filter module
- mex.extractors.publisher.filter.filter_persons_with_consent(person_items: list[MergedPerson], consent_items: list[MergedConsent]) → list[MergedPerson]
Filter person items, keeping only those that have consent.
- Parameters:
person_items – list of persons
consent_items – list of consents
- Returns:
list of persons that have consent.
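The filtering described above can be sketched with plain dataclasses. This is a minimal illustration, not the package's implementation: `Person`, `Consent`, the fields `data_subject` and `granted`, and the helper `keep_consenting_persons` are all hypothetical stand-ins, and the real `MergedPerson` and `MergedConsent` models may represent consent differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:  # hypothetical stand-in for MergedPerson
    identifier: str


@dataclass(frozen=True)
class Consent:  # hypothetical stand-in for MergedConsent
    data_subject: str  # assumed: identifier of the person the consent is about
    granted: bool  # assumed: True if the consent status is positive


def keep_consenting_persons(persons, consents):
    """Return the persons referenced by at least one granted consent."""
    consenting_ids = {c.data_subject for c in consents if c.granted}
    return [p for p in persons if p.identifier in consenting_ids]


persons = [Person("p1"), Person("p2"), Person("p3")]
consents = [Consent("p1", True), Consent("p2", False)]
kept = keep_consenting_persons(persons, consents)
```

Here `p2` is dropped because its consent is not granted, and `p3` is dropped because no consent item refers to it.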
mex.extractors.publisher.main module
mex.extractors.publisher.settings module
- class mex.extractors.publisher.settings.PublisherSettings(*, skip_entity_types: list[str] = ['MergedPrimarySource', 'MergedConsent'], allowed_person_primary_sources: list[str] = ['endnote'])
Bases: BaseModel
Settings submodel definition for the publishing pipeline.
- allowed_person_primary_sources: list[str]
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- skip_entity_types: list[str]
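One plausible reading of `skip_entity_types` is that items of the listed types are left out of publication. The sketch below illustrates that reading only. `SketchSettings`, `Item`, the `entity_type` field, and `drop_skipped` are hypothetical names; the defaults are copied from the signature above, and how the pipeline actually applies the setting is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SketchSettings:  # hypothetical stand-in for PublisherSettings
    skip_entity_types: list[str] = field(
        default_factory=lambda: ["MergedPrimarySource", "MergedConsent"]
    )


@dataclass
class Item:  # hypothetical stand-in for a merged item
    entity_type: str  # assumed to hold the class name, e.g. "MergedPerson"
    identifier: str


def drop_skipped(items, settings):
    """Return the items whose entity type is not listed in skip_entity_types."""
    skipped = set(settings.skip_entity_types)
    return [item for item in items if item.entity_type not in skipped]


items = [
    Item("MergedPerson", "p1"),
    Item("MergedConsent", "c1"),
    Item("MergedResource", "r1"),
]
remaining = drop_skipped(items, SketchSettings())
```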
mex.extractors.publisher.transform module
- mex.extractors.publisher.transform.get_unit_id_per_person(publisher_merged_persons: list[MergedPerson], publisher_contact_points_and_units: ItemsContainer[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup]) → dict[MergedPersonIdentifier, list[MergedOrganizationalUnitIdentifier]]
For each Person get their unit IDs if the unit has an email address.
- Parameters:
publisher_merged_persons – Merged Persons with primary source ldap
publisher_contact_points_and_units – Items container of units + contact points
- Returns:
dictionary of unit identifiers by person identifier
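The mapping described above might be built roughly as follows. This is a sketch under stated assumptions: `Unit`, `Person`, the fields `email` and `member_of`, and the helper `unit_ids_per_person` are hypothetical stand-ins for the real models and function. Leaving out persons that have no unit with an email address is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:  # hypothetical stand-in for MergedOrganizationalUnit
    identifier: str
    email: list[str] = field(default_factory=list)


@dataclass
class Person:  # hypothetical stand-in for MergedPerson
    identifier: str
    member_of: list[str] = field(default_factory=list)  # assumed: unit identifiers


def unit_ids_per_person(persons, units):
    """Map each person identifier to its units that have an email address."""
    units_with_email = {u.identifier for u in units if u.email}
    result = {}
    for person in persons:
        unit_ids = [uid for uid in person.member_of if uid in units_with_email]
        if unit_ids:  # assumption: persons without a qualifying unit are omitted
            result[person.identifier] = unit_ids
    return result


units = [Unit("u1", ["unit1@example.com"]), Unit("u2")]
persons = [Person("p1", ["u1", "u2"]), Person("p2", ["u2"])]
mapping = unit_ids_per_person(persons, units)
```

In this example only `u1` has an email address, so `p1` maps to `u1` and `p2` does not appear in the result.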
- mex.extractors.publisher.transform.update_actor_references_where_needed(item: MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup, allowed_actors: Collection[MergedAccessPlatformIdentifier | MergedActivityIdentifier | MergedBibliographicResourceIdentifier | MergedConsentIdentifier | MergedContactPointIdentifier | MergedDistributionIdentifier | MergedOrganizationalUnitIdentifier | MergedOrganizationIdentifier | MergedPersonIdentifier | MergedPrimarySourceIdentifier | MergedResourceIdentifier | MergedVariableGroupIdentifier | MergedVariableIdentifier], fallback_contact_identifiers: list[MergedContactPointIdentifier], fallback_unit_identifiers_by_person: dict[MergedPersonIdentifier, list[MergedOrganizationalUnitIdentifier]]) → None
Update references to actors, where needed.
All fields that allow Person references are filtered so that they only contain references to publishable actors. In fields that also allow organizational units, a non-consenting person can be replaced by their organizational unit, provided the unit has an email address. Fields that allow contact points but contain no valid references are set to a fallback contact point. If a field is required, does not allow contact points, and still contains no valid references, the broken references are kept to preserve mex-model compliance. Skipping such items instead could break other items that rely on them and start a recursive de-publication process, which is not wanted.
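The rules above can be expressed for a single field as a small decision procedure. The sketch below is illustrative only: `update_field` and its flags `allows_units`, `allows_contact_points`, and `required` are hypothetical, identifiers are plain strings, and the real function works on whole merged items and mutates them in place instead of returning a list.

```python
def update_field(
    values,
    *,
    allowed,
    allows_units,
    allows_contact_points,
    required,
    fallback_contacts,
    fallback_units_by_person,
):
    """Return the updated references for one field, following the rules above."""
    updated = []
    for ref in values:
        if ref in allowed:
            candidates = [ref]  # publishable actor: keep the reference
        elif allows_units:
            # non-publishable person: substitute their units with an email address
            candidates = fallback_units_by_person.get(ref, [])
        else:
            candidates = []  # drop the reference
        for candidate in candidates:
            if candidate not in updated:  # avoid duplicate references
                updated.append(candidate)
    if updated:
        return updated
    if allows_contact_points:
        return list(fallback_contacts)  # no valid reference left: use the fallback
    if required:
        return list(values)  # keep broken references to stay model-compliant
    return []


common = {
    "allowed": {"p1"},
    "fallback_contacts": ["c1"],
    "fallback_units_by_person": {"p2": ["u1"]},
}
replaced = update_field(
    ["p1", "p2"], allows_units=True, allows_contact_points=False, required=True, **common
)
fallback = update_field(
    ["p2"], allows_units=False, allows_contact_points=True, required=True, **common
)
kept_broken = update_field(
    ["p2"], allows_units=False, allows_contact_points=False, required=True, **common
)
```

In the three calls, `p2` is first replaced by its unit `u1`, then by the fallback contact point `c1`, and finally kept as a broken reference because the field is required and offers no alternative.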