mex.extractors.publisher package
Submodules
mex.extractors.publisher.extract module
- mex.extractors.publisher.extract.get_publishable_merged_items(*, entity_type: list[str] | None = None, referenced_identifier: list[str] | None = None, reference_field: str | None = None) → list[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup]
Read publishable merged items from the backend.
mex.extractors.publisher.fields module
mex.extractors.publisher.main module
mex.extractors.publisher.settings module
- class mex.extractors.publisher.settings.PublisherSettings(*, skip_entity_types: list[str] = ['MergedPrimarySource', 'MergedConsent'], allowed_person_primary_sources: list[str] = ['endnote'])
Bases: BaseModel
Settings submodel definition for the publishing pipeline.
- allowed_person_primary_sources: list[str]
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'allowed_person_primary_sources': FieldInfo(annotation=list[str], required=False, default=['endnote'], description='Allow persons from these primary sources to be published.'), 'skip_entity_types': FieldInfo(annotation=list[str], required=False, default=['MergedPrimarySource', 'MergedConsent'], description='Skip publishing items with these types.')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- skip_entity_types: list[str]
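The field descriptions above say that `skip_entity_types` lists the types that are not published. The following is a minimal sketch of how such a setting could be applied, not the package's actual code: `PublisherSettingsSketch` is a hypothetical plain-dataclass stand-in that only mirrors the documented defaults, and `entity_types_to_publish` is an illustrative helper that does not exist in `mex.extractors`.

```python
from dataclasses import dataclass, field


@dataclass
class PublisherSettingsSketch:
    """Hypothetical stand-in mirroring the documented PublisherSettings defaults."""

    skip_entity_types: list[str] = field(
        default_factory=lambda: ["MergedPrimarySource", "MergedConsent"]
    )
    allowed_person_primary_sources: list[str] = field(
        default_factory=lambda: ["endnote"]
    )


def entity_types_to_publish(
    all_types: list[str], settings: PublisherSettingsSketch
) -> list[str]:
    """Drop every entity type listed in skip_entity_types (illustrative helper)."""
    return [t for t in all_types if t not in settings.skip_entity_types]


settings = PublisherSettingsSketch()
remaining = entity_types_to_publish(
    ["MergedPerson", "MergedConsent", "MergedResource", "MergedPrimarySource"],
    settings,
)
```

With the default settings, only `MergedPerson` and `MergedResource` remain in `remaining`.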
mex.extractors.publisher.transform module
- mex.extractors.publisher.transform.get_unit_id_per_person(merged_ldap_persons: list[MergedPerson], publishable_contact_points_and_units: ItemsContainer[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup]) → dict[MergedPersonIdentifier, list[MergedOrganizationalUnitIdentifier]]
For each person, get the IDs of their units that have an email address.
- Parameters:
merged_ldap_persons – Merged persons whose primary source is ldap
publishable_contact_points_and_units – Items container of organizational units and contact points
- Returns:
dictionary of unit identifiers by person identifier
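The lookup described above can be sketched with simplified stand-in types. This is an illustration under stated assumptions, not the module's implementation: `Person`, `Unit`, the `member_of` and `email` attributes, and the choice to omit persons without any qualifying unit are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for MergedOrganizationalUnit."""

    identifier: str
    email: list[str] = field(default_factory=list)


@dataclass
class Person:
    """Hypothetical stand-in for MergedPerson."""

    identifier: str
    member_of: list[str] = field(default_factory=list)


def unit_ids_per_person(
    persons: list[Person], units: list[Unit]
) -> dict[str, list[str]]:
    """Map each person ID to the IDs of their units that have an email address."""
    # Only units that provide at least one email address qualify.
    units_with_email = {unit.identifier for unit in units if unit.email}
    result: dict[str, list[str]] = {}
    for person in persons:
        unit_ids = [uid for uid in person.member_of if uid in units_with_email]
        if unit_ids:  # assumption: persons without a qualifying unit are omitted
            result[person.identifier] = unit_ids
    return result


persons = [Person("p1", ["u1", "u2"]), Person("p2", ["u2"])]
units = [Unit("u1", ["unit1@example.org"]), Unit("u2")]
mapping = unit_ids_per_person(persons, units)
```

Here only `u1` has an email address, so `mapping` contains an entry for `p1` alone.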
- mex.extractors.publisher.transform.update_actor_references_where_needed(item: MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup, allowed_actors: Collection[MergedAccessPlatformIdentifier | MergedActivityIdentifier | MergedBibliographicResourceIdentifier | MergedConsentIdentifier | MergedContactPointIdentifier | MergedDistributionIdentifier | MergedOrganizationalUnitIdentifier | MergedOrganizationIdentifier | MergedPersonIdentifier | MergedPrimarySourceIdentifier | MergedResourceIdentifier | MergedVariableGroupIdentifier | MergedVariableIdentifier], fallback_contact_identifiers: list[MergedContactPointIdentifier], fallback_unit_identifiers_by_person: dict[MergedPersonIdentifier, list[MergedOrganizationalUnitIdentifier]]) → None
Update references to actors, where needed.
All fields that allow Person references are filtered so that they contain only references to publishable actors. In fields that also allow organizational units, a non-consenting person can be replaced by their organizational unit, provided the unit has an email address. Fields that allow contact points but contain no valid references are set to a fallback contact point. If a field is required, does not allow contact points, and still contains no valid references, the broken references are kept to preserve mex-model compliance. Skipping such items instead could break other items that rely on them and trigger a recursive de-publication process, which we want to avoid.
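The rules for a single field can be sketched as one decision function. This is a simplified illustration, not the package's code: `resolve_references` and its boolean flags (`allows_units`, `allows_contact_points`, `required`) are hypothetical, identifiers are plain strings, and the real function mutates a merged item in place rather than returning a list.

```python
from collections.abc import Collection


def resolve_references(
    references: list[str],
    *,
    allowed_actors: Collection[str],
    allows_units: bool,
    allows_contact_points: bool,
    required: bool,
    fallback_contacts: list[str],
    fallback_units_by_person: dict[str, list[str]],
) -> list[str]:
    """Return the references one field should hold after filtering (sketch)."""
    valid: list[str] = []
    for ref in references:
        if ref in allowed_actors:
            # Publishable actors are kept as they are.
            valid.append(ref)
        elif allows_units:
            # Replace a non-publishable person by their units with an email.
            for unit in fallback_units_by_person.get(ref, []):
                if unit not in valid:
                    valid.append(unit)
    if not valid and allows_contact_points:
        # Nothing valid is left: use the fallback contact point(s).
        valid = list(fallback_contacts)
    if not valid and required:
        # Keep the broken references so the required field stays filled.
        return list(references)
    return valid


common = {
    "allowed_actors": {"p1"},
    "fallback_contacts": ["c1"],
    "fallback_units_by_person": {"p2": ["u1"]},
}
replaced = resolve_references(
    ["p1", "p2"], allows_units=True, allows_contact_points=False, required=True, **common
)
fallback = resolve_references(
    ["p2"], allows_units=False, allows_contact_points=True, required=True, **common
)
kept = resolve_references(
    ["p3"], allows_units=False, allows_contact_points=False, required=True, **common
)
```

In this sketch, `replaced` swaps `p2` for its unit `u1`, `fallback` falls back to the contact point `c1`, and `kept` retains the broken reference `p3` because the field is required and has no other option.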