mex.extractors.publisher package

Submodules

mex.extractors.publisher.extract module

mex.extractors.publisher.extract.get_publishable_merged_items(*, entity_type: list[str] | None = None, referenced_identifier: list[str] | None = None, reference_field: str | None = None) → list[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup]

Read publishable merged items from the backend.

mex.extractors.publisher.fields module

mex.extractors.publisher.main module

mex.extractors.publisher.settings module

class mex.extractors.publisher.settings.PublisherSettings(*, skip_entity_types: list[str] = ['MergedPrimarySource', 'MergedConsent'], allowed_person_primary_sources: list[str] = ['endnote'])

Bases: BaseModel

Settings submodel definition for the publishing pipeline.

allowed_person_primary_sources: list[str]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'allowed_person_primary_sources': FieldInfo(annotation=list[str], required=False, default=['endnote'], description='Allow persons from these primary sources to be published.'), 'skip_entity_types': FieldInfo(annotation=list[str], required=False, default=['MergedPrimarySource', 'MergedConsent'], description='Skip publishing items with these types.')}

Metadata about the fields defined on the model, a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

skip_entity_types: list[str]
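
As a rough illustration of how these two settings might be applied, the sketch below filters a list of items using plain dataclasses. `SettingsSketch`, `Item`, its `entity_type` and `primary_source` attributes, and the `is_publishable` helper are hypothetical stand-ins and not part of mex.extractors; only the default values are taken from the signature above. The rule for persons is an assumption based on the field descriptions, and the real pipeline may apply further rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SettingsSketch:
    """Stand-in mirroring the two documented fields and their defaults."""

    skip_entity_types: list[str] = field(
        default_factory=lambda: ["MergedPrimarySource", "MergedConsent"]
    )
    allowed_person_primary_sources: list[str] = field(
        default_factory=lambda: ["endnote"]
    )


@dataclass
class Item:
    """Hypothetical item; the real merged models look different."""

    entity_type: str
    primary_source: Optional[str] = None


def is_publishable(item: Item, settings: SettingsSketch) -> bool:
    # Items with a skipped entity type are never published.
    if item.entity_type in settings.skip_entity_types:
        return False
    # Assumption: persons pass only if they come from an allowed primary source.
    if item.entity_type == "MergedPerson":
        return item.primary_source in settings.allowed_person_primary_sources
    return True


settings = SettingsSketch()
items = [
    Item("MergedConsent"),
    Item("MergedPerson", primary_source="endnote"),
    Item("MergedPerson", primary_source="ldap"),
    Item("MergedResource"),
]
publishable = [item for item in items if is_publishable(item, settings)]
```

With the defaults, only the endnote person and the resource remain in `publishable`.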

mex.extractors.publisher.transform module

mex.extractors.publisher.transform.get_unit_id_per_person(merged_ldap_persons: list[MergedPerson], publishable_contact_points_and_units: ItemsContainer[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup]) → dict[MergedPersonIdentifier, list[MergedOrganizationalUnitIdentifier]]

For each person, get the IDs of their units, limited to units that have an email address.

Parameters:
  • merged_ldap_persons – Merged Persons with primary source ldap

  • publishable_contact_points_and_units – Items container of units + contact points

Returns:

Dictionary mapping each person identifier to a list of unit identifiers
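
The lookup described above can be sketched with simplified stand-ins. The `Unit` and `Person` dataclasses, the `member_of` and `email` attribute names, and the `unit_ids_per_person` helper are assumptions made for illustration; the real function works on mex-model classes and an `ItemsContainer`. Whether persons without a qualifying unit are omitted or mapped to an empty list is not stated in the documentation; this sketch omits them.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for MergedOrganizationalUnit."""

    identifier: str
    email: list[str] = field(default_factory=list)


@dataclass
class Person:
    """Hypothetical stand-in for MergedPerson."""

    identifier: str
    member_of: list[str] = field(default_factory=list)


def unit_ids_per_person(
    persons: list[Person], units: list[Unit]
) -> dict[str, list[str]]:
    # Collect identifiers of units that provide at least one email address.
    units_with_email = {unit.identifier for unit in units if unit.email}
    result: dict[str, list[str]] = {}
    for person in persons:
        unit_ids = [uid for uid in person.member_of if uid in units_with_email]
        # Assumption: persons without any qualifying unit are left out.
        if unit_ids:
            result[person.identifier] = unit_ids
    return result


units = [Unit("unit-a", ["a@example.org"]), Unit("unit-b")]
persons = [
    Person("person-1", ["unit-a", "unit-b"]),
    Person("person-2", ["unit-b"]),
]
mapping = unit_ids_per_person(persons, units)
# mapping == {"person-1": ["unit-a"]}
```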

mex.extractors.publisher.transform.update_actor_references_where_needed(item: MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup, allowed_actors: Collection[MergedAccessPlatformIdentifier | MergedActivityIdentifier | MergedBibliographicResourceIdentifier | MergedConsentIdentifier | MergedContactPointIdentifier | MergedDistributionIdentifier | MergedOrganizationalUnitIdentifier | MergedOrganizationIdentifier | MergedPersonIdentifier | MergedPrimarySourceIdentifier | MergedResourceIdentifier | MergedVariableGroupIdentifier | MergedVariableIdentifier], fallback_contact_identifiers: list[MergedContactPointIdentifier], fallback_unit_identifiers_by_person: dict[MergedPersonIdentifier, list[MergedOrganizationalUnitIdentifier]]) → None

Update references to actors, where needed.

We filter all fields that allow Person references so that they only contain references to publishable actors. For fields that also allow organizational units, non-consenting persons can be replaced by their organizational unit if the unit provides an email address. Fields that allow contact points but contain no valid references are set to a fallback contact point. If a field is required, does not allow contact points, and still contains no valid references, we keep the broken references to stay compliant with mex-model. If we skipped those items instead, we might break other items that rely on them and start a recursive de-publication process, which we want to avoid.
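
The rules above can be sketched for a single field as follows. This is a simplified illustration, not the actual implementation: `resolve_references` and its boolean flags (`allows_units`, `allows_contact_points`, `required`) are hypothetical, identifiers are plain strings, and the sketch returns a new list, whereas the documented function updates the item in place and returns None. It also assumes that fallback units are themselves publishable.

```python
from functools import partial


def resolve_references(
    references: list[str],
    *,
    allowed_actors: set[str],
    fallback_units_by_person: dict[str, list[str]],
    fallback_contacts: list[str],
    allows_units: bool,
    allows_contact_points: bool,
    required: bool,
) -> list[str]:
    resolved: list[str] = []
    for ref in references:
        if ref in allowed_actors:
            # Publishable actors are kept as they are.
            resolved.append(ref)
        elif allows_units:
            # Replace a non-publishable person by their units, if any are known.
            resolved.extend(fallback_units_by_person.get(ref, []))
    # Drop duplicates while keeping the original order.
    resolved = list(dict.fromkeys(resolved))
    if resolved:
        return resolved
    if allows_contact_points:
        # No valid reference left: use the fallback contact point(s).
        return list(fallback_contacts)
    if required:
        # Keep the broken references so the field stays non-empty.
        return list(references)
    return []


resolve = partial(
    resolve_references,
    allowed_actors={"person-ok", "unit-a"},
    fallback_units_by_person={"person-hidden": ["unit-a"]},
    fallback_contacts=["contact-fallback"],
)

with_unit = resolve(
    ["person-ok", "person-hidden"],
    allows_units=True, allows_contact_points=False, required=False,
)
with_contact = resolve(
    ["person-unknown"],
    allows_units=False, allows_contact_points=True, required=True,
)
kept_broken = resolve(
    ["person-unknown"],
    allows_units=False, allows_contact_points=False, required=True,
)
```

Here `with_unit` is `["person-ok", "unit-a"]`, `with_contact` is `["contact-fallback"]`, and `kept_broken` retains `["person-unknown"]`.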

Module contents