mex.common.models.base package

Submodules

mex.common.models.base.entity module

class mex.common.models.base.entity.BaseEntity

Bases: BaseModel

Abstract base model for extracted data, merged item and rule set classes.

This class gives type hints for an identifier field, the frozen entityType field and the frozen class variable stemType. Subclasses should implement all three fields while setting the correct identifier type as well as the correct literal values for the entity and stem types.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

mex.common.models.base.extracted_data module

class mex.common.models.base.extracted_data.ExtractedData(*, hadPrimarySource: MergedPrimarySourceIdentifier, identifierInPrimarySource: Annotated[str, MinLen(min_length=1), MaxLen(max_length=1000), _PydanticGeneralMetadata(pattern='^[^\\n\\r]+$')])

Bases: BaseEntity

Base model for all extracted data classes.

This class adds two important attributes for metadata provenance: hadPrimarySource and identifierInPrimarySource, which are used to uniquely identify an item in its original primary source. The attribute stableTargetId has to be set by each concrete subclass, like ExtractedPerson, because it needs to have the correct type, e.g. MergedPersonIdentifier.

This class also adds a validator to automatically set identifiers for provenance. See below for a full description.
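The composite key of hadPrimarySource and identifierInPrimarySource can be pictured with a small in-memory stand-in for the identity provider. This is only a sketch: `InMemoryIdentityProvider`, its `assign` method, and the use of random hex strings as identifiers are assumptions made for illustration, not the mex-common API.

```python
import uuid


class InMemoryIdentityProvider:
    """Hypothetical stand-in that maps a composite key to stable identifiers."""

    def __init__(self) -> None:
        self._identities: dict[tuple[str, str], dict[str, str]] = {}

    def assign(
        self, had_primary_source: str, identifier_in_primary_source: str
    ) -> dict[str, str]:
        # The pair of both values is the key; neither value is unique alone.
        key = (had_primary_source, identifier_in_primary_source)
        if key not in self._identities:
            self._identities[key] = {
                "identifier": uuid.uuid4().hex,
                "stableTargetId": uuid.uuid4().hex,
            }
        return self._identities[key]


provider = InMemoryIdentityProvider()
first = provider.assign("primary-source-a", "item-501")
again = provider.assign("primary-source-a", "item-501")
other = provider.assign("primary-source-b", "item-501")
```

Asking twice with the same pair returns the same identifiers, while the same identifierInPrimarySource under a different primary source gets new ones.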

_get_identifier(identifier_type: type[_ExtractedIdentifierT]) -> _ExtractedIdentifierT

Consult the identity provider to get the identifier for this item.

Parameters:

identifier_type – ExtractedIdentifier-subclass to cast the identifier to

Returns:

Identifier of the correct type

_get_stable_target_id(identifier_type: type[_MergedIdentifierT]) -> _MergedIdentifierT

Consult the identity provider to get the stableTargetId for this item.

Parameters:

identifier_type – MergedIdentifier-subclass to cast the identifier to

Returns:

StableTargetId of the correct type

entityType: str
hadPrimarySource: Annotated[MergedPrimarySourceIdentifier, FieldInfo(annotation=NoneType, required=True, description='The stableTargetId of the primary source, that this item was extracted from. This field is mandatory for all extracted items to aid with data provenance. Extracted primary sources also have this field and are all extracted from a static primary source for MEx. The extracted primary source for MEx has its own merged item as a primary source.', frozen=True)]
identifierInPrimarySource: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='This is the identifier the original item had in its source system. It is only unique amongst items coming from the same system, because identifier formats are likely to overlap between systems. The value for `identifierInPrimarySource` is therefore only unique in composition with `hadPrimarySource`. MEx uses this composite key to assign a stable and globally unique `identifier` per extracted item.', examples=['123456', 'item-501', 'D7/x4/zz.final3'], frozen=True, metadata=[MinLen(min_length=1), MaxLen(max_length=1000), _PydanticGeneralMetadata(pattern='^[^\\n\\r]+$')])]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'hadPrimarySource': FieldInfo(annotation=MergedPrimarySourceIdentifier, required=True, description='The stableTargetId of the primary source, that this item was extracted from. This field is mandatory for all extracted items to aid with data provenance. Extracted primary sources also have this field and are all extracted from a static primary source for MEx. The extracted primary source for MEx has its own merged item as a primary source.', frozen=True), 'identifierInPrimarySource': FieldInfo(annotation=str, required=True, description='This is the identifier the original item had in its source system. It is only unique amongst items coming from the same system, because identifier formats are likely to overlap between systems. The value for `identifierInPrimarySource` is therefore only unique in composition with `hadPrimarySource`. MEx uses this composite key to assign a stable and globally unique `identifier` per extracted item.', examples=['123456', 'item-501', 'D7/x4/zz.final3'], frozen=True, metadata=[MinLen(min_length=1), MaxLen(max_length=1000), _PydanticGeneralMetadata(pattern='^[^\\n\\r]+$')])}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

stemType: ClassVar

mex.common.models.base.field_info module

class mex.common.models.base.field_info.GenericFieldInfo(alias: str | None, annotation: type[Any] | None, frozen: bool)

Bases: object

Abstraction class for unifying FieldInfo and ComputedFieldInfo objects.

alias: str | None
annotation: type[Any] | None
frozen: bool

mex.common.models.base.filter module

class mex.common.models.base.filter.EntityFilter(*, fieldInPrimarySource: str, locationInPrimarySource: str | None = None, examplesInPrimarySource: list[str] | None = None, mappingRules: Annotated[list[EntityFilterRule], MinLen(min_length=1)], comment: str | None = None)

Bases: BaseModel

Entity filter model.

comment: str | None
examplesInPrimarySource: list[str] | None
fieldInPrimarySource: str
locationInPrimarySource: str | None
mappingRules: Annotated[list[EntityFilterRule], FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'comment': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'examplesInPrimarySource': FieldInfo(annotation=Union[list[str], NoneType], required=False, default=None), 'fieldInPrimarySource': FieldInfo(annotation=str, required=True), 'locationInPrimarySource': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'mappingRules': FieldInfo(annotation=list[EntityFilterRule], required=True, metadata=[MinLen(min_length=1)])}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

class mex.common.models.base.filter.EntityFilterRule(*, forValues: list[str] | None = None, rule: str | None = None)

Bases: BaseModel

Entity filter rule model.

forValues: list[str] | None
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'forValues': FieldInfo(annotation=Union[list[str], NoneType], required=False, default=None), 'rule': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

rule: str | None

mex.common.models.base.filter.generate_entity_filter_schema(extracted_model: type[AnyExtractedModel]) -> type[BaseModel]

Create a mapping schema for an entity filter for an extracted model class.

Example entity filter: If activity starts before 2016: do not extract.

Parameters:

extracted_model – a pydantic model for an extracted model class

Returns:

model of the mapping schema for an entity filter
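As a rough illustration of the shape of such a filter, the sketch below mirrors the documented fields of EntityFilter and EntityFilterRule with plain dataclasses. The class names `FilterRule` and `Filter`, the field value `"start"`, and the `__post_init__` check are assumptions for illustration; the real classes are pydantic models and enforce the minimum length through field metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterRule:
    """Hypothetical mirror of EntityFilterRule."""

    forValues: Optional[list[str]] = None
    rule: Optional[str] = None


@dataclass
class Filter:
    """Hypothetical mirror of EntityFilter."""

    fieldInPrimarySource: str
    mappingRules: list[FilterRule]
    locationInPrimarySource: Optional[str] = None
    examplesInPrimarySource: Optional[list[str]] = None
    comment: Optional[str] = None

    def __post_init__(self) -> None:
        # Stand-in for the MinLen(min_length=1) constraint on mappingRules.
        if len(self.mappingRules) < 1:
            raise ValueError("mappingRules needs at least one rule")


activity_filter = Filter(
    fieldInPrimarySource="start",  # assumed field name in the source system
    mappingRules=[
        FilterRule(rule="If activity starts before 2016: do not extract.")
    ],
)
```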

mex.common.models.base.mapping module

class mex.common.models.base.mapping.GenericField(*, fieldInPrimarySource: str, locationInPrimarySource: str | None = None, examplesInPrimarySource: list[str] | None = None, mappingRules: Annotated[list[GenericRule], MinLen(min_length=1)], comment: str | None = None)

Bases: BaseModel

Generic Field model.

comment: str | None
examplesInPrimarySource: list[str] | None
fieldInPrimarySource: str
locationInPrimarySource: str | None
mappingRules: Annotated[list[GenericRule], FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'comment': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'examplesInPrimarySource': FieldInfo(annotation=Union[list[str], NoneType], required=False, default=None), 'fieldInPrimarySource': FieldInfo(annotation=str, required=True), 'locationInPrimarySource': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'mappingRules': FieldInfo(annotation=list[GenericRule], required=True, metadata=[MinLen(min_length=1)])}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

class mex.common.models.base.mapping.GenericRule(*, forValues: list[str] | None = None, setValues: list[Any] | None = None, rule: str | None = None)

Bases: BaseModel

Generic mapping rule model.

forValues: list[str] | None
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'forValues': FieldInfo(annotation=Union[list[str], NoneType], required=False, default=None), 'rule': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'setValues': FieldInfo(annotation=Union[list[Any], NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

rule: str | None
setValues: list[Any] | None

mex.common.models.base.mapping.generate_mapping_schema(extracted_model: type[AnyExtractedModel]) -> type[BaseModel]

Create a mapping schema for the MEx extracted model class.

Pydantic models are dynamically created for the given entity type, depending on its fields and their types.

Parameters:

extracted_model – a pydantic model for an extracted model class

Returns:

dynamic mapping model for the provided extracted model class
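The idea of building one class per model from its field names can be sketched with the standard library. This is a loose analogy, not the library's implementation: `dataclasses.make_dataclass` stands in for pydantic's dynamic model creation, and `ExtractedThing` and `sketch_mapping_schema` are hypothetical names.

```python
from dataclasses import dataclass, field, fields, make_dataclass


@dataclass
class ExtractedThing:
    """Hypothetical extracted model with two fields."""

    title: str
    year: int


def sketch_mapping_schema(model: type) -> type:
    """Create a class with one list-valued mapping entry per model field."""
    return make_dataclass(
        f"{model.__name__}Mapping",
        [(f.name, list, field(default_factory=list)) for f in fields(model)],
    )


ThingMapping = sketch_mapping_schema(ExtractedThing)
mapping = ThingMapping()
```

Each field of the input class yields a same-named attribute on the generated class, which would hold the mapping entries for that field.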

mex.common.models.base.merged_item module

class mex.common.models.base.merged_item.MergedItem

Bases: BaseEntity

Base model for all merged item classes.

entityType: str
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

stemType: ClassVar

mex.common.models.base.model module

class mex.common.models.base.model.BaseModel

Bases: pydantic.BaseModel

Common base class for all MEx model classes.

classmethod _convert_list_to_non_list(field_name: str, value: list[Any]) -> Any

Convert a list value to a non-list value by unpacking it if possible.

classmethod _convert_non_list_to_list(field_name: str, value: Any) -> list[Any] | None

Convert a non-list value to a list value by wrapping it in a list.

classmethod _fix_value_listyness_for_field(field_name: str, value: Any) -> Any

Check actual and desired shape of a value and fix it if necessary.

classmethod _get_alias_lookup() -> dict[str, str]

Build a cached mapping from field aliases to field names.

classmethod _get_field_names_allowing_none() -> list[str]

Build a cached list of fields that can be set to None.

classmethod _get_list_field_names() -> list[str]

Build a cached list of fields that look like lists.

checksum() -> str

Calculate md5 checksum for this model.
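A minimal sketch of such a checksum, assuming the model is first serialized to JSON with sorted keys (the actual serialization used by the method is not stated here, so treat that step as an assumption). The function name `sketch_checksum` is hypothetical.

```python
import hashlib
import json
from typing import Any


def sketch_checksum(data: dict[str, Any]) -> str:
    """Return the md5 hex digest of a deterministic JSON dump of the data."""
    # Sorting keys makes the digest independent of dictionary order.
    serialized = json.dumps(data, sort_keys=True)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


digest_a = sketch_checksum({"identifier": "abc", "title": "Example"})
digest_b = sketch_checksum({"title": "Example", "identifier": "abc"})
```

Both calls yield the same 32-character digest because the key order is normalized before hashing.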

classmethod fix_listyness(data: Any, handler: ValidatorFunctionWrapHandler) -> Any

Adjust the listyness of to-be-parsed data to match the desired shape.

If the data is a Mapping and the model defines a list[T] field but the raw data contains just a value of type T, it will be wrapped into a list. If the raw data contains a literal None, but the list field is defined as required, we substitute an empty list.

If the model does not expect a list, but the raw data contains a list with no entries, it will be substituted with None. If the raw data contains exactly one entry, then it will be unpacked from the list. If it contains more than one entry however, an error is raised, because we would not know which to choose.

Parameters:
  • data – Raw data or instance to be parsed

  • handler – Validator function wrap handler

Returns:

data with fixed list shapes
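The rules described above can be sketched as a plain function over a dictionary. This is a simplified illustration, not the validator itself: `sketch_fix_listyness` and its `list_fields` and `required_list_fields` parameters are hypothetical, whereas the real method derives this information from the model's field definitions.

```python
from typing import Any


def sketch_fix_listyness(
    data: dict[str, Any],
    list_fields: set[str],
    required_list_fields: set[str],
) -> dict[str, Any]:
    """Adjust values so that their shape matches the expected field shape."""
    fixed: dict[str, Any] = {}
    for name, value in data.items():
        if name in list_fields:
            if value is None:
                # A required list field gets an empty list instead of None.
                if name in required_list_fields:
                    value = []
            elif not isinstance(value, list):
                # A single value of type T is wrapped into a list.
                value = [value]
        elif isinstance(value, list):
            if len(value) == 0:
                value = None
            elif len(value) == 1:
                value = value[0]
            else:
                raise ValueError(f"got multiple values for non-list field {name}")
        fixed[name] = value
    return fixed


fixed = sketch_fix_listyness(
    {"title": "a", "name": ["x"], "tags": None, "alt": [], "opt": None},
    list_fields={"title", "tags", "opt"},
    required_list_fields={"tags"},
)
```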

classmethod get_all_fields() -> dict[str, GenericFieldInfo]

Return a combined dict of defined and computed fields.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

classmethod model_json_schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}', schema_generator: type[GenerateJsonSchema] = JsonSchemaGenerator, mode: Literal['validation', 'serialization'] = 'validation') -> dict[str, Any]

Generates a JSON schema for a model class.

Parameters:
  • by_alias – Whether to use attribute aliases or not.

  • ref_template – The reference template.

  • schema_generator – Overriding the logic used to generate the JSON schema

  • mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod verify_computed_field_consistency(data: Any, handler: ValidatorFunctionWrapHandler) -> Any

Validate that parsed values for computed fields are consistent.

Parsing a dictionary with a value for a computed field that is consistent with what that field would have computed anyway is allowed. Omitting values for computed fields is perfectly valid as well. However, if the parsed value is different from the computed value, a validation error is raised.

Parameters:
  • data – Raw data or instance to be parsed

  • handler – Validator function wrap handler

Returns:

data with consistent computed fields.
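The consistency check can be illustrated with a small function. It is a sketch under assumptions: `sketch_verify_computed` and the `computed` mapping of field names to functions are hypothetical, and a plain `ValueError` stands in for the validation error raised by the real validator.

```python
from typing import Any, Callable


def sketch_verify_computed(
    data: dict[str, Any],
    computed: dict[str, Callable[[dict[str, Any]], Any]],
) -> dict[str, Any]:
    """Reject data whose values contradict what a computed field would yield."""
    for name, compute in computed.items():
        # Omitting a computed field is fine; only a differing value is an error.
        if name in data and data[name] != compute(data):
            raise ValueError(f"inconsistent value for computed field {name}")
    return data


computed_fields = {"fullName": lambda d: f"{d['first']} {d['last']}"}
omitted = sketch_verify_computed({"first": "Ada", "last": "Lovelace"}, computed_fields)
matching = sketch_verify_computed(
    {"first": "Ada", "last": "Lovelace", "fullName": "Ada Lovelace"},
    computed_fields,
)
```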

mex.common.models.base.rules module

class mex.common.models.base.rules.AdditiveRule

Bases: BaseEntity

Base rule to add values to merged items.

entityType: str
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

stemType: ClassVar

class mex.common.models.base.rules.PreventiveRule

Bases: BaseEntity

Base rule to prevent primary sources for fields of merged items.

entityType: str
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

stemType: ClassVar

class mex.common.models.base.rules.RuleSet

Bases: BaseEntity

Base class for a set of an additive, subtractive and preventive rule.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

class mex.common.models.base.rules.SubtractiveRule

Bases: BaseEntity

Base rule to subtract values from merged items.

entityType: str
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

stemType: ClassVar

mex.common.models.base.schema module

class mex.common.models.base.schema.JsonSchemaGenerator(by_alias: bool = True, ref_template: str = '#/$defs/{model}')

Bases: GenerateJsonSchema

Customization of the pydantic class for generating JSON schemas.

handle_ref_overrides(json_schema: Dict[str, Any]) -> Dict[str, Any]

Disable pydantic behavior to wrap top-level $ref keys in an allOf.

For example, pydantic would convert

{"$ref": "#/$defs/APIType", "examples": ["api-type-1"]}

into

{"allOf": [{"$ref": "#/$defs/APIType"}], "examples": ["api-type-1"]}

which is in fact recommended by JSON schema, but we need to disable this to stay compatible with mex-editor and mex-model.
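The difference between the two behaviors can be sketched with two plain functions. Both names are hypothetical: `wrap_ref_in_all_of` only imitates the wrapping described above (with the reference placed in a one-element list, since JSON Schema defines `allOf` as an array), and `keep_ref_as_is` shows the pass-through that the override is described as providing.

```python
from typing import Any


def wrap_ref_in_all_of(json_schema: dict[str, Any]) -> dict[str, Any]:
    """Imitate the wrapping: move a $ref with sibling keys into an allOf."""
    if "$ref" in json_schema and len(json_schema) > 1:
        siblings = {k: v for k, v in json_schema.items() if k != "$ref"}
        return {"allOf": [{"$ref": json_schema["$ref"]}], **siblings}
    return json_schema


def keep_ref_as_is(json_schema: dict[str, Any]) -> dict[str, Any]:
    """Imitate the override: return the schema unchanged."""
    return json_schema


schema = {"$ref": "#/$defs/APIType", "examples": ["api-type-1"]}
wrapped = wrap_ref_in_all_of(schema)
unchanged = keep_ref_as_is(schema)
```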

Module contents