mex.artificial package

Submodules

mex.artificial.helpers module

mex.artificial.helpers.create_faker(locale: str | Sequence[str] | dict[str, int | float] | None | list[str], seed: int | float | str | bytes | bytearray | None) → Faker

Create and initialize a new faker instance with the given locale and seed.

mex.artificial.helpers.create_merged_items(extracted_items: list[ExtractedAccessPlatform | ExtractedActivity | ExtractedBibliographicResource | ExtractedConsent | ExtractedContactPoint | ExtractedDistribution | ExtractedOrganization | ExtractedOrganizationalUnit | ExtractedPerson | ExtractedPrimarySource | ExtractedResource | ExtractedVariable | ExtractedVariableGroup]) → list[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup]

Create merged items for a list of extracted items.

mex.artificial.helpers.generate_artificial_extracted_items(locale: str | Sequence[str] | dict[str, int | float] | None, seed: int | float | str | bytes | bytearray | None, count: int, chattiness: int, stem_types: Sequence[str]) → list[ExtractedAccessPlatform | ExtractedActivity | ExtractedBibliographicResource | ExtractedConsent | ExtractedContactPoint | ExtractedDistribution | ExtractedOrganization | ExtractedOrganizationalUnit | ExtractedPerson | ExtractedPrimarySource | ExtractedResource | ExtractedVariable | ExtractedVariableGroup]

Generate a list of artificial extracted items for the given settings.

mex.artificial.helpers.generate_artificial_merged_items(locale: str | Sequence[str] | dict[str, int | float] | None, seed: int | float | str | bytes | bytearray | None, count: int, chattiness: int, stem_types: Sequence[str]) → list[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup]

Generate a list of artificial merged items for the given settings.

mex.artificial.helpers.register_factories(faker: Faker, identities: dict[str, list[Identity]], chattiness: int) → None

Create faker providers and register them on each factory.

mex.artificial.helpers.write_merged_items(items: Iterable[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup], out_path: PathLike[str]) → None

Write the incoming items into a newline-delimited JSON file.
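The newline-delimited JSON format means one JSON document per line. The following is a minimal sketch of that output format, not the mex implementation: the function name `write_ndjson` is hypothetical, and it takes plain dicts, whereas the real `write_merged_items` receives merged pydantic models (for those, pydantic v2's `model_dump_json()` would produce each line's JSON instead of `json.dumps`).

```python
from __future__ import annotations

import json
from collections.abc import Iterable
from os import PathLike
from pathlib import Path
from typing import Any


def write_ndjson(items: Iterable[dict[str, Any]], out_path: PathLike[str] | str) -> None:
    """Hypothetical helper: write one JSON object per line."""
    with Path(out_path).open("w", encoding="utf-8") as handle:
        for item in items:
            # json.dumps without indent escapes newlines inside strings,
            # so each item occupies exactly one line of the file
            handle.write(json.dumps(item))
            handle.write("\n")
```

Because every record sits on its own line, a consumer can stream the file line by line without loading the whole dataset.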

mex.artificial.identity module

mex.artificial.identity.create_identities(faker: Faker, count: int) → dict[str, list[Identity]]

Create identities for the models that are to be faked.

Identities are created before the models themselves, so that reference fields can be set to stableTargetIds that already exist.

Parameters:
  • faker – Instance of faker

  • count – Number of identities to generate

Returns:

Dict with entity types and lists of Identities
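The two-phase idea behind `create_identities` (allocate identities first, build models second) can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `Identity` here is a hypothetical two-field stand-in, ids are derived from a counter rather than from faker, and `build_person` with its `memberOf` key is a hypothetical example of a reference field.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Hypothetical stand-in for an identity; only the stable id matters here."""

    entity_type: str
    stable_target_id: str


def create_identities(entity_types: list[str], count: int) -> dict[str, list[Identity]]:
    """Phase 1: allocate identities for every entity type before any model exists."""
    return {
        entity_type: [Identity(entity_type, f"{entity_type}-{index}") for index in range(count)]
        for entity_type in entity_types
    }


def build_person(
    rng: random.Random, identities: dict[str, list[Identity]], own: Identity
) -> dict[str, str]:
    """Phase 2 (hypothetical): a reference field points at an id known to exist."""
    unit = rng.choice(identities["OrganizationalUnit"])
    return {"stableTargetId": own.stable_target_id, "memberOf": unit.stable_target_id}
```

Creating all identities up front avoids ordering problems: no model has to wait for the model it references to be generated first.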

mex.artificial.identity.create_numeric_ids(faker: Faker, count: int) → dict[str, list[int]]

Create a mapping from entity type to a list of numeric ids.

These numeric ids can be used as seeds for the identities of artificial items. The seeds are passed to Identifier.generate(seed=…) to get deterministic identifiers across consecutive runs of the artificial data generation.

Parameters:
  • faker – Instance of faker

  • count – Number of ids to generate

Returns:

Dict with entity types and lists of numeric ids
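The determinism described above comes from drawing the ids from a seeded generator. A minimal sketch, assuming a standard-library `random.Random` in place of the seeded `Faker` instance and a caller-supplied list of entity type names (both assumptions, not the package's signature):

```python
import random


def create_numeric_ids(seed: int, entity_types: list[str], count: int) -> dict[str, list[int]]:
    # A dedicated, seeded generator: the same seed always produces the same ids
    rng = random.Random(seed)
    return {
        # sample() draws without replacement, so no two items of a type share a seed
        entity_type: rng.sample(range(2**31), count)
        for entity_type in entity_types
    }
```

Note that the result also depends on the order of `entity_types`, since every type draws from the same generator in turn.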

mex.artificial.identity.get_offset_int(cls: type) → int

Calculate an integer based on the crc32 checksum of the name of a class.
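A minimal sketch of a crc32-based class offset, using the standard library's `zlib.crc32`; the actual function may transform the checksum further. A checksum is a sensible choice here because Python's built-in `hash()` for strings is randomized per process, while `crc32` returns the same value on every run.

```python
import zlib


def get_offset_int(cls: type) -> int:
    # crc32 returns an unsigned 32-bit integer in Python 3 and is stable
    # across processes, unlike hash() on str
    return zlib.crc32(cls.__name__.encode("utf-8"))
```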

mex.artificial.main module

mex.artificial.main.artificial(count: Annotated[int, OptionInfo] = 100, chattiness: Annotated[int, OptionInfo] = 10, seed: Annotated[int, OptionInfo] = 0, locale: Annotated[list[str] | None, OptionInfo] = None, models: Annotated[list[str] | None, OptionInfo] = None, path: Annotated[pathlib.Path | None, OptionInfo] = None) → None

Generate merged artificial items.

mex.artificial.main.main() → None

Wrap entrypoint in typer.

mex.artificial.provider module

class mex.artificial.provider.BuilderProvider(generator: Any)

Bases: Provider

Faker provider that deals with interpreting pydantic model fields.

ensure_list(values: object) → list[object]

Wrap a single object in a list, replace None with [], and return a list as is.
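The three cases of `ensure_list` translate directly into code. This sketch follows the described behaviour literally: only actual `list` instances pass through unchanged, so other sequences such as tuples or strings are wrapped as single objects.

```python
def ensure_list(values: object) -> list[object]:
    if values is None:
        return []  # None becomes an empty list
    if isinstance(values, list):
        return values  # lists are returned as is (same object)
    return [values]  # any other single object is wrapped
```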

extracted_items(stem_types: Sequence[str]) → list[ExtractedAccessPlatform | ExtractedActivity | ExtractedBibliographicResource | ExtractedConsent | ExtractedContactPoint | ExtractedDistribution | ExtractedOrganization | ExtractedOrganizationalUnit | ExtractedPerson | ExtractedPrimarySource | ExtractedResource | ExtractedVariable | ExtractedVariableGroup]

Get a list of extracted items for the given model classes.

field_value(field: FieldInfo, identity: Identity) → list[Any]

Get a single artificial value for the given field and identity.

get_random_field_info(field: FieldInfo) → RandomFieldInfo

Randomly pick a matching type and patterns for a given field.

min_max_for_field(field: FieldInfo) → tuple[int, int]

Return a min and max item count for a field.

class mex.artificial.provider.IdentityProvider(factory: Generator, identities: dict[str, list[Identity]])

Bases: BaseProvider

Faker provider that creates identities and helps with referencing them.

__init__(factory: Generator, identities: dict[str, list[Identity]]) → None

Create and persist identities for all entity types.

identities(model: type[ExtractedAccessPlatform | ExtractedActivity | ExtractedBibliographicResource | ExtractedConsent | ExtractedContactPoint | ExtractedDistribution | ExtractedOrganization | ExtractedOrganizationalUnit | ExtractedPerson | ExtractedPrimarySource | ExtractedResource | ExtractedVariable | ExtractedVariableGroup]) → list[Identity]

Return a list of identities for the given model class.

reference(inner_type: type[Identifier], exclude: Identity) → Identifier | None

Return ID for random identity of given type (that is not excluded).

class mex.artificial.provider.LinkProvider(generator: Any)

Bases: Provider, Provider

Faker provider that can return links with optional title and language.

Return a link with optional title and language.

class mex.artificial.provider.NumerifyPatternsProvider(generator: Any)

Bases: Provider

Faker provider that tries to numerify a pattern until it matches a regex.

numerify_patterns(numerify_patterns: list[str], regex_patterns: list[str]) → str | None

Try up to 10 times to numerify a pattern until the result validates against the regex patterns; otherwise bail out and return None.

class mex.artificial.provider.RandomFieldInfo(*, inner_type: Any, numerify_patterns: list[str] = [], regex_patterns: list[str] = [])

Bases: BaseModel

Randomized pick of matching inner type and patterns for a field.

inner_type: Any
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_fields: ClassVar[Dict[str, FieldInfo]] = {'inner_type': FieldInfo(annotation=Any, required=True), 'numerify_patterns': FieldInfo(annotation=list[str], required=False, default=[]), 'regex_patterns': FieldInfo(annotation=list[str], required=False, default=[])}

Metadata about the fields defined on the model, mapping of field names to FieldInfo (pydantic.fields.FieldInfo) objects.

This replaces Model.__fields__ from Pydantic V1.

numerify_patterns: list[str]
regex_patterns: list[str]

class mex.artificial.provider.TemporalEntityProvider(generator: Any)

Bases: Provider

Faker provider that can return a custom TemporalEntity with random precision.

temporal_entity(allowed_precision_levels: list[TemporalEntityPrecision]) → TemporalEntity

Return a custom temporal entity with random date, time and precision.
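Random precision can be illustrated by picking a level and then rendering a random moment only up to that level. This is a sketch under loud assumptions: the precision names and `FORMATS` table are hypothetical and are not the members of mex's `TemporalEntityPrecision`, and the function returns a plain `(precision, string)` tuple instead of a `TemporalEntity`.

```python
import random
from datetime import datetime, timedelta

# Hypothetical precision names mapped to strftime formats; these are not
# the members of mex's TemporalEntityPrecision enum.
FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "minute": "%Y-%m-%dT%H:%M",
    "second": "%Y-%m-%dT%H:%M:%S",
}


def temporal_entity(allowed_precision_levels: list[str], rng: random.Random) -> tuple[str, str]:
    precision = rng.choice(allowed_precision_levels)
    # Random moment within roughly 25 years after 2000-01-01
    moment = datetime(2000, 1, 1) + timedelta(seconds=rng.randrange(25 * 365 * 24 * 3600))
    # Render the value only up to the chosen precision
    return precision, moment.strftime(FORMATS[precision])
```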

class mex.artificial.provider.TextProvider(factory: Generator, chattiness: int)

Bases: Provider

Faker provider that handles custom text related requirements.

__init__(factory: Generator, chattiness: int) → None

Configure the chattiness of generated text.

text_object() → Text

Return a random text paragraph with an auto-detected language.

text_string() → str

Return a randomized sequence of words as a string.

mex.artificial.types module

Module contents