mex.artificial package¶
Submodules¶
mex.artificial.helpers module¶
- mex.artificial.helpers.create_faker(locale: str | Sequence[str] | dict[str, int | float] | None | list[str], seed: int | float | str | bytes | bytearray | None) Faker ¶
Create and initialize a new faker instance with the given locale and seed.
- mex.artificial.helpers.create_merged_items(extracted_items: list[ExtractedAccessPlatform | ExtractedActivity | ExtractedBibliographicResource | ExtractedConsent | ExtractedContactPoint | ExtractedDistribution | ExtractedOrganization | ExtractedOrganizationalUnit | ExtractedPerson | ExtractedPrimarySource | ExtractedResource | ExtractedVariable | ExtractedVariableGroup]) list[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup] ¶
Create merged items for a list of extracted items.
- mex.artificial.helpers.generate_artificial_extracted_items(locale: str | Sequence[str] | dict[str, int | float] | None, seed: int | float | str | bytes | bytearray | None, count: int, chattiness: int, stem_types: Sequence[str]) list[ExtractedAccessPlatform | ExtractedActivity | ExtractedBibliographicResource | ExtractedConsent | ExtractedContactPoint | ExtractedDistribution | ExtractedOrganization | ExtractedOrganizationalUnit | ExtractedPerson | ExtractedPrimarySource | ExtractedResource | ExtractedVariable | ExtractedVariableGroup] ¶
Generate a list of artificial extracted items for the given settings.
- mex.artificial.helpers.generate_artificial_merged_items(locale: str | Sequence[str] | dict[str, int | float] | None, seed: int | float | str | bytes | bytearray | None, count: int, chattiness: int, stem_types: Sequence[str]) list[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup] ¶
Generate a list of artificial merged items for the given settings.
- mex.artificial.helpers.register_factories(faker: Faker, identities: dict[str, list[Identity]], chattiness: int) None ¶
Create faker providers and register them on each factory.
- mex.artificial.helpers.write_merged_items(items: Iterable[MergedAccessPlatform | MergedActivity | MergedBibliographicResource | MergedConsent | MergedContactPoint | MergedDistribution | MergedOrganization | MergedOrganizationalUnit | MergedPerson | MergedPrimarySource | MergedResource | MergedVariable | MergedVariableGroup], out_path: PathLike[str]) None ¶
Write the incoming items into a new-line delimited JSON file.
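As a rough illustration of the new-line delimited JSON format mentioned above, the following sketch writes one JSON object per line using only the standard library. It is not the package's implementation: `write_ndjson` is a hypothetical helper, and plain dictionaries stand in for the merged model instances.

```python
import json
import tempfile
from collections.abc import Iterable
from os import PathLike
from pathlib import Path


def write_ndjson(items: Iterable[dict], out_path: PathLike[str]) -> None:
    """Write each item as one JSON document per line (hypothetical helper)."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for item in items:
            # sort_keys keeps the serialized output stable across runs
            fh.write(json.dumps(item, sort_keys=True))
            fh.write("\n")


# Plain dicts stand in for merged items here (an assumption for illustration).
sample_items = [
    {"entityType": "MergedPerson", "identifier": "a1"},
    {"entityType": "MergedActivity", "identifier": "b2"},
]
out_file = Path(tempfile.mkdtemp()) / "items.ndjson"
write_ndjson(sample_items, out_file)

# Reading the file back yields one parseable JSON document per line.
lines = out_file.read_text(encoding="utf-8").splitlines()
```

Because every line is a complete JSON document, a consumer can stream the file line by line without loading it whole.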
mex.artificial.identity module¶
- mex.artificial.identity.create_identities(faker: Faker, count: int) dict[str, list[Identity]] ¶
Create the identities of the to-be-faked models.
We do this before actually creating the models, because we need to be able to set existing stableTargetIds on reference fields.
- Parameters:
faker – Instance of faker
count – Number of identities to generate
- Returns:
Dict with entity types and lists of Identities
- mex.artificial.identity.create_numeric_ids(faker: Faker, count: int) dict[str, list[int]] ¶
Create a mapping from entity type to a list of numeric ids.
These numeric ids can be used as seeds for the identity of artificial items. The seeds will be passed to Identifier.generate(seed=…) to get deterministic identifiers throughout consecutive runs of the artificial data generation.
- Parameters:
faker – Instance of faker
count – Number of ids to generate
- Returns:
Dict with entity types and lists of numeric ids
- mex.artificial.identity.get_offset_int(cls: type) int ¶
Calculate an integer based on the crc32 checksum of the name of a class.
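The checksum-based offset can be sketched with `zlib.crc32` from the standard library. This is an assumption about the approach, not the package's code: `offset_for_class` and `numeric_ids_for` are hypothetical names, and this reference does not state how the package combines the offset with the generated ids.

```python
import zlib


def offset_for_class(cls: type) -> int:
    """Derive a stable integer from a class name via crc32 (hypothetical)."""
    # crc32 is deterministic across runs, unlike the built-in hash() for str
    return zlib.crc32(cls.__name__.encode("utf-8"))


def numeric_ids_for(cls: type, count: int) -> list[int]:
    """Build `count` consecutive ids starting at the class offset (assumed scheme)."""
    start = offset_for_class(cls)
    return [start + i for i in range(count)]


class ExamplePerson:  # stand-in class, not one of the real models
    pass


class ExampleActivity:  # stand-in class, not one of the real models
    pass


person_ids = numeric_ids_for(ExamplePerson, 3)
activity_ids = numeric_ids_for(ExampleActivity, 3)
```

Deriving the starting point from the class name means repeated runs produce the same ids per entity type, which matches the stated goal of deterministic identifiers across consecutive runs.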
mex.artificial.main module¶
- mex.artificial.main.artificial(count: int = 100, chattiness: int = 10, seed: int = 0, locale: list[str] | None = None, models: list[str] | None = None, path: Path | None = None) None ¶
Generate merged artificial items.
- mex.artificial.main.main() None ¶
Wrap entrypoint in typer.
mex.artificial.provider module¶
- class mex.artificial.provider.BuilderProvider(generator: Any)¶
Bases:
Provider
Faker provider that deals with interpreting pydantic model fields.
- ensure_list(values: object) list[object] ¶
Wrap single object in list, replace None with [] and return list as is.
- extracted_items(stem_types: Sequence[str]) list[ExtractedAccessPlatform | ExtractedActivity | ExtractedBibliographicResource | ExtractedConsent | ExtractedContactPoint | ExtractedDistribution | ExtractedOrganization | ExtractedOrganizationalUnit | ExtractedPerson | ExtractedPrimarySource | ExtractedResource | ExtractedVariable | ExtractedVariableGroup] ¶
Get a list of extracted items for the given model classes.
- field_value(field: FieldInfo, identity: Identity) list[Any] ¶
Get a single artificial value for the given field and identity.
- get_random_field_info(field: FieldInfo) RandomFieldInfo ¶
Randomly pick a matching type and patterns for a given field.
- min_max_for_field(field: FieldInfo) tuple[int, int] ¶
Return a min and max item count for a field.
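The behaviour documented for `ensure_list` above (wrap a single object, map None to an empty list, pass lists through) can be sketched as a standalone function. This is an illustrative stand-in written from the docstring, not the provider method itself.

```python
def ensure_list(values: object) -> list[object]:
    """Normalize a value to a list (standalone sketch of the documented behaviour)."""
    if values is None:
        return []
    if isinstance(values, list):
        return values  # returned as is, not copied
    return [values]


wrapped = ensure_list("single value")
empty = ensure_list(None)
existing = [1, 2]
same = ensure_list(existing)
```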
- class mex.artificial.provider.IdentityProvider(factory: Generator, identities: dict[str, list[Identity]])¶
Bases:
BaseProvider
Faker provider that creates identities and helps with referencing them.
- __init__(factory: Generator, identities: dict[str, list[Identity]]) None ¶
Create and persist identities for all entity types.
- identities(model: type[ExtractedAccessPlatform | ExtractedActivity | ExtractedBibliographicResource | ExtractedConsent | ExtractedContactPoint | ExtractedDistribution | ExtractedOrganization | ExtractedOrganizationalUnit | ExtractedPerson | ExtractedPrimarySource | ExtractedResource | ExtractedVariable | ExtractedVariableGroup]) list[Identity] ¶
Return a list of identities for the given model class.
- reference(inner_type: type[Identifier], exclude: Identity) Identifier | None ¶
Return ID for random identity of given type (that is not excluded).
- class mex.artificial.provider.LinkProvider(generator: Any)¶
Bases:
Provider, Provider
Faker provider that can return links with optional title and language.
- link() Link ¶
Return a link with optional title and language.
- class mex.artificial.provider.NumerifyPatternsProvider(generator: Any)¶
Bases:
Provider
Faker provider that tries to numerify a pattern until it matches a regex.
- numerify_patterns(numerify_patterns: list[str], regex_patterns: list[str]) str | None ¶
Try up to 10 times to numerify a pattern until the result validates, or bail out.
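The retry loop described above can be sketched with the standard library alone. Several things here are assumptions rather than the provider's actual code: the helper names `numerify` and `numerify_until_match` are hypothetical, only the `#` placeholder is handled (each `#` becomes a random digit), a candidate counts as valid if it fully matches any of the regex patterns, and "bail out" is read as returning None, which is suggested by the `str | None` return type.

```python
from __future__ import annotations

import random
import re


def numerify(pattern: str, rng: random.Random) -> str:
    """Replace every '#' in the pattern with a random digit."""
    return "".join(str(rng.randint(0, 9)) if ch == "#" else ch for ch in pattern)


def numerify_until_match(
    numerify_patterns: list[str],
    regex_patterns: list[str],
    rng: random.Random,
    max_turns: int = 10,
) -> str | None:
    """Return a numerified string matching any regex, or None after max_turns."""
    if not numerify_patterns:
        return None
    for _ in range(max_turns):
        candidate = numerify(rng.choice(numerify_patterns), rng)
        if any(re.fullmatch(regex, candidate) for regex in regex_patterns):
            return candidate
    return None  # bail out: no candidate validated within max_turns


rng = random.Random(0)
result = numerify_until_match(["ID-####"], [r"ID-\d{4}"], rng)
no_match = numerify_until_match(["ID-####"], [r"XX-\d{4}"], rng)
```

Capping the number of turns keeps generation from looping forever when a numerify pattern can never satisfy its regex, as in the second call.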
- class mex.artificial.provider.RandomFieldInfo(*, inner_type: Any, numerify_patterns: list[str] = [], regex_patterns: list[str] = [])¶
Bases:
BaseModel
Randomized pick of matching inner type and patterns for a field.
- inner_type: Any¶
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'inner_type': FieldInfo(annotation=Any, required=True), 'numerify_patterns': FieldInfo(annotation=list[str], required=False, default=[]), 'regex_patterns': FieldInfo(annotation=list[str], required=False, default=[])}¶
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- numerify_patterns: list[str]¶
- regex_patterns: list[str]¶
- class mex.artificial.provider.TemporalEntityProvider(generator: Any)¶
Bases:
Provider
Faker provider that can return a custom TemporalEntity with random precision.
- temporal_entity(allowed_precision_levels: list[TemporalEntityPrecision]) TemporalEntity ¶
Return a custom temporal entity with random date, time and precision.
- class mex.artificial.provider.TextProvider(factory: Generator, chattiness: int)¶
Bases:
Provider
Faker provider that handles custom text related requirements.
- __init__(factory: Generator, chattiness: int) None ¶
Configure the chattiness of generated text.
- text_object() Text ¶
Return a random text paragraph with an auto-detected language.
- text_string() str ¶
Return a randomized sequence of words as a string.