mex.common package¶

Subpackages¶

Submodules¶

mex.common.cli module¶

mex.common.cli._callback(func: Callable[[], None], settings_cls: type[BaseSettings] | None, **cli_settings: object) → None¶

Run the decorated function in the current click context.

Parameters:

func – Entry point function for a cli
settings_cls – Base settings class or a subclass of it
cli_settings – Parsed cli options in raw format

Raises:

Exception – Any uncaught exception
SystemExit – With exit code 0 or 1

mex.common.cli.entrypoint(settings_cls: type[BaseSettings] | None = None) → Callable[[Callable[[], None]], Command]¶

Decorate given function to mark it as a cli entrypoint.

Running an entrypoint will print a summary on startup, provide error handling, close connectors on shutdown and add --pdb post-mortem debugging.

Parameters: settings_cls – Settings class (deprecated).
Returns: The decorated function
Return type: Callable

mex.common.context module¶

class mex.common.context.SingleSingletonStore¶

Bases: Generic[_SingletonT]

Thin wrapper for storing a single thread-local singleton.

Stores only a single instance. Requested class must either be the same or a parent class of the stored class.

__init__() → None¶: Create a new settings singleton store.

load(cls: type[_SingletonT]) → _SingletonT¶: Retrieve the settings for the given class or create a new one.

push(instance: _SingletonT) → None¶: Set or replace a singleton instance in the store.

reset() → None¶: Remove singleton instance from the store.
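
The parent-class rule is easier to see in code. The following is a minimal sketch of that behaviour, not the mex.common implementation: the class name SingleStoreSketch, the demo classes and the RuntimeError are assumptions made for illustration, and thread-locality is left out for brevity.

```python
from typing import Generic, Optional, TypeVar

_T = TypeVar("_T")


class SingleStoreSketch(Generic[_T]):
    """Hypothetical stand-in that holds at most one instance."""

    def __init__(self) -> None:
        self._instance: Optional[_T] = None

    def load(self, cls: type[_T]) -> _T:
        # Create the instance lazily on first access.
        if self._instance is None:
            self._instance = cls()
        # The requested class must be the stored class or one of its parents.
        elif not issubclass(type(self._instance), cls):
            # The error type is an assumption, not taken from the library.
            raise RuntimeError(f"stored instance is not a {cls.__name__}")
        return self._instance

    def push(self, instance: _T) -> None:
        # Set or replace the stored instance.
        self._instance = instance

    def reset(self) -> None:
        # Forget the stored instance.
        self._instance = None


class BaseConfig:  # hypothetical demo class
    pass


class SubConfig(BaseConfig):  # hypothetical demo subclass
    pass


store: SingleStoreSketch[BaseConfig] = SingleStoreSketch()
store.push(SubConfig())
loaded = store.load(BaseConfig)  # parent class of the stored instance: allowed
```

In this sketch, requesting BaseConfig returns the stored SubConfig instance, while requesting an unrelated or more specific class than the stored one is rejected.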

class mex.common.context.SingletonStore¶

Bases: Generic[_SingletonT]

Thin wrapper for storing thread-local singletons.

__init__() → None¶: Create a new singleton store with the given type.

load(cls: type[_SingletonT]) → _SingletonT¶: Retrieve a singleton for the given class or create a new one.

pop(cls: type[_SingletonT]) → _SingletonT¶: Remove a singleton for the given class from the store and return it.

push(instance: _SingletonT) → None¶: Set or replace a singleton instance in the store.

reset() → None¶: Remove all singleton instances from the store.

mex.common.exceptions module¶

exception mex.common.exceptions.EmptySearchResultError¶

Bases: MExError

Empty search result.

exception mex.common.exceptions.FoundMoreThanOneError¶

Bases: MExError

Found more than one.

exception mex.common.exceptions.MExError¶

Bases: Exception

Base class for generic exceptions.

exception mex.common.exceptions.MergingError¶

Bases: MExError

Creating a merged item from extracted items and rules failed.

exception mex.common.exceptions.TimedReadTimeout(*args: Any, **kwargs: Any)¶

Bases: TimedRequestException

The server did not send any data in the allotted amount of time.

exception mex.common.exceptions.TimedRequestException(*args: Any, **kwargs: Any)¶

Bases: RequestException

Timed request exception with a seconds attribute.

__init__(*args: Any, **kwargs: Any) → None¶: Initialize exception with timeout seconds.

classmethod create(exc: RequestException, t0: float) → Self¶: Create a new timed error from an upstream exception and start time.

seconds: float¶

exception mex.common.exceptions.TimedServerError(*args: Any, **kwargs: Any)¶

Bases: TimedRequestException

The server encountered an error or is incapable of responding.

exception mex.common.exceptions.TimedTooManyRequests(*args: Any, **kwargs: Any)¶

Bases: TimedRequestException

Client sent too many requests in a given time.

mex.common.extract module¶

mex.common.extract.get_dtypes_for_model(model: type[BaseModel]) → dict[str, Dtype]¶

Get the basic dtypes per field for a model from the PANDAS_DTYPE_MAP.

Parameters: model – Model class for which to get pandas data types per field alias
Returns: Mapping from field alias to dtype strings

mex.common.extract.parse_csv(path_or_buffer: str | PathLike[str] | ReadCsvBuffer[Any], into: type[_BaseModelT], chunksize: int = 10000, summary_batch_size: int = 10000, **kwargs: Any) → Generator[_BaseModelT, None, None]¶

Parse a CSV file into an iterable of the given model type.

Parameters:

path_or_buffer – Location of CSV file or read buffer with CSV content
into – Type of model to parse
chunksize – Buffer size for chunked reading
summary_batch_size – Batch size for summary logs
kwargs – Additional keyword arguments for pandas

Returns:

Generator for models

mex.common.fields module¶

mex.common.logging module¶

mex.common.logging.watch(log_interval: int = 10000) → Callable[[Callable[[P], Generator[_YieldT, None, None]]], Callable[[P], Generator[_YieldT, None, None]]]¶

Watch the output of a generator function and log the yielded items.

Parameters:

func – Generator function that yields strings, models or exceptions (the object's __str__() method is used to print it)
log_interval – Integer determining the interval between log messages

Returns:

Decorated function that logs all yielded items

mex.common.settings module¶

class mex.common.settings.BaseSettings(_env_file: Path | str | Sequence[Path | str] | None = PosixPath('.'), _env_file_encoding: str | None = None, _env_nested_delimiter: str | None = None, _secrets_dir: str | Path | None = None, *, MEX_DEBUG: bool = False, MEX_SINK: list[Sink] = [Sink.NDJSON], MEX_ASSETS_DIR: Path = PosixPath('/home/runner/work/mex-common/mex-common/assets'), MEX_OPS_DIR: Path = PosixPath('/home/runner/work/mex-common/mex-common/ops'), MEX_WORK_DIR: Path = PosixPath('/home/runner/work/mex-common/mex-common'), MEX_IDENTITY_PROVIDER: IdentityProvider = IdentityProvider.MEMORY, MEX_BACKEND_API_URL: HttpUrl = HttpUrl('http://localhost:8080/'), MEX_BACKEND_API_KEY: SecretStr = SecretStr('**********'), MEX_BACKEND_API_PARALLELIZATION: int = 1, MEX_BACKEND_API_CHUNK_SIZE: int = 25, MEX_VERIFY_SESSION: bool | OpsPath = True, MEX_ORGANIGRAM_PATH: AssetsPath = AssetsPath('raw-data/organigram/organizational_units.json'), MEX_PRIMARY_SOURCES_PATH: AssetsPath = AssetsPath('raw-data/primary-sources/primary-sources.json'), MEX_LDAP_URL: SecretStr = SecretStr('**********'), MEX_LDAP_SEARCH_BASE: str = 'dc=ldapmock,dc=local', MEX_WIKI_API_URL: HttpUrl = HttpUrl('http://wikidata/'), MEX_WEB_USER_AGENT: str = 'rki/mex', MEX_ORCID_API_URL: HttpUrl = HttpUrl('https://orcid/'))¶

Bases: BaseSettings

Common settings definition class.

Settings are accessed through a singleton instance of a pydantic settings class. The singleton instance can be loaded lazily by calling BaseSettings.get().

The base settings should only contain options that are used by common code. To add more configuration options for a specific subsystem, create a new subclass and define the required fields there. To load a singleton for that subclass, simply call SubsystemSettings.get().

All configuration options should have a descriptive name and a clear description. The defaults should be set to a value that works with unit tests and must not contain any secrets or live URLs that would break unit test isolation.

__init__(_env_file: Path | str | Sequence[Path | str] | None = PosixPath('.'), _env_file_encoding: str | None = None, _env_nested_delimiter: str | None = None, _secrets_dir: str | Path | None = None, **values: Any) → None¶: Construct a new settings instance.

assets_dir: Path¶

backend_api_chunk_size: int¶

backend_api_key: SecretStr¶

backend_api_parallelization: int¶

backend_api_url: HttpUrl¶

debug: bool¶

classmethod get() → Self¶

Get the current settings instance from singleton store.

Returns: An instance of BaseSettings or a subclass thereof

classmethod get_env_name(name: str) → str¶

Get the name of the environment variable for field with given name.

Resolves the actual environment variable name that would be used for a given field, taking into account case sensitivity and environment prefix configuration.

Parameters: name – The field name to get the environment variable name for.
Returns: The uppercase environment variable name that maps to the field.
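
Given the env_prefix of 'mex_' and case_sensitive set to False in model_config, the mapping from field name to variable name can be approximated as below. This is a sketch under those assumptions only; the helper name env_name_sketch is hypothetical, and the real method may also take aliases or other pydantic-settings details into account.

```python
def env_name_sketch(name: str, env_prefix: str = "mex_") -> str:
    """Approximate the variable name for a field: prefix plus name, uppercased."""
    return f"{env_prefix}{name}".upper()


# Field names taken from the attribute list of BaseSettings.
debug_variable = env_name_sketch("debug")
api_url_variable = env_name_sketch("backend_api_url")
```

The results line up with the keyword names shown in the class signature, such as MEX_DEBUG and MEX_BACKEND_API_URL.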

identity_provider: IdentityProvider¶

ldap_search_base: str¶

ldap_url: SecretStr¶

log_settings() → Self¶: Validator that logs the settings in text form.

mex_web_user_agent: str¶

model_config = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': '.env', 'env_file_encoding': 'utf-8', 'env_ignore_empty': False, 'env_nested_delimiter': '__', 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'mex_', 'env_prefix_target': 'variable', 'extra': 'ignore', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'populate_by_name': True, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

no_settings_as_attributes() → Self¶: Validate that no attribute inherits from pydantic.BaseSettings.

ops_dir: Path¶

orcid_api_url: HttpUrl¶

organigram_path: AssetsPath¶

primary_sources_path: AssetsPath¶

resolve_paths() → Self¶: Resolve AssetsPath, OpsPath, and WorkPath.

sink: list[Sink]¶

text() → str¶

Dump the current settings into a readable table.

Returns: Formatted table with all settings.

verify_session: bool | OpsPath¶

wiki_api_url: HttpUrl¶

work_dir: Path¶

mex.common.sorters module¶

mex.common.sorters.topological_sort(items: list[ItemT], primary_key: str, *, parent_key: str | None = None, child_key: str | None = None) → None¶

Sort the given list of items in-place according to their topology.

Items can refer to each other using key fields. A parent item can reference a child item by storing the child’s primary_key in the parent’s child_key field. Similarly, a child can reference its parent using the parent_key field.

This can be useful for submitting items to the backend in the correct order.
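
The idea can be sketched with graphlib from the standard library. The example below is not the mex.common implementation: it uses plain dictionaries instead of models, handles only the parent_key direction, and assumes that parents should come before their children. The function name sort_parents_first and the sample data are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any


def sort_parents_first(
    items: list[dict[str, Any]], primary_key: str, parent_key: str
) -> None:
    """Reorder `items` in-place so that each parent precedes its children."""
    by_id = {item[primary_key]: item for item in items}
    # Map every item to the set of items that must come before it.
    graph = {
        item[primary_key]: (
            {item[parent_key]} if item.get(parent_key) in by_id else set()
        )
        for item in items
    }
    order = TopologicalSorter(graph).static_order()
    # Slice assignment keeps the same list object, mirroring an in-place sort.
    items[:] = [by_id[key] for key in order]


units = [
    {"id": "team", "parent": "department"},
    {"id": "department", "parent": "institute"},
    {"id": "institute", "parent": None},
]
sort_parents_first(units, "id", "parent")
sorted_ids = [unit["id"] for unit in units]
```

After sorting, an item that is referenced as a parent always appears earlier in the list than the items that reference it.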

mex.common.transform module¶

class mex.common.transform.MExEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)¶

Bases: JSONEncoder

Custom JSON encoder that can handle pydantic models, enums and UUIDs.

default(obj: object) → object¶: Implement custom serialization rules.

mex.common.transform.camel_to_split(string: str) → str¶: Convert the given string from CamelCase into Split Case.

mex.common.transform.camelcase_to_title(value: str) → str¶: Convert the given string from CamelCase into Title case.

mex.common.transform.clean_dict(obj: Any, unwanted: Sequence[Any] = (None, [])) → Any¶: Clean None and [] from dicts.

mex.common.transform.dromedary_to_kebab(string: str) → str¶: Convert the given string from dromedaryCase into kebab-case.

mex.common.transform.dromedary_to_snake(string: str) → str¶: Convert the given string from dromedaryCase into snake_case.

mex.common.transform.ensure_postfix(string_like: object, postfix: object) → str¶

Return a string with the given postfix appended if it is not present yet.

If string_like already ends with the postfix, return a stringified copy. This function is the inverse of str.removesuffix.

Parameters:

string_like – Object to convert to string and potentially postfix.
postfix – Object to convert to string and use as postfix.

Returns:

String with the postfix guaranteed to be present at the end.

mex.common.transform.ensure_prefix(string_like: object, prefix: object) → str¶

Return a string with the given prefix prepended if it is not present yet.

If string_like already starts with the prefix, return a stringified copy. This method is the inverse of str.removeprefix.

Parameters:

string_like – Object to convert to string and potentially prefix.
prefix – Object to convert to string and use as prefix.

Returns:

String with the prefix guaranteed to be present at the beginning.
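
The behaviour described for ensure_postfix and ensure_prefix can be sketched in a few lines. These are illustrative re-implementations with hypothetical names, not the library code.

```python
def ensure_prefix_sketch(string_like: object, prefix: object) -> str:
    """Stringify both arguments and prepend the prefix only if it is missing."""
    string, prefix_str = str(string_like), str(prefix)
    return string if string.startswith(prefix_str) else f"{prefix_str}{string}"


def ensure_postfix_sketch(string_like: object, postfix: object) -> str:
    """Stringify both arguments and append the postfix only if it is missing."""
    string, postfix_str = str(string_like), str(postfix)
    return string if string.endswith(postfix_str) else f"{string}{postfix_str}"


with_prefix = ensure_prefix_sketch("item-1", "item-")  # already prefixed
added_prefix = ensure_prefix_sketch(1, "item-")  # non-string input is stringified
with_postfix = ensure_postfix_sketch("report", ".csv")
```

Applying str.removeprefix or str.removesuffix to the result and then calling the matching sketch again returns the same string, which is the sense in which the functions are inverses.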

mex.common.transform.kebab_to_camel(string: str) → str¶: Convert the given string from kebab-case into CamelCase.

mex.common.transform.normalize(string: str) → str¶: Normalize the given string to lowercase, numerals and single spaces.

mex.common.transform.snake_to_dromedary(string: str) → str¶: Convert the given string from snake_case into dromedaryCase.

mex.common.transform.split_to_camel(string: str) → str¶: Convert the given string from Split Case into CamelCase.

mex.common.transform.split_to_caps(string: str) → str¶: Convert the given string from Split Case into CAPS_CASE.
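
As an illustration of the case conversions listed above, the pair snake_to_dromedary and dromedary_to_snake could look roughly like this. The sketch uses hypothetical names and only covers simple ASCII identifiers; how the library treats digits, acronyms or repeated separators is not stated here and is not modelled.

```python
import re


def snake_to_dromedary_sketch(string: str) -> str:
    """Convert snake_case to dromedaryCase for simple identifiers."""
    head, *tail = string.split("_")
    return head + "".join(word.capitalize() for word in tail)


def dromedary_to_snake_sketch(string: str) -> str:
    """Convert dromedaryCase to snake_case for simple identifiers."""
    # Insert an underscore before every capital letter except at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", string).lower()


dromedary = snake_to_dromedary_sketch("backend_api_url")
snake = dromedary_to_snake_sketch(dromedary)
```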

mex.common.transform.to_key_and_values(dct: dict[str, Any]) → Iterable[tuple[str, list[Any]]]¶

Return an iterable of dictionary items where the values are always lists.

Normalizes dictionary values by converting single values to single-item lists, leaving existing lists unchanged, and converting None to empty lists.

Parameters: dct – Dictionary to normalize the values of.
Yields: Tuples of (key, list_value) where list_value is guaranteed to be a list.
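
The three normalization rules can be written as a small generator. This is a sketch of the described behaviour with a hypothetical name, not the library source.

```python
from collections.abc import Iterable
from typing import Any


def to_key_and_values_sketch(
    dct: dict[str, Any],
) -> Iterable[tuple[str, list[Any]]]:
    """Yield (key, values) pairs where values is always a list."""
    for key, value in dct.items():
        if value is None:
            yield key, []  # None becomes an empty list
        elif isinstance(value, list):
            yield key, value  # existing lists are passed through
        else:
            yield key, [value]  # single values are wrapped

normalized = dict(to_key_and_values_sketch({"a": 1, "b": [2, 3], "c": None}))
```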

mex.common.utils module¶

class mex.common.utils.GenericFieldInfo(alias: str | None, annotation: type[Any] | None, frozen: bool)¶

Bases: object

Abstraction class for unifying FieldInfo and ComputedFieldInfo objects.

alias: str | None¶

annotation: type[Any] | None¶

frozen: bool¶

mex.common.utils.any_contains_any(bases: Iterable[Container[T] | None], tokens: Iterable[T]) → bool¶: Check if any of the given bases contains any of the given tokens.

mex.common.utils.contains_any(base: Container[T], tokens: Iterable[T]) → bool¶: Check if a given base contains any of the given tokens.

mex.common.utils.contains_any_types(field: GenericFieldInfo, *types: type) → bool¶

Return whether a field is annotated as any of the given types.

Unions, lists and type annotations are checked for their inner types and only the non-NoneType types are considered for the type-check.

Parameters:

field – A GenericFieldInfo instance
types – Types to look for in the field’s annotation

Returns:

Whether the field contains any of the given types

mex.common.utils.contains_only_types(field: GenericFieldInfo, *types: type) → bool¶

Return whether a field is annotated as one of the given types.

Unions, lists and type annotations are checked for their inner types and only the non-NoneType types are considered for the type-check.

Parameters:

field – A GenericFieldInfo instance
types – Types to look for in the field’s annotation

Returns:

Whether the field contains only the given types

mex.common.utils.deprecated(old_name: str, new_func: Callable[[P], R]) → Callable[[P], R]¶: Create a deprecated wrapper for a function.

mex.common.utils.ensure_list(values: list[T] | T | None) → list[T]¶: Put objects in lists, replace None with an empty list and return lists as is.

mex.common.utils.get_alias_lookup(model: type[BaseModel]) → dict[str, str]¶

Build a cached mapping from field alias to field names.

Creates a dictionary that maps field aliases (or field names if no alias exists) back to the actual field names. This is useful for resolving field references when working with serialized data that may use aliases.

Parameters: model – The Pydantic model class to build the alias lookup for.
Returns: Dictionary mapping field aliases (or names) to actual field names.

mex.common.utils.get_all_fields(model: type[BaseModel]) → dict[str, GenericFieldInfo]¶

Return a combined dict of defined and computed fields of a given model.

This function combines both regular model fields and computed fields into a single dictionary using the GenericFieldInfo abstraction. Results are cached for performance.

Parameters: model – The Pydantic model class to extract fields from.
Returns: Dictionary mapping field names to GenericFieldInfo objects for all fields (both regular and computed) in the model.

mex.common.utils.get_field_names_allowing_none(model: type[BaseModel]) → list[str]¶

Build a cached list of fields that can be set to None.

Tests each field’s annotation by attempting to validate None against it. Fields that accept None without raising a ValidationError are considered nullable fields.

Parameters: model – The Pydantic model class to analyze.
Returns: List of field names that accept None as a valid value.

mex.common.utils.get_inner_types(annotation: Any, include_none: bool = True, unpack_list: bool = True, unpack_literal: bool = True) → Generator[type, None, None]¶

Recursively yield all inner types from a given type annotation.

Parameters:

annotation – The type annotation to process
include_none – Whether to include NoneTypes in output
unpack_list – Whether to unpack list types
unpack_literal – Whether to unpack Literal types

Returns:

All inner types found within the annotation

mex.common.utils.get_list_field_names(model: type[BaseModel]) → list[str]¶

Build a cached list of fields that look like lists.

Analyzes the model’s field annotations to identify fields that are list types. This includes direct list annotations and list types within unions.

Parameters: model – The Pydantic model class to analyze.
Returns: List of field names that have list-like type annotations.

mex.common.utils.group_fields_by_class_name(model_classes_by_name: Mapping[str, type[BaseModel]], predicate: Callable[[GenericFieldInfo], bool]) → dict[str, list[str]]¶

Group the field names by model class and filter them by the given predicate.

For each model class, extracts all fields and applies the predicate function to filter them. Returns a mapping from class names to lists of field names that satisfy the predicate condition.

Parameters:

model_classes_by_name – Map from class names to model classes.
predicate – Function to filter the fields of the classes by.

Returns:

Dictionary mapping class names to a list of field names filtered by predicate.

mex.common.utils.grouper(chunk_size: int, iterable: Iterable[T]) → Iterator[Iterable[T | None]]¶

Collect data into fixed-length chunks or blocks.

Groups items from an iterable into fixed-size chunks. The last chunk may be padded with None values if the total number of items is not evenly divisible by the chunk size.

Parameters:

chunk_size – The size of each chunk.
iterable – The iterable to group into chunks.

Returns:

Iterator of iterables, each containing chunk_size items (with None padding for the final chunk if necessary).
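
The description matches the well-known grouper recipe from the itertools documentation. Whether mex.common uses exactly this recipe is an assumption; the sketch below, with a hypothetical name, only shows the padding behaviour.

```python
from collections.abc import Iterable, Iterator
from itertools import zip_longest
from typing import Optional, TypeVar

T = TypeVar("T")


def grouper_sketch(
    chunk_size: int, iterable: Iterable[T]
) -> Iterator[tuple[Optional[T], ...]]:
    """Collect items into fixed-length chunks, padding the last one with None."""
    # The same iterator repeated chunk_size times makes zip_longest
    # draw consecutive items for each chunk.
    args = [iter(iterable)] * chunk_size
    return zip_longest(*args, fillvalue=None)


chunks = list(grouper_sketch(2, "abcde"))
```

Callers that iterate over a chunk usually need to skip the None padding in the final chunk.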

mex.common.utils.jitter_sleep(min_seconds: float, jitter_seconds: float) → None¶

Sleep a random amount of seconds within the given parameters.

Parameters:

min_seconds – The minimum time to sleep
jitter_seconds – The variable sleep time added to the minimum
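
One plausible reading of the two parameters is a fixed minimum plus a uniformly random share of the jitter. The sketch below follows that reading with a hypothetical name; the exact distribution used by the library is an assumption.

```python
import time
from random import random


def jitter_sleep_sketch(min_seconds: float, jitter_seconds: float) -> None:
    """Sleep for min_seconds plus a random fraction of jitter_seconds."""
    # random() returns a float in [0, 1), so the total stays below
    # min_seconds + jitter_seconds.
    time.sleep(min_seconds + random() * jitter_seconds)


start = time.perf_counter()
jitter_sleep_sketch(0.01, 0.01)
elapsed = time.perf_counter() - start
```

Adding jitter like this is commonly used to spread out retries so that many clients do not hit a server at the same moment.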

mex.common.utils.random() → float¶: Return a random number x in the interval [0, 1).
