mex.common package

Subpackages

Submodules

mex.common.cli module

mex.common.cli._callback(func: Callable[[], None], settings_cls: type[BaseSettings] | None, **cli_settings: object) None

Run the decorated function in the current click context.

Parameters:
  • func – Entry point function for a cli

  • settings_cls – Base settings class or a subclass of it

  • cli_settings – Parsed cli options in raw format

Raises:
  • Exception – Any uncaught exception

  • SystemExit – With exit code 0 or 1

mex.common.cli.entrypoint(settings_cls: type[BaseSettings] | None = None) Callable[[Callable[[], None]], Command]

Decorate given function to mark it as a cli entrypoint.

Running an entrypoint prints a summary on startup, provides error handling, closes connectors on shutdown and adds --pdb post-mortem debugging.

Parameters:

settings_cls – Settings class (deprecated).

Returns:

A decorator that turns the given function into a Command

Return type:

Callable

mex.common.context module

class mex.common.context.SingleSingletonStore

Bases: Generic[_SingletonT]

Thin wrapper for storing a single thread-local singleton.

Stores only a single instance. Requested class must either be the same or a parent class of the stored class.

__init__() None

Create a new settings singleton store.

load(cls: type[_SingletonT]) _SingletonT

Retrieve the settings for the given class or create a new one.

push(instance: _SingletonT) None

Set or replace a singleton instance in the store.

reset() None

Remove singleton instance from the store.

class mex.common.context.SingletonStore

Bases: Generic[_SingletonT]

Thin wrapper for storing thread-local singletons.

__init__() None

Create a new singleton store with the given type.

load(cls: type[_SingletonT]) _SingletonT

Retrieve a singleton for the given class or create a new one.

pop(cls: type[_SingletonT]) _SingletonT

Remove the singleton for the given class from the store and return it.

push(instance: _SingletonT) None

Set or replace a singleton instance in the store.

reset() None

Remove all singleton instances from the store.
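The following sketch mimics the SingletonStore interface documented above with threading.local, to show what "thread-local singletons" means in practice. It assumes that load() creates a missing instance by calling the class without arguments; it is an illustration, not the mex.common implementation, which may use a different thread-local mechanism.

```python
import threading
from typing import Any, Generic, TypeVar

SingletonT = TypeVar("SingletonT")


class SingletonStore(Generic[SingletonT]):
    """Sketch: keeps at most one instance per class, separately for each thread."""

    def __init__(self) -> None:
        # Attributes of a threading.local object are only visible to the thread that set them.
        self._local = threading.local()

    @property
    def _instances(self) -> dict[type, Any]:
        if not hasattr(self._local, "instances"):
            self._local.instances = {}
        return self._local.instances

    def load(self, cls: type[SingletonT]) -> SingletonT:
        # Assumption: missing singletons are created with a no-argument constructor.
        if cls not in self._instances:
            self._instances[cls] = cls()
        return self._instances[cls]

    def push(self, instance: SingletonT) -> None:
        # Set or replace the instance stored for the instance's own class.
        self._instances[type(instance)] = instance

    def pop(self, cls: type[SingletonT]) -> SingletonT:
        return self._instances.pop(cls)

    def reset(self) -> None:
        self._instances.clear()
```

Because the dictionary lives in a threading.local object, calling load() for the same class from another thread yields a separate instance.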

mex.common.exceptions module

exception mex.common.exceptions.EmptySearchResultError

Bases: MExError

Empty search result.

exception mex.common.exceptions.FoundMoreThanOneError

Bases: MExError

Found more than one.

exception mex.common.exceptions.MExError

Bases: Exception

Base class for generic exceptions.

exception mex.common.exceptions.MergingError

Bases: MExError

Creating a merged item from extracted items and rules failed.

exception mex.common.exceptions.TimedReadTimeout(*args: Any, **kwargs: Any)

Bases: TimedRequestException

The server did not send any data in the allotted amount of time.

exception mex.common.exceptions.TimedRequestException(*args: Any, **kwargs: Any)

Bases: RequestException

Timed request exception with a seconds attribute.

__init__(*args: Any, **kwargs: Any) None

Initialize exception with timeout seconds.

classmethod create(exc: RequestException, t0: float) Self

Create a new timed error from an upstream exception and start time.

seconds: float
exception mex.common.exceptions.TimedServerError(*args: Any, **kwargs: Any)

Bases: TimedRequestException

The server encountered an error or is incapable of responding.

exception mex.common.exceptions.TimedTooManyRequests(*args: Any, **kwargs: Any)

Bases: TimedRequestException

Client sent too many requests in a given time.

mex.common.extract module

mex.common.extract.get_dtypes_for_model(model: type[BaseModel]) dict[str, Dtype]

Get the basic dtypes per field for a model from the PANDAS_DTYPE_MAP.

Parameters:

model – Model class for which to get pandas data types per field alias

Returns:

Mapping from field alias to dtype strings

mex.common.extract.parse_csv(path_or_buffer: str | PathLike[str] | ReadCsvBuffer[Any], into: type[_BaseModelT], chunksize: int = 10000, summary_batch_size: int = 10000, **kwargs: Any) Generator[_BaseModelT, None, None]

Parse a CSV file into an iterable of the given model type.

Parameters:
  • path_or_buffer – Location of CSV file or read buffer with CSV content

  • into – Type of model to parse

  • chunksize – Buffer size for chunked reading

  • summary_batch_size – Batch size for summary logs

  • kwargs – Additional keyword arguments for pandas

Returns:

Generator for models

mex.common.fields module

mex.common.logging module

mex.common.logging.watch(log_interval: int = 10000) Callable[[Callable[[P], Generator[_YieldT, None, None]]], Callable[[P], Generator[_YieldT, None, None]]]

Watch the output of a generator function and log the yielded items.

Parameters:
  • func – Generator function to decorate (passed to the returned decorator); it may yield strings, models or exceptions, which are logged using their __str__() method.

  • log_interval – Integer determining the interval between log messages

Returns:

Decorated function that logs all yielded items

mex.common.settings module

class mex.common.settings.BaseSettings(_env_file: Path | str | Sequence[Path | str] | None = PosixPath('.'), _env_file_encoding: str | None = None, _env_nested_delimiter: str | None = None, _secrets_dir: str | Path | None = None, *, MEX_DEBUG: bool = False, MEX_SINK: list[Sink] = [Sink.NDJSON], MEX_ASSETS_DIR: Path = PosixPath('/home/runner/work/mex-common/mex-common/assets'), MEX_OPS_DIR: Path = PosixPath('/home/runner/work/mex-common/mex-common/ops'), MEX_WORK_DIR: Path = PosixPath('/home/runner/work/mex-common/mex-common'), MEX_IDENTITY_PROVIDER: IdentityProvider = IdentityProvider.MEMORY, MEX_BACKEND_API_URL: HttpUrl = HttpUrl('http://localhost:8080/'), MEX_BACKEND_API_KEY: SecretStr = SecretStr('**********'), MEX_BACKEND_API_PARALLELIZATION: int = 1, MEX_BACKEND_API_CHUNK_SIZE: int = 25, MEX_VERIFY_SESSION: bool | OpsPath = True, MEX_ORGANIGRAM_PATH: AssetsPath = AssetsPath('raw-data/organigram/organizational_units.json'), MEX_PRIMARY_SOURCES_PATH: AssetsPath = AssetsPath('raw-data/primary-sources/primary-sources.json'), MEX_LDAP_URL: SecretStr = SecretStr('**********'), MEX_LDAP_SEARCH_BASE: str = 'dc=ldapmock,dc=local', MEX_WIKI_API_URL: HttpUrl = HttpUrl('http://wikidata/'), MEX_WEB_USER_AGENT: str = 'rki/mex', MEX_ORCID_API_URL: HttpUrl = HttpUrl('https://orcid/'))

Bases: BaseSettings

Common settings definition class.

Settings are accessed through a singleton instance of a pydantic settings class. The singleton instance can be loaded lazily by calling BaseSettings.get().

The base settings should only contain options that are used by common code. To add more configuration options for a specific subsystem, create a new subclass and define the required fields there. To load a singleton for that subclass, simply call SubsystemSettings.get().

All configuration options should have a descriptive name and a clear description. The defaults should be set to values that work with unit tests and must not contain any secrets or live URLs that would break unit test isolation.

__init__(_env_file: Path | str | Sequence[Path | str] | None = PosixPath('.'), _env_file_encoding: str | None = None, _env_nested_delimiter: str | None = None, _secrets_dir: str | Path | None = None, **values: Any) None

Construct a new settings instance.

assets_dir: Path
backend_api_chunk_size: int
backend_api_key: SecretStr
backend_api_parallelization: int
backend_api_url: HttpUrl
debug: bool
classmethod get() Self

Get the current settings instance from singleton store.

Returns:

An instance of BaseSettings or a subclass thereof

classmethod get_env_name(name: str) str

Get the name of the environment variable for field with given name.

Resolves the actual environment variable name that would be used for a given field, taking into account case sensitivity and environment prefix configuration.

Parameters:

name – The field name to get the environment variable name for.

Returns:

The uppercase environment variable name that maps to the field.
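With the env_prefix 'mex_' and case_sensitive False listed in model_config, the resolution plausibly reduces to prefixing and upper-casing the field name. The standalone function below is a simplified sketch under that assumption; its env_prefix and case_sensitive parameters are hypothetical stand-ins for values the real classmethod reads from the class configuration.

```python
def get_env_name(name: str, env_prefix: str = "mex_", case_sensitive: bool = False) -> str:
    """Sketch: derive the environment variable name for a settings field."""
    env_name = f"{env_prefix}{name}"
    # With case-insensitive settings, the conventional spelling is upper case.
    return env_name if case_sensitive else env_name.upper()
```

For example, get_env_name("backend_api_url") gives "MEX_BACKEND_API_URL", which matches the keyword shown in the BaseSettings signature.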

identity_provider: IdentityProvider
ldap_search_base: str
ldap_url: SecretStr
log_settings() Self

Validator that logs the settings in text form.

mex_web_user_agent: str
model_config = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': '.env', 'env_file_encoding': 'utf-8', 'env_ignore_empty': False, 'env_nested_delimiter': '__', 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'mex_', 'env_prefix_target': 'variable', 'extra': 'ignore', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'populate_by_name': True, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

no_settings_as_attributes() Self

Validate that no attribute inherits from pydantic.BaseSettings.

ops_dir: Path
orcid_api_url: HttpUrl
organigram_path: AssetsPath
primary_sources_path: AssetsPath
resolve_paths() Self

Resolve AssetsPath, OpsPath, and WorkPath.

sink: list[Sink]
text() str

Dump the current settings into a readable table.

Returns:

Formatted table with all settings.

verify_session: bool | OpsPath
wiki_api_url: HttpUrl
work_dir: Path

mex.common.sorters module

mex.common.sorters.topological_sort(items: list[ItemT], primary_key: str, *, parent_key: str | None = None, child_key: str | None = None) None

Sort the given list of items in-place according to their topology.

Items can refer to each other using key fields. A parent item can reference a child item by storing the child’s primary_key in the parent’s child_key field. Similarly, a child can reference its parent using the parent_key field.

This can be useful for submitting items to the backend in the correct order.
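The standard library's graphlib.TopologicalSorter fits this description. The sketch below assumes dict items, assumes that parents must be placed before their children regardless of which side stores the reference, and assumes that key fields hold a single ID, a list of IDs or None. These are illustrative assumptions, not confirmed details of mex.common.sorters.

```python
from __future__ import annotations

from graphlib import TopologicalSorter
from typing import Any


def _as_list(value: Any) -> list[Any]:
    # References may be stored as a single ID, a list of IDs, or None.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def topological_sort(
    items: list[dict[str, Any]],
    primary_key: str,
    *,
    parent_key: str | None = None,
    child_key: str | None = None,
) -> None:
    """Sketch: reorder items in place so that parents come before their children."""
    items_by_id = {item[primary_key]: item for item in items}
    sorter = TopologicalSorter()
    for item in items:
        item_id = item[primary_key]
        sorter.add(item_id)  # register items without any references, too
        if parent_key is not None:
            for parent_id in _as_list(item.get(parent_key)):
                sorter.add(item_id, parent_id)  # parent must precede this item
        if child_key is not None:
            for child_id in _as_list(item.get(child_key)):
                sorter.add(child_id, item_id)  # this item must precede its child
    # static_order() raises graphlib.CycleError if the references form a cycle.
    ordered_ids = sorter.static_order()
    # IDs that are referenced but not part of the input list are skipped.
    items[:] = [items_by_id[i] for i in ordered_ids if i in items_by_id]
```

Assigning to items[:] keeps the caller's list object and only changes its contents, which matches the in-place contract and the None return type.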

mex.common.transform module

class mex.common.transform.MExEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

Custom JSON encoder that can handle pydantic models, enums and UUIDs.

default(obj: object) object

Implement custom serialization rules.
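A JSONEncoder subclass only needs to override default(), which json calls for objects it cannot serialize natively. A minimal sketch, assuming pydantic v2 models (detected here by their model_dump() method), enums serialized by value and UUIDs serialized as strings; the class name EncoderSketch is hypothetical and the exact rules of MExEncoder may differ.

```python
import json
from enum import Enum
from uuid import UUID


class EncoderSketch(json.JSONEncoder):
    """Sketch: a JSONEncoder that also serializes models, enums and UUIDs."""

    def default(self, obj: object) -> object:
        # Duck typing stands in for an isinstance check against pydantic.BaseModel.
        model_dump = getattr(obj, "model_dump", None)
        if callable(model_dump):
            return model_dump()
        if isinstance(obj, Enum):
            return obj.value
        if isinstance(obj, UUID):
            return str(obj)
        # Let the base class raise TypeError for anything else.
        return super().default(obj)


class Color(Enum):
    RED = "red"
```

The value returned by default() is encoded again, so an enum nested inside a model dump is still converted, for example json.dumps(obj, cls=EncoderSketch).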

mex.common.transform.camel_to_split(string: str) str

Convert the given string from CamelCase into Split Case.

mex.common.transform.camelcase_to_title(value: str) str

Convert the given string from CamelCase into Title case.

mex.common.transform.clean_dict(obj: Any, unwanted: Sequence[Any] = (None, [])) Any

Remove entries whose value is in unwanted (None and [] by default) from dicts.

mex.common.transform.dromedary_to_kebab(string: str) str

Convert the given string from dromedaryCase into kebab-case.

mex.common.transform.dromedary_to_snake(string: str) str

Convert the given string from dromedaryCase into snake_case.

mex.common.transform.ensure_postfix(string_like: object, postfix: object) str

Return a string with the given postfix appended if it is not present yet.

If string_like already ends with the postfix, return a stringified copy. This function is the inverse of str.removesuffix.

Parameters:
  • string_like – Object to convert to string and potentially postfix.

  • postfix – Object to convert to string and use as postfix.

Returns:

String with the postfix guaranteed to be present at the end.

mex.common.transform.ensure_prefix(string_like: object, prefix: object) str

Return a string with the given prefix prepended if it is not present yet.

If string_like already starts with the prefix, return a stringified copy. This function is the inverse of str.removeprefix.

Parameters:
  • string_like – Object to convert to string and potentially prefix.

  • prefix – Object to convert to string and use as prefix.

Returns:

String with the prefix guaranteed to be present at the beginning.
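Both helpers follow directly from their descriptions: stringify the inputs, then add the affix only when it is missing. A minimal sketch of that behavior, not the library source:

```python
def ensure_prefix(string_like: object, prefix: object) -> str:
    """Sketch: prepend prefix unless the string already starts with it."""
    string, prefix_string = str(string_like), str(prefix)
    if string.startswith(prefix_string):
        return string
    return prefix_string + string


def ensure_postfix(string_like: object, postfix: object) -> str:
    """Sketch: append postfix unless the string already ends with it."""
    string, postfix_string = str(string_like), str(postfix)
    if string.endswith(postfix_string):
        return string
    return string + postfix_string
```

Calling either function twice gives the same result as calling it once, which makes them safe to apply to values that may or may not be normalized yet.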

mex.common.transform.kebab_to_camel(string: str) str

Convert the given string from kebab-case into CamelCase.

mex.common.transform.normalize(string: str) str

Normalize the given string to lowercase, numerals and single spaces.

mex.common.transform.snake_to_dromedary(string: str) str

Convert the given string from snake_case into dromedaryCase.

mex.common.transform.split_to_camel(string: str) str

Convert the given string from Split Case into CamelCase.

mex.common.transform.split_to_caps(string: str) str

Convert the given string from Split Case into CAPS_CASE.
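The case conversions above can be approximated with a regular expression and a few str methods. The sketch below reflects assumed behavior for simple inputs such as "contactPoint" or "Extracted Person"; acronyms, digits and other edge cases may be handled differently by mex.common.transform.

```python
import re

# Matches an uppercase letter that directly follows a lowercase letter or digit.
_HUMP = re.compile(r"(?<=[a-z0-9])([A-Z])")


def dromedary_to_snake(string: str) -> str:
    return _HUMP.sub(r"_\1", string).lower()


def dromedary_to_kebab(string: str) -> str:
    return _HUMP.sub(r"-\1", string).lower()


def snake_to_dromedary(string: str) -> str:
    first, *rest = string.split("_")
    return first + "".join(word.capitalize() for word in rest)


def kebab_to_camel(string: str) -> str:
    return "".join(word.capitalize() for word in string.split("-"))


def camel_to_split(string: str) -> str:
    return _HUMP.sub(r" \1", string)


def split_to_camel(string: str) -> str:
    # Upper-case only the first letter so existing capitals inside words survive.
    return "".join(word[:1].upper() + word[1:] for word in string.split())


def split_to_caps(string: str) -> str:
    return "_".join(word.upper() for word in string.split())
```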

mex.common.transform.to_key_and_values(dct: dict[str, Any]) Iterable[tuple[str, list[Any]]]

Return an iterable of dictionary items where the values are always lists.

Normalizes dictionary values by converting single values to single-item lists, leaving existing lists unchanged, and converting None to empty lists.

Parameters:

dct – Dictionary to normalize the values of.

Yields:

Tuples of (key, list_value) where list_value is guaranteed to be a list.
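The three normalization rules translate into a short generator. This is a sketch of the documented behavior, not the library source:

```python
from collections.abc import Iterable
from typing import Any


def to_key_and_values(dct: dict[str, Any]) -> Iterable[tuple[str, list[Any]]]:
    """Sketch: yield (key, value) pairs with every value wrapped in a list."""
    for key, value in dct.items():
        if value is None:
            yield key, []  # None becomes an empty list
        elif isinstance(value, list):
            yield key, value  # lists are passed through unchanged
        else:
            yield key, [value]  # single values become one-item lists
```

For example, {"a": 1, "b": [2, 3], "c": None} yields ("a", [1]), ("b", [2, 3]) and ("c", []).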

mex.common.utils module

class mex.common.utils.GenericFieldInfo(alias: str | None, annotation: type[Any] | None, frozen: bool)

Bases: object

Abstraction class for unifying FieldInfo and ComputedFieldInfo objects.

alias: str | None
annotation: type[Any] | None
frozen: bool
mex.common.utils.any_contains_any(bases: Iterable[Container[T] | None], tokens: Iterable[T]) bool

Check if any of the given bases contains any of the given tokens.

mex.common.utils.contains_any(base: Container[T], tokens: Iterable[T]) bool

Check if a given base contains any of the given tokens.
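Both checks reduce to membership tests with the in operator. In the sketch below, None bases are assumed to contain nothing, and tokens are materialized once so that a one-shot iterator can be checked against every base; both choices are assumptions, not confirmed library behavior.

```python
from __future__ import annotations

from collections.abc import Container, Iterable
from typing import TypeVar

T = TypeVar("T")


def contains_any(base: Container[T], tokens: Iterable[T]) -> bool:
    """Sketch: True if at least one token is in base."""
    return any(token in base for token in tokens)


def any_contains_any(bases: Iterable[Container[T] | None], tokens: Iterable[T]) -> bool:
    """Sketch: True if any non-None base contains at least one token."""
    token_list = list(tokens)  # reuse the tokens for every base
    return any(contains_any(base, token_list) for base in bases if base is not None)
```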

mex.common.utils.contains_any_types(field: GenericFieldInfo, *types: type) bool

Return whether a field is annotated as any of the given types.

Unions, lists and type annotations are checked for their inner types and only the non-NoneType types are considered for the type-check.

Parameters:
  • field – A GenericFieldInfo instance

  • types – Types to look for in the field’s annotation

Returns:

Whether the field contains any of the given types

mex.common.utils.contains_only_types(field: GenericFieldInfo, *types: type) bool

Return whether a field is annotated only with the given types.

Unions, lists and type annotations are checked for their inner types and only the non-NoneType types are considered for the type-check.

Parameters:
  • field – A GenericFieldInfo instance

  • types – Types to look for in the field’s annotation

Returns:

Whether the field contains only the given types

mex.common.utils.deprecated(old_name: str, new_func: Callable[[P], R]) Callable[[P], R]

Create a deprecated wrapper for a function.

mex.common.utils.ensure_list(values: list[T] | T | None) list[T]

Put objects in lists, replace None with an empty list and return lists as is.

mex.common.utils.get_alias_lookup(model: type[BaseModel]) dict[str, str]

Build a cached mapping from field alias to field names.

Creates a dictionary that maps field aliases (or field names if no alias exists) back to the actual field names. This is useful for resolving field references when working with serialized data that may use aliases.

Parameters:

model – The Pydantic model class to build the alias lookup for.

Returns:

Dictionary mapping field aliases (or names) to actual field names.

mex.common.utils.get_all_fields(model: type[BaseModel]) dict[str, GenericFieldInfo]

Return a combined dict of defined and computed fields of a given model.

This function combines both regular model fields and computed fields into a single dictionary using the GenericFieldInfo abstraction. Results are cached for performance.

Parameters:

model – The Pydantic model class to extract fields from.

Returns:

Dictionary mapping field names to GenericFieldInfo objects for all fields (both regular and computed) in the model.

mex.common.utils.get_field_names_allowing_none(model: type[BaseModel]) list[str]

Build a cached list of fields that can be set to None.

Tests each field’s annotation by attempting to validate None against it. Fields that accept None without raising a ValidationError are considered nullable fields.

Parameters:

model – The Pydantic model class to analyze.

Returns:

List of field names that accept None as a valid value.

mex.common.utils.get_inner_types(annotation: Any, include_none: bool = True, unpack_list: bool = True, unpack_literal: bool = True) Generator[type, None, None]

Recursively yield all inner types from a given type annotation.

Parameters:
  • annotation – The type annotation to process

  • include_none – Whether to include NoneTypes in output

  • unpack_list – Whether to unpack list types

  • unpack_literal – Whether to unpack Literal types

Returns:

All inner types found within the annotation

mex.common.utils.get_list_field_names(model: type[BaseModel]) list[str]

Build a cached list of fields that look like lists.

Analyzes the model’s field annotations to identify fields that are list types. This includes direct list annotations and list types within unions.

Parameters:

model – The Pydantic model class to analyze.

Returns:

List of field names that have list-like type annotations.

mex.common.utils.group_fields_by_class_name(model_classes_by_name: Mapping[str, type[BaseModel]], predicate: Callable[[GenericFieldInfo], bool]) dict[str, list[str]]

Group the field names by model class and filter them by the given predicate.

For each model class, extracts all fields and applies the predicate function to filter them. Returns a mapping from class names to lists of field names that satisfy the predicate condition.

Parameters:
  • model_classes_by_name – Map from class names to model classes.

  • predicate – Function to filter the fields of the classes by.

Returns:

Dictionary mapping class names to a list of field names filtered by predicate.

mex.common.utils.grouper(chunk_size: int, iterable: Iterable[T]) Iterator[Iterable[T | None]]

Collect data into fixed-length chunks or blocks.

Groups items from an iterable into fixed-size chunks. The last chunk may be padded with None values if the total number of items is not evenly divisible by the chunk size.

Parameters:
  • chunk_size – The size of each chunk.

  • iterable – The iterable to group into chunks.

Returns:

Iterator of iterables, each containing chunk_size items (with None padding for the final chunk if necessary).
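This matches the classic grouper recipe from the itertools documentation, built on zip_longest. A sketch under the assumption that the library follows that recipe:

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import zip_longest
from typing import TypeVar

T = TypeVar("T")


def grouper(chunk_size: int, iterable: Iterable[T]) -> Iterator[tuple[T | None, ...]]:
    """Sketch: split iterable into tuples of chunk_size items, padding with None."""
    # Repeating the *same* iterator makes zip_longest take consecutive items for each tuple.
    iterators = [iter(iterable)] * chunk_size
    return zip_longest(*iterators, fillvalue=None)
```

Because padding uses None, callers that need to distinguish genuine None items from padding have to filter with care.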

mex.common.utils.jitter_sleep(min_seconds: float, jitter_seconds: float) None

Sleep a random amount of seconds within the given parameters.

Parameters:
  • min_seconds – The minimum time to sleep

  • jitter_seconds – The variable sleep time added to the minimum
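Adding random jitter spreads out retries so that many clients do not hit a server in lockstep. A minimal sketch, assuming the extra delay is drawn uniformly with the standard library's random.random():

```python
import random
import time


def jitter_sleep(min_seconds: float, jitter_seconds: float) -> None:
    """Sketch: sleep between min_seconds and min_seconds + jitter_seconds."""
    # random.random() returns x in [0, 1), so the extra delay is in [0, jitter_seconds).
    time.sleep(min_seconds + random.random() * jitter_seconds)
```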

mex.common.utils.random()

Return the next random floating-point number x in the interval [0, 1).

Module contents