mex.common.primary_source package

Submodules

mex.common.primary_source.extract module

mex.common.primary_source.extract.extract_seed_primary_sources() → list[SeedPrimarySource]

Extract seed primary sources from the raw-data JSON file.

Settings:

primary_sources_path: Resolved path to the primary sources file

Returns:

List of seed primary sources

mex.common.primary_source.helpers module

mex.common.primary_source.helpers.get_extracted_primary_source_by_name(name: str) → ExtractedPrimarySource | None

Pick the seed primary source with the given name, transform and return it.

Parameters:

name – Name (identifierInPrimarySource) of the primary source

Returns:

Extracted primary source if it was found, else None

mex.common.primary_source.models module

class mex.common.primary_source.models.SeedPrimarySource(*, identifier: str, title: list[Text] = [])

Bases: BaseModel

Model class for primary sources coming from the raw-data JSON file.

identifier: str
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

title: list[Text]
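
To illustrate what a few of these settings mean for incoming records, here is a minimal standard-library sketch. It is not the pydantic model itself: SeedSketch and parse_seed are hypothetical stand-ins, and the sample record is invented. The sketch mimics only 'extra': 'ignore', 'str_strip_whitespace' and 'str_min_length', and it assumes that whitespace is stripped before the length check.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SeedSketch:
    """Hypothetical stand-in for SeedPrimarySource (not the real pydantic model)."""

    identifier: str
    title: list[Any] = field(default_factory=list)


def parse_seed(raw: dict[str, Any]) -> SeedSketch:
    """Mimic three of the config settings listed above."""
    # 'str_strip_whitespace': surrounding whitespace is removed first (in this sketch)
    identifier = str(raw["identifier"]).strip()
    # 'str_min_length': 1 -> an empty string is rejected
    if len(identifier) < 1:
        raise ValueError("identifier must not be empty")
    # 'extra': 'ignore' -> keys other than identifier/title are silently dropped
    return SeedSketch(identifier=identifier, title=list(raw.get("title", [])))


# Invented record with padding and an unknown key
seed = parse_seed({"identifier": "  ldap ", "unknown_key": 1})
print(seed)
```

The real model additionally validates defaults and assignments, which this sketch does not attempt to reproduce.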

mex.common.primary_source.transform module

mex.common.primary_source.transform.transform_seed_primary_source_to_extracted_primary_source(primary_source: SeedPrimarySource) → ExtractedPrimarySource

Transform a seed primary source into an ExtractedPrimarySource.

Parameters:

primary_source – Primary source coming from raw-data file

Returns:

ExtractedPrimarySource

mex.common.primary_source.transform.transform_seed_primary_sources_to_extracted_primary_sources(seed_primary_sources: Iterable[SeedPrimarySource]) → list[ExtractedPrimarySource]

Transform seed primary sources into ExtractedPrimarySources.

Parameters:

seed_primary_sources – Iterable of primary sources coming from raw-data file

Returns:

List of ExtractedPrimarySource

Module contents

Helper extractor to get metadata primary sources.

A primary source represents the original source of metadata to which all data in MEx is attached. For example, the confluence-vvt primary source means that data extracted from the confluence-vvt x-system is attached to this primary source.

Common use cases

  • Extract information about a particular primary source so that extracted metadata can be attached to it

Configuration

To configure the primary_source extractor, set primary_sources_path in the settings to point to primary-sources.json in the mex-assets repository. A sample primary sources file is also included in mex-extractors at assets/raw-data/primary-sources/primary-sources.json for testing purposes.

Extracting primary sources

Use the extract_seed_primary_sources function from primary_source.extract to extract all primary sources. It returns a list of all the primary sources available in the primary-sources.json source file.
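
The extraction step can be pictured with a minimal standard-library sketch. load_seed_records is a hypothetical stand-in, not the library function; the sample data is invented, and the sketch assumes the file holds a JSON array of objects with an identifier key. The temporary path plays the role of primary_sources_path.

```python
import json
import tempfile
from pathlib import Path


def load_seed_records(path: Path) -> list[dict]:
    """Hypothetical stand-in: read every record from a JSON file into a list."""
    with path.open(encoding="utf-8") as handle:
        records = json.load(handle)
    # Build the complete list up front rather than yielding lazily
    return list(records)


# Invented sample data; the real file layout may differ
sample = [{"identifier": "mex"}, {"identifier": "ldap"}]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "primary-sources.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    loaded = load_seed_records(sample_path)

print([record["identifier"] for record in loaded])
```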

Transforming primary sources

Use transform_seed_primary_sources_to_extracted_primary_sources from primary_source.transform to get ExtractedPrimarySource objects. It returns a list covering all the primary sources, which is often more than is required.

To get only the primary source of a required x-system, use get_extracted_primary_source_by_name from primary_source.helpers. The function takes the name (identifierInPrimarySource) of the x-system, picks the matching seed primary source, transforms it, and returns the resulting ExtractedPrimarySource, or None if no primary source with that name is found. For example, passing mex, ldap, or confluence-vvt returns the ExtractedPrimarySource of that x-system.