mex.common.primary_source package
Submodules
mex.common.primary_source.extract module
- mex.common.primary_source.extract.extract_seed_primary_sources() → list[SeedPrimarySource]
Extract seed primary sources from the raw-data JSON file.
- Settings:
primary_sources_path: Resolved path to the primary sources file
- Returns:
List of seed primary sources
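The extraction step can be sketched with the standard library as follows. This is a minimal illustration, not the mex-common implementation: SeedPrimarySource and Text are hypothetical dataclass stand-ins for the real pydantic models, the path is passed as an argument instead of being read from the primary_sources_path setting, and the JSON layout (an array of objects with an identifier and an optional title list of value/language objects) is an assumption. The sample entries are illustrative, not the contents of the real file.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Text:
    """Hypothetical stand-in for the Text type used in titles."""

    value: str
    language: Optional[str] = None


@dataclass
class SeedPrimarySource:
    """Hypothetical stand-in for the pydantic SeedPrimarySource model."""

    identifier: str
    title: list[Text] = field(default_factory=list)


def extract_seed_primary_sources(primary_sources_path: Path) -> list[SeedPrimarySource]:
    """Read the raw-data JSON file and build one seed per entry (sketch).

    Assumption: the file holds a JSON array of objects, each with an
    "identifier" key and an optional "title" list of
    {"value": ..., "language": ...} objects.
    """
    raw = json.loads(primary_sources_path.read_text(encoding="utf-8"))
    return [
        SeedPrimarySource(
            identifier=entry["identifier"],
            title=[
                Text(value=t["value"], language=t.get("language"))
                for t in entry.get("title", [])
            ],
        )
        for entry in raw
    ]


# Usage with illustrative sample data written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "primary-sources.json"
    sample = [
        {"identifier": "mex", "title": [{"value": "MEx", "language": "de"}]},
        {"identifier": "ldap"},
    ]
    path.write_text(json.dumps(sample), encoding="utf-8")
    seeds = extract_seed_primary_sources(path)

print([seed.identifier for seed in seeds])  # ['mex', 'ldap']
```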
mex.common.primary_source.helpers module
- mex.common.primary_source.helpers.get_extracted_primary_source_by_name(name: str) → ExtractedPrimarySource | None
Pick the seed primary source with the given name, transform and return it.
- Parameters:
name – Name (identifierInPrimarySource) of the primary source
- Returns:
Extracted primary source if it was found, else None
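A minimal sketch of what the helper does, under stated assumptions: SeedPrimarySource and ExtractedPrimarySource are hypothetical dataclass stand-ins, the seeds are passed in as a second argument (the documented helper takes only name and obtains them itself), and the seed's identifier is assumed to become identifierInPrimarySource, matching the parameter description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SeedPrimarySource:
    """Hypothetical stand-in for the seed model read from the raw-data file."""

    identifier: str
    title: list[str] = field(default_factory=list)


@dataclass
class ExtractedPrimarySource:
    """Hypothetical stand-in; the real model has more fields."""

    identifierInPrimarySource: str
    title: list[str] = field(default_factory=list)


def get_extracted_primary_source_by_name(
    name: str, seeds: list[SeedPrimarySource]
) -> Optional[ExtractedPrimarySource]:
    """Pick the seed whose identifier equals `name`, transform and return it."""
    # Assumption: seeds are passed explicitly in this sketch.
    for seed in seeds:
        if seed.identifier == name:
            # Assumption: identifier maps onto identifierInPrimarySource.
            return ExtractedPrimarySource(
                identifierInPrimarySource=seed.identifier, title=list(seed.title)
            )
    # No seed with that name: mirror the documented "else None".
    return None


seeds = [SeedPrimarySource("mex"), SeedPrimarySource("ldap", ["LDAP"])]
print(get_extracted_primary_source_by_name("ldap", seeds))
# ExtractedPrimarySource(identifierInPrimarySource='ldap', title=['LDAP'])
print(get_extracted_primary_source_by_name("unknown", seeds))  # None
```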
mex.common.primary_source.models module
- class mex.common.primary_source.models.SeedPrimarySource(*, identifier: str, title: list[Text] = [])
Bases: BaseModel
Model class for primary sources coming from the raw-data JSON file.
- identifier: str
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model, which should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
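The model_config above makes pydantic ignore unknown keys (extra='ignore'), strip surrounding whitespace from strings (str_strip_whitespace) and reject empty strings (str_min_length=1). The following stdlib-only approximation mimics just those three settings so their effect on a raw JSON entry is visible. It is not how pydantic validates; from_raw is a hypothetical helper name, and title is simplified to a list of plain strings instead of Text objects.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class SeedPrimarySource:
    """Stdlib approximation of the pydantic SeedPrimarySource model."""

    identifier: str
    title: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # str_strip_whitespace=True: trim surrounding whitespace first.
        self.identifier = self.identifier.strip()
        self.title = [t.strip() for t in self.title]
        # str_min_length=1: reject strings that are empty after stripping.
        for value in [self.identifier, *self.title]:
            if len(value) < 1:
                raise ValueError("strings must contain at least 1 character")

    @classmethod
    def from_raw(cls, raw: dict[str, Any]) -> "SeedPrimarySource":
        # extra="ignore": silently drop keys that are not declared fields.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


seed = SeedPrimarySource.from_raw({"identifier": "  mex ", "comment": "unused"})
print(seed)  # SeedPrimarySource(identifier='mex', title=[])
```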
mex.common.primary_source.transform module
- mex.common.primary_source.transform.transform_seed_primary_source_to_extracted_primary_source(primary_source: SeedPrimarySource) → ExtractedPrimarySource
Transform a seed primary source into an ExtractedPrimarySource.
- Parameters:
primary_source – Primary source coming from raw-data file
- Returns:
ExtractedPrimarySource
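The single-item transform can be sketched as a field mapping. The stand-in classes are hypothetical and keep only two fields; that identifier becomes identifierInPrimarySource and that title is copied over are assumptions, and the real ExtractedPrimarySource carries further fields that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class SeedPrimarySource:
    """Hypothetical stand-in for the seed model."""

    identifier: str
    title: list[str] = field(default_factory=list)


@dataclass
class ExtractedPrimarySource:
    """Hypothetical stand-in; the real model carries more fields."""

    identifierInPrimarySource: str
    title: list[str] = field(default_factory=list)


def transform_seed_primary_source_to_extracted_primary_source(
    primary_source: SeedPrimarySource,
) -> ExtractedPrimarySource:
    """Map one seed onto an extracted primary source (sketch)."""
    return ExtractedPrimarySource(
        # Assumed mapping: the seed identifier is the identifier in the
        # primary source.
        identifierInPrimarySource=primary_source.identifier,
        # Copy the list so the result does not share state with the seed.
        title=list(primary_source.title),
    )


extracted = transform_seed_primary_source_to_extracted_primary_source(
    SeedPrimarySource(identifier="confluence-vvt", title=["Confluence VVT"])
)
print(extracted.identifierInPrimarySource)  # confluence-vvt
```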
- mex.common.primary_source.transform.transform_seed_primary_sources_to_extracted_primary_sources(seed_primary_sources: Iterable[SeedPrimarySource]) → list[ExtractedPrimarySource]
Transform seed primary sources into ExtractedPrimarySources.
- Parameters:
seed_primary_sources – Iterable of primary sources coming from raw-data file
- Returns:
List of ExtractedPrimarySource
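The plural transform amounts to applying the single-item transform to every element of the iterable. In this sketch the stand-in classes and the one-field mapping are hypothetical simplifications.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class SeedPrimarySource:
    """Hypothetical stand-in for the seed model."""

    identifier: str


@dataclass
class ExtractedPrimarySource:
    """Hypothetical stand-in; the real model carries more fields."""

    identifierInPrimarySource: str


def transform_seed_primary_source_to_extracted_primary_source(
    primary_source: SeedPrimarySource,
) -> ExtractedPrimarySource:
    """Simplified single-item transform."""
    return ExtractedPrimarySource(identifierInPrimarySource=primary_source.identifier)


def transform_seed_primary_sources_to_extracted_primary_sources(
    seed_primary_sources: Iterable[SeedPrimarySource],
) -> list[ExtractedPrimarySource]:
    """Apply the single-item transform to every seed, keeping input order."""
    return [
        transform_seed_primary_source_to_extracted_primary_source(seed)
        for seed in seed_primary_sources
    ]


# Any iterable works, including a one-shot generator.
seeds = (SeedPrimarySource(name) for name in ["mex", "ldap"])
extracted = transform_seed_primary_sources_to_extracted_primary_sources(seeds)
print([source.identifierInPrimarySource for source in extracted])  # ['mex', 'ldap']
```

Returning a list rather than a generator means the result can be iterated several times, even when the input was a one-shot generator.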
Module contents
Helper extractor to get metadata primary sources.
A primary source represents the original source of metadata to which all data in MEx is attached. For example, the confluence-vvt primary source means that data extracted from the confluence-vvt x-system will be attached to this primary source.
Common use cases
- Extract information about a particular primary source, so that extracted metadata can be attached to it.
Configuration
To configure the primary_source extractor, set primary_sources_path in the settings to point to primary-sources.json in the mex-assets repository. A sample primary sources file is also included in mex-extractors at assets/raw-data/primary-sources/primary-sources.json for testing purposes.
Extracting primary sources
Use the extract_seed_primary_sources function in primary_source.extract to extract all primary sources. It returns a list of all primary sources available in the primary-sources.json source file as SeedPrimarySource objects.
Transforming primary sources
Use transform_seed_primary_sources_to_extracted_primary_sources in primary_source.transform to turn the seed primary sources into ExtractedPrimarySource objects. This function transforms all primary sources, although usually only a few are required.
To get only the primary source of a required x-system, use get_extracted_primary_source_by_name in primary_source.helpers. It takes the name (identifierInPrimarySource) of a single primary source, picks the matching seed primary source, transforms it and returns it as an ExtractedPrimarySource, or None if no primary source with that name was found. For example, passing mex, ldap or confluence-vvt returns the ExtractedPrimarySource of that x-system; call it once per name to collect several.
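The get_extracted_primary_source_by_name helper documented in primary_source.helpers takes one name at a time, so selecting several x-systems such as mex, ldap and confluence-vvt can be sketched as one call per name. The ExtractedPrimarySource stand-in, the AVAILABLE list and the name missing-system are hypothetical; reporting unknown names explicitly is a design choice of this sketch, not documented behavior of the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedPrimarySource:
    """Hypothetical stand-in for the real ExtractedPrimarySource model."""

    identifierInPrimarySource: str


# Illustrative stand-in for the transformed contents of primary-sources.json.
AVAILABLE = [
    ExtractedPrimarySource(name)
    for name in ["mex", "ldap", "confluence-vvt", "other-system"]
]


def get_extracted_primary_source_by_name(name: str) -> Optional[ExtractedPrimarySource]:
    """Sketch of the helper: return the source with this name, else None."""
    return next(
        (source for source in AVAILABLE if source.identifierInPrimarySource == name),
        None,
    )


# Select the x-systems an extractor needs; an unknown name is collected in
# `missing` instead of being silently dropped.
wanted = ["mex", "ldap", "confluence-vvt", "missing-system"]
found: dict[str, ExtractedPrimarySource] = {}
missing: list[str] = []
for name in wanted:
    source = get_extracted_primary_source_by_name(name)
    if source is None:
        missing.append(name)
    else:
        found[name] = source

print(sorted(found))  # ['confluence-vvt', 'ldap', 'mex']
print(missing)  # ['missing-system']
```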