mex.extractors.seq_repo package

Submodules

mex.extractors.seq_repo.extract module

mex.extractors.seq_repo.extract.extract_source_project_coordinator(seq_repo_sources: dict[str, SeqRepoSource]) → Generator[LDAPPersonWithQuery, None, None]

Extract LDAP persons, together with their query strings, for source project coordinators.

Parameters:

seq_repo_sources – Seq Repo sources

Returns:

Generator for LDAP persons with query
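
The generator contract described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the extractor's implementation: `iter_coordinators`, the injected `lookup` callable, and the plain-string person values are hypothetical stand-ins for the LDAP connector and for `LDAPPersonWithQuery`, and skipping names that do not resolve is an assumed behaviour.

```python
from collections.abc import Callable, Iterator
from typing import Optional


def iter_coordinators(
    coordinators_by_source: dict[str, list[str]],
    lookup: Callable[[str], Optional[str]],
) -> Iterator[tuple[str, str]]:
    """Yield (person, query) once per distinct coordinator name that resolves."""
    seen: set[str] = set()
    for names in coordinators_by_source.values():
        for name in names:
            if name in seen:
                continue  # each query string is looked up only once
            seen.add(name)
            person = lookup(name)  # hypothetical directory lookup
            if person is not None:
                yield person, name


# Made-up directory and sources, for illustration only.
directory = {"doej": "Jane Doe"}
pairs = list(
    iter_coordinators({"a": ["doej", "nobody"], "b": ["doej"]}, directory.get)
)
```

Keeping the query string next to each result is what later allows the raw coordinator names of a source to be matched back to the persons they resolved to.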

mex.extractors.seq_repo.extract.extract_sources() → Generator[SeqRepoSource, None, None]

Extract Seq Repo sources by loading data from the source JSON file.

Returns:

Generator for Seq Repo sources

mex.extractors.seq_repo.filter module

mex.extractors.seq_repo.filter.filter_sources_on_latest_sequencing_date(seq_repo_sources: Iterable[SeqRepoSource]) → dict[str, SeqRepoSource]

Filter sources by sequencing date, keeping only the most recently sequenced item.

Parameters:

seq_repo_sources – Seq Repo unfiltered extracted sources

Returns:

Filtered Seq Repo sources
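
A minimal sketch of this kind of filter, under loudly stated assumptions: the `Source` dataclass is a hypothetical stand-in for `SeqRepoSource`, the dictionary key built from `project_id` and `customer_sample_name` is a guess (the documentation does not say how the returned dict is keyed), and sequencing dates are assumed to be ISO 8601 strings.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for SeqRepoSource with only the fields used here."""

    project_id: str
    customer_sample_name: str
    sequencing_date: str  # assumed ISO 8601, e.g. "2023-08-07"


def keep_latest(sources: Iterable[Source]) -> dict[str, Source]:
    """Keep, per assumed key, the source with the latest sequencing date."""
    latest: dict[str, Source] = {}
    for source in sources:
        # Assumed key; the real function may identify items differently.
        key = f"{source.project_id}.{source.customer_sample_name}"
        current = latest.get(key)
        # ISO 8601 date strings sort chronologically when compared as text.
        if current is None or source.sequencing_date > current.sequencing_date:
            latest[key] = source
    return latest


# Made-up records, for illustration only.
sources = [
    Source("P1", "sample-a", "2023-01-15"),
    Source("P1", "sample-a", "2023-06-30"),
    Source("P2", "sample-b", "2022-11-02"),
]
filtered = keep_latest(sources)
```

If the dates were stored in another format, they would need to be parsed before comparison, since plain string ordering is only chronological for ISO 8601.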

mex.extractors.seq_repo.main module

mex.extractors.seq_repo.model module

class mex.extractors.seq_repo.model.SeqRepoSource(*, project_coordinators: list[str], customer_org_unit_id: str, sequencing_date: str, lims_sample_id: str, sequencing_platform: str, species: str, project_name: str, customer_sample_name: str, project_id: str)

Bases: BaseModel

Model class for Seq Repo Source.

customer_org_unit_id: str
customer_sample_name: str
lims_sample_id: str
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'customer_org_unit_id': FieldInfo(annotation=str, required=True, alias='customer-org-unit-id', alias_priority=2), 'customer_sample_name': FieldInfo(annotation=str, required=True, alias='customer-sample-name', alias_priority=2), 'lims_sample_id': FieldInfo(annotation=str, required=True, alias='lims-sample-id', alias_priority=2), 'project_coordinators': FieldInfo(annotation=list[str], required=True, alias='project-coordinators', alias_priority=2), 'project_id': FieldInfo(annotation=str, required=True, alias='project-id', alias_priority=2), 'project_name': FieldInfo(annotation=str, required=True, alias='project-name', alias_priority=2), 'sequencing_date': FieldInfo(annotation=str, required=True, alias='sequencing-date', alias_priority=2), 'sequencing_platform': FieldInfo(annotation=str, required=True, alias='sequencing-platform', alias_priority=2), 'species': FieldInfo(annotation=str, required=True)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

project_coordinators: list[str]
project_id: str
project_name: str
sequencing_date: str
sequencing_platform: str
species: str
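
The aliases listed in `model_fields` suggest that raw records use hyphenated keys, while the model exposes snake_case attributes. The sketch below imitates that renaming with the standard library only; it is not pydantic and performs no validation of required fields. `to_field_names` and the sample record are hypothetical.

```python
import json

# Aliases as listed in model_fields; "species" has no alias.
ALIASES = {
    "project-coordinators": "project_coordinators",
    "customer-org-unit-id": "customer_org_unit_id",
    "sequencing-date": "sequencing_date",
    "lims-sample-id": "lims_sample_id",
    "sequencing-platform": "sequencing_platform",
    "project-name": "project_name",
    "customer-sample-name": "customer_sample_name",
    "project-id": "project_id",
}
FIELDS = set(ALIASES.values()) | {"species"}


def to_field_names(record: dict) -> dict:
    """Rename aliased keys, strip strings, and drop unknown keys."""
    result = {}
    for key, value in record.items():
        # Field names are accepted as-is, mirroring 'populate_by_name': True.
        name = ALIASES.get(key, key)
        if name not in FIELDS:
            continue  # mirrors 'extra': 'ignore'
        if isinstance(value, str):
            value = value.strip()  # mirrors 'str_strip_whitespace': True
        result[name] = value
    return result


# Made-up record, for illustration only.
raw = json.loads('{"project-id": " P1 ", "species": "E. coli", "unknown": 1}')
fields = to_field_names(raw)
```

In the actual model, pydantic handles this renaming and additionally enforces that every field is present and that strings are non-empty.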

mex.extractors.seq_repo.settings module

class mex.extractors.seq_repo.settings.SeqRepoSettings(*, mapping_path: AssetsPath = AssetsPath('mappings/__final__/seq-repo'))

Bases: BaseModel

Settings submodel for the SeqRepo extractor.

mapping_path: AssetsPath
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/__final__/seq-repo"), description='Path to the directory with the seq-repo mapping files containing the default values, absolute path or relative to `assets_dir`.')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

mex.extractors.seq_repo.transform module

mex.extractors.seq_repo.transform.get_resolved_project_coordinators_and_units(project_coordinators: list[str], seq_repo_source_resolved_project_coordinators: list[LDAPPersonWithQuery], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], project_coordinators_merged_ids_by_query_string: dict[str, list[MergedPersonIdentifier]]) → tuple[list[MergedPersonIdentifier], list[MergedOrganizationalUnitIdentifier]]

Get the LDAP-resolved ids of project coordinators and units.

Parameters:
  • project_coordinators – Seq Repo raw project coordinator names

  • seq_repo_source_resolved_project_coordinators – LDAP query results for the project coordinators of the Seq Repo sources

  • unit_stable_target_ids_by_synonym – Unit stable target ids by synonym

  • project_coordinators_merged_ids_by_query_string – Merged ids of the resolved project coordinators, keyed by query string

Returns:

Resolved ids of project coordinators and units
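
How the four parameters might fit together can be sketched as follows. Everything here is an assumption made for illustration: `PersonWithQuery` is a hypothetical stand-in for `LDAPPersonWithQuery`, the `department` attribute and its use as a unit synonym are guesses, identifiers are plain strings, and unresolved names are simply skipped.

```python
from dataclasses import dataclass


@dataclass
class PersonWithQuery:
    """Hypothetical stand-in for LDAPPersonWithQuery."""

    query: str  # the raw name that was looked up
    department: str  # assumed attribute naming the person's unit


def resolve(
    coordinators: list[str],
    resolved_persons: list[PersonWithQuery],
    unit_ids_by_synonym: dict[str, str],
    person_ids_by_query: dict[str, list[str]],
) -> tuple[list[str], list[str]]:
    """Collect person ids and unit ids for the given raw coordinator names."""
    persons_by_query = {p.query: p for p in resolved_persons}
    person_ids: list[str] = []
    unit_ids: list[str] = []
    for name in coordinators:
        person = persons_by_query.get(name)
        if person is None:
            continue  # name was not resolved by the directory lookup
        for person_id in person_ids_by_query.get(name, []):
            if person_id not in person_ids:
                person_ids.append(person_id)
        unit_id = unit_ids_by_synonym.get(person.department)
        if unit_id is not None and unit_id not in unit_ids:
            unit_ids.append(unit_id)
    return person_ids, unit_ids


# Made-up inputs, for illustration only.
person_ids, unit_ids = resolve(
    ["doej", "unknown"],
    [PersonWithQuery(query="doej", department="FG99")],
    {"FG99": "unit-1"},
    {"doej": ["person-1"]},
)
```

The sketch returns both lists without duplicates and in first-seen order; whether the real function deduplicates or orders its results is not stated in the documentation.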

mex.extractors.seq_repo.transform.transform_seq_repo_access_platform_to_extracted_access_platform(seq_repo_access_platform: Any, unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], extracted_primary_source: ExtractedPrimarySource) → ExtractedAccessPlatform

Transform seq-repo access platform to ExtractedAccessPlatform.

Parameters:
  • seq_repo_access_platform – Seq Repo access platform mapping model

  • unit_stable_target_ids_by_synonym – Unit stable target ids by synonym

  • extracted_primary_source – Extracted primary source

Returns:

ExtractedAccessPlatform

mex.extractors.seq_repo.transform.transform_seq_repo_activities_to_extracted_activities(seq_repo_sources: dict[str, SeqRepoSource], seq_repo_activity: Any, seq_repo_source_resolved_project_coordinators: list[LDAPPersonWithQuery], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], project_coordinators_merged_ids_by_query_string: dict[str, list[MergedPersonIdentifier]], extracted_primary_source: ExtractedPrimarySource) → list[ExtractedActivity]

Transform seq-repo activities to a list of unique ExtractedActivity items.

Parameters:
  • seq_repo_sources – Seq Repo extracted sources

  • seq_repo_activity – Seq Repo activity mapping model with default values

  • seq_repo_source_resolved_project_coordinators – LDAP query results for the project coordinators of the Seq Repo sources

  • unit_stable_target_ids_by_synonym – Unit stable target ids by synonym

  • project_coordinators_merged_ids_by_query_string – Merged ids of the resolved project coordinators, keyed by query string

  • extracted_primary_source – Extracted primary source

Returns:

List of unique ExtractedActivity items

mex.extractors.seq_repo.transform.transform_seq_repo_resource_to_extracted_resource(seq_repo_sources: dict[str, SeqRepoSource], seq_repo_activities: dict[str, ExtractedActivity], mex_access_platform: ExtractedAccessPlatform, seq_repo_resource: Any, seq_repo_source_resolved_project_coordinators: list[LDAPPersonWithQuery], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], project_coordinators_merged_ids_by_query_string: dict[str, list[MergedPersonIdentifier]], extracted_organization_rki: ExtractedOrganization, extracted_primary_source: ExtractedPrimarySource) → list[ExtractedResource]

Transform seq-repo resources to a list of ExtractedResource items.

Parameters:
  • seq_repo_sources – Seq Repo extracted sources

  • seq_repo_activities – Seq Repo extracted activity for default values from mapping

  • mex_access_platform – Extracted access platform

  • seq_repo_resource – Seq Repo resource mapping model with default values

  • seq_repo_source_resolved_project_coordinators – LDAP query results for the project coordinators of the Seq Repo sources

  • unit_stable_target_ids_by_synonym – Unit stable target ids by synonym

  • project_coordinators_merged_ids_by_query_string – Merged ids of the resolved project coordinators, keyed by query string

  • extracted_organization_rki – Extracted organization for RKI, sourced from Wikidata

  • extracted_primary_source – Extracted primary source

Returns:

List of ExtractedResource items

Module contents