mex.extractors.seq_repo package
Submodules
mex.extractors.seq_repo.extract module
- mex.extractors.seq_repo.extract.extract_source_project_coordinator(seq_repo_sources: dict[str, SeqRepoSource]) → Generator[LDAPPersonWithQuery, None, None]
Extract LDAP persons with their query string for source project coordinators.
- Parameters:
seq_repo_sources – Seq Repo sources
- Returns:
Generator for LDAP persons with query
- mex.extractors.seq_repo.extract.extract_sources() → Generator[SeqRepoSource, None, None]
Extract Seq Repo sources by loading data from the source JSON file.
- Returns:
Generator for Seq Repo sources
mex.extractors.seq_repo.filter module
- mex.extractors.seq_repo.filter.filter_sources_on_latest_sequencing_date(seq_repo_sources: Iterable[SeqRepoSource]) → dict[str, SeqRepoSource]
Filter sources on sequencing date, keeping only the latest sequenced item.
- Parameters:
seq_repo_sources – Unfiltered extracted Seq Repo sources
- Returns:
Filtered Seq Repo sources
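The filtering step can be sketched as below. This is a minimal illustration, not the package's implementation: the `Source` stand-in class, the `keep_latest` name, the dictionary key built from project id and sample name, and the assumption that `sequencing_date` holds ISO 8601 strings (so string order matches date order) are all hypothetical.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for SeqRepoSource with only the fields used here."""

    project_id: str
    customer_sample_name: str
    sequencing_date: str  # assumed ISO 8601, so string order equals date order


def keep_latest(sources: Iterable[Source]) -> dict[str, Source]:
    """Keep only the most recently sequenced source per key."""
    latest: dict[str, Source] = {}
    for source in sources:
        # Assumption: project id plus sample name identifies one item.
        key = f"{source.project_id}.{source.customer_sample_name}"
        current = latest.get(key)
        if current is None or source.sequencing_date > current.sequencing_date:
            latest[key] = source
    return latest


# Made-up example data for illustration only.
sources = [
    Source("p1", "s1", "2023-01-05"),
    Source("p1", "s1", "2023-08-07"),
    Source("p2", "s9", "2022-11-30"),
]
filtered = keep_latest(sources)
```

Returning a dictionary rather than a list matches the documented return type and lets later steps look up a source by its key.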
mex.extractors.seq_repo.main module
mex.extractors.seq_repo.model module
- class mex.extractors.seq_repo.model.SeqRepoSource(*, project_coordinators: list[str], customer_org_unit_id: str, sequencing_date: str, lims_sample_id: str, sequencing_platform: str, species: str, project_name: str, customer_sample_name: str, project_id: str)
Bases: BaseModel
Model class for Seq Repo Source.
- customer_org_unit_id: str
- customer_sample_name: str
- lims_sample_id: str
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'customer_org_unit_id': FieldInfo(annotation=str, required=True, alias='customer-org-unit-id', alias_priority=2), 'customer_sample_name': FieldInfo(annotation=str, required=True, alias='customer-sample-name', alias_priority=2), 'lims_sample_id': FieldInfo(annotation=str, required=True, alias='lims-sample-id', alias_priority=2), 'project_coordinators': FieldInfo(annotation=list[str], required=True, alias='project-coordinators', alias_priority=2), 'project_id': FieldInfo(annotation=str, required=True, alias='project-id', alias_priority=2), 'project_name': FieldInfo(annotation=str, required=True, alias='project-name', alias_priority=2), 'sequencing_date': FieldInfo(annotation=str, required=True, alias='sequencing-date', alias_priority=2), 'sequencing_platform': FieldInfo(annotation=str, required=True, alias='sequencing-platform', alias_priority=2), 'species': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- project_coordinators: list[str]
- project_id: str
- project_name: str
- sequencing_date: str
- sequencing_platform: str
- species: str
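The field metadata above shows that most fields use hyphenated aliases (for example `project-id`), that unknown keys are ignored, and that string values are stripped of whitespace. The sketch below imitates that behaviour with a standard-library dataclass so it runs without Pydantic. `SourceSketch`, `parse_source`, and the sample record are hypothetical and only approximate what the real model does.

```python
from dataclasses import dataclass, fields


@dataclass
class SourceSketch:
    """Hypothetical stand-in mirroring the fields of SeqRepoSource."""

    project_coordinators: list[str]
    customer_org_unit_id: str
    sequencing_date: str
    lims_sample_id: str
    sequencing_platform: str
    species: str
    project_name: str
    customer_sample_name: str
    project_id: str


def parse_source(raw: dict[str, object]) -> SourceSketch:
    """Build a SourceSketch from a record with hyphenated or plain keys."""
    known = {field.name for field in fields(SourceSketch)}
    kwargs: dict[str, object] = {}
    for key, value in raw.items():
        # Accept both the alias ("project-id") and the field name ("project_id").
        name = key.replace("-", "_")
        if name not in known:
            continue  # imitates 'extra': 'ignore'
        # Imitates 'str_strip_whitespace': True for plain string values.
        kwargs[name] = value.strip() if isinstance(value, str) else value
    return SourceSketch(**kwargs)  # a missing required field raises TypeError


# Made-up record for illustration only.
raw = {
    "project-coordinators": ["coordinator-a"],
    "customer-org-unit-id": "unit-1",
    "sequencing-date": "2023-08-07",
    "lims-sample-id": "sample-1",
    "sequencing-platform": "platform-x",
    "species": "  species-y ",
    "project-name": "project-z",
    "customer-sample-name": "cs-1",
    "project-id": "p-1",
    "unexpected-key": "dropped",
}
source = parse_source(raw)
```

The real model additionally enforces string length limits and validates on assignment, which this sketch leaves out.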
mex.extractors.seq_repo.settings module
- class mex.extractors.seq_repo.settings.SeqRepoSettings(*, mapping_path: AssetsPath = AssetsPath('mappings/__final__/seq-repo'))
Bases: BaseModel
Settings submodel for the SeqRepo extractor.
- mapping_path: AssetsPath
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/__final__/seq-repo"), description='Path to the directory with the seq-repo mapping files containing the default values, absolute path or relative to `assets_dir`.')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
mex.extractors.seq_repo.transform module
- mex.extractors.seq_repo.transform.get_resolved_project_coordinators_and_units(project_coordinators: list[str], seq_repo_source_resolved_project_coordinators: list[LDAPPersonWithQuery], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], project_coordinators_merged_ids_by_query_string: dict[str, list[MergedPersonIdentifier]]) → tuple[list[MergedPersonIdentifier], list[MergedOrganizationalUnitIdentifier]]
Get the LDAP-resolved IDs of project coordinators and their units.
- Parameters:
project_coordinators – Raw Seq Repo project coordinator names
seq_repo_source_resolved_project_coordinators – LDAP query results for the project coordinators of the Seq Repo sources
unit_stable_target_ids_by_synonym – Unit stable target IDs by synonym
project_coordinators_merged_ids_by_query_string – Merged person IDs of the resolved project coordinators, keyed by query string
- Returns:
Resolved IDs of project coordinators and units
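One way to read the signature above is as a pair of lookups per coordinator name: the name selects matching LDAP results, the query string selects merged person IDs, and the person's department selects a unit ID. The sketch below illustrates that reading only. The `Person` and `PersonWithQuery` stand-ins, their `department` and `query` attributes, the use of plain strings as identifiers, and the `resolve` function itself are all assumptions rather than the package's code.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical LDAP person; only the department is used here."""

    department: str


@dataclass
class PersonWithQuery:
    """Hypothetical stand-in for LDAPPersonWithQuery."""

    person: Person
    query: str  # assumed to equal the raw coordinator name that was looked up


def resolve(
    coordinators: list[str],
    resolved: list[PersonWithQuery],
    unit_ids_by_synonym: dict[str, str],
    person_ids_by_query: dict[str, list[str]],
) -> tuple[list[str], list[str]]:
    """Collect person IDs and unit IDs for the given coordinator names."""
    person_ids: list[str] = []
    unit_ids: list[str] = []
    for name in coordinators:
        for entry in resolved:
            if entry.query != name:
                continue
            # Add each merged person ID once, preserving order.
            for person_id in person_ids_by_query.get(name, []):
                if person_id not in person_ids:
                    person_ids.append(person_id)
            # Assumption: the department name is a known unit synonym.
            unit_id = unit_ids_by_synonym.get(entry.person.department)
            if unit_id is not None and unit_id not in unit_ids:
                unit_ids.append(unit_id)
    return person_ids, unit_ids


# Made-up example data for illustration only.
resolved = [
    PersonWithQuery(Person("unit-a"), "coordinator-a"),
    PersonWithQuery(Person("unit-b"), "coordinator-b"),
]
result = resolve(
    ["coordinator-a", "coordinator-b", "unknown"],
    resolved,
    {"unit-a": "unit-id-1"},
    {"coordinator-a": ["person-id-1"], "coordinator-b": ["person-id-2"]},
)
```

Names with no LDAP match, and departments with no known synonym, are skipped silently in this sketch.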
- mex.extractors.seq_repo.transform.transform_seq_repo_access_platform_to_extracted_access_platform(seq_repo_access_platform: Any, unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], extracted_primary_source: ExtractedPrimarySource) → ExtractedAccessPlatform
Transform seq-repo access platform to ExtractedAccessPlatform.
- Parameters:
seq_repo_access_platform – Seq Repo access platform mapping model
unit_stable_target_ids_by_synonym – Unit stable target ids by synonym
extracted_primary_source – Extracted primary source
- Returns:
ExtractedAccessPlatform
- mex.extractors.seq_repo.transform.transform_seq_repo_activities_to_extracted_activities(seq_repo_sources: dict[str, SeqRepoSource], seq_repo_activity: Any, seq_repo_source_resolved_project_coordinators: list[LDAPPersonWithQuery], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], project_coordinators_merged_ids_by_query_string: dict[str, list[MergedPersonIdentifier]], extracted_primary_source: ExtractedPrimarySource) → list[ExtractedActivity]
Transform seq-repo activities to a list of unique ExtractedActivity.
- Parameters:
seq_repo_sources – Seq Repo extracted sources
seq_repo_activity – Seq Repo activity mapping model with default values
seq_repo_source_resolved_project_coordinators – LDAP query results for the project coordinators of the Seq Repo sources
unit_stable_target_ids_by_synonym – Unit stable target IDs by synonym
project_coordinators_merged_ids_by_query_string – Merged person IDs of the resolved project coordinators, keyed by query string
extracted_primary_source – Extracted primary source
- Returns:
List of unique ExtractedActivity
- mex.extractors.seq_repo.transform.transform_seq_repo_resource_to_extracted_resource(seq_repo_sources: dict[str, SeqRepoSource], seq_repo_activities: dict[str, ExtractedActivity], mex_access_platform: ExtractedAccessPlatform, seq_repo_resource: Any, seq_repo_source_resolved_project_coordinators: list[LDAPPersonWithQuery], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], project_coordinators_merged_ids_by_query_string: dict[str, list[MergedPersonIdentifier]], extracted_organization_rki: ExtractedOrganization, extracted_primary_source: ExtractedPrimarySource) → list[ExtractedResource]
Transform seq-repo resources to a list of ExtractedResource.
- Parameters:
seq_repo_sources – Seq Repo extracted sources
seq_repo_activities – Extracted Seq Repo activities
mex_access_platform – Extracted access platform
seq_repo_resource – Seq Repo resource mapping model with default values
seq_repo_source_resolved_project_coordinators – LDAP query results for the project coordinators of the Seq Repo sources
unit_stable_target_ids_by_synonym – Unit stable target IDs by synonym
project_coordinators_merged_ids_by_query_string – Merged person IDs of the resolved project coordinators, keyed by query string
extracted_organization_rki – Extracted organization for RKI, from Wikidata
extracted_primary_source – Extracted primary source
- Returns:
List of ExtractedResource