mex.extractors.seq_repo package
Submodules
mex.extractors.seq_repo.extract module
- mex.extractors.seq_repo.extract.extract_source_project_coordinator(seq_repo_sources: dict[str, SeqRepoSource]) → list[LDAPPersonWithQuery]
Extract LDAP persons with their query string for source project coordinators.
- Parameters:
seq_repo_sources – Seq Repo sources
- Returns:
List of LDAP persons with query
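The pairing of each found person with the query string that found it can be illustrated with a minimal sketch. This is not the extractor's implementation: it assumes each source is reduced to its `project_coordinators` list, replaces the real LDAP search with a hypothetical `lookup` callable, and uses a hypothetical `PersonWithQuery` stand-in for `LDAPPersonWithQuery`.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonWithQuery:
    """Hypothetical stand-in for LDAPPersonWithQuery: a found person plus the query used."""

    person: str
    query: str


def extract_coordinators(
    coordinators_by_source: dict[str, list[str]],
    lookup: Callable[[str], Iterable[str]],
) -> list[PersonWithQuery]:
    """Look up each distinct coordinator name once and pair every hit with its query."""
    # Collect unique names across all sources, keeping first-seen order.
    unique_names = dict.fromkeys(
        name
        for coordinator_list in coordinators_by_source.values()
        for name in coordinator_list
    )
    results: list[PersonWithQuery] = []
    for name in unique_names:
        # `lookup` stands in for the LDAP search; it may return zero or more persons.
        for person in lookup(name):
            results.append(PersonWithQuery(person=person, query=name))
    return results


directory = {"Jane Doe": ["jdoe"]}
hits = extract_coordinators(
    {"s1": ["Jane Doe", "Max Muster"], "s2": ["Jane Doe"]},
    lambda name: directory.get(name, []),
)
print(hits)
```

Keeping the query next to each person lets later steps map a raw coordinator name back to the person it resolved to; names with no match simply produce no entry.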
- mex.extractors.seq_repo.extract.extract_sources() → list[SeqRepoSource]
Extract Seq Repo sources by loading data from the source JSON file.
- Returns:
List of Seq Repo sources
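Loading raw JSON into typed records can be sketched as follows. This is an illustration under stated assumptions, not the extractor's code: it assumes the source file is a JSON array of objects whose keys match the field names, reads from a string instead of a file, and uses a hypothetical, trimmed-down `SourceRecord` dataclass in place of the pydantic `SeqRepoSource` model.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical, trimmed-down stand-in for SeqRepoSource (only four fields)."""

    project_id: str
    project_name: str
    sequencing_date: str
    lims_sample_id: str


def load_sources(raw_json: str) -> list[SourceRecord]:
    """Parse a JSON array of objects into SourceRecord instances.

    Unknown keys are dropped and string values are stripped, loosely mirroring
    the model's extra='ignore' and str_strip_whitespace=True settings.
    """
    known = {field.name for field in fields(SourceRecord)}
    records = []
    for item in json.loads(raw_json):
        values = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in item.items()
            if key in known
        }
        records.append(SourceRecord(**values))
    return records


raw = (
    '[{"project_id": "P1", "project_name": " Demo ", '
    '"sequencing_date": "2023-01-02", "lims_sample_id": "S1", "unused": 1}]'
)
print(load_sources(raw))
```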
mex.extractors.seq_repo.filter module
- mex.extractors.seq_repo.filter.filter_sources_on_latest_sequencing_date(seq_repo_sources: Iterable[SeqRepoSource]) → dict[str, SeqRepoSource]
Filter sources by sequencing date, keeping only the most recently sequenced item.
- Parameters:
seq_repo_sources – Unfiltered Seq Repo extracted sources
- Returns:
Filtered Seq Repo sources
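A "keep the latest per key" filter of this kind can be sketched with a single pass over the input. Two assumptions are made here and are not confirmed by the reference above: the grouping key (a hypothetical `key` field, standing in for whatever key the real function uses) and an ISO 8601 `YYYY-MM-DD` format for `sequencing_date`, which makes plain string comparison equivalent to date comparison. `Sample` is a hypothetical stand-in for `SeqRepoSource`.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical stand-in for SeqRepoSource with only the fields this sketch needs."""

    key: str  # hypothetical grouping key, e.g. an identifier in the primary source
    sequencing_date: str  # assumed ISO 8601 "YYYY-MM-DD", so string order is date order
    lims_sample_id: str


def keep_latest(samples: Iterable[Sample]) -> dict[str, Sample]:
    """Keep only the most recently sequenced sample for each key."""
    latest: dict[str, Sample] = {}
    for sample in samples:
        current = latest.get(sample.key)
        # Replace the stored sample if the key is new or this one was sequenced later.
        if current is None or sample.sequencing_date > current.sequencing_date:
            latest[sample.key] = sample
    return latest


samples = [
    Sample("A", "2021-01-01", "s1"),
    Sample("A", "2022-05-01", "s2"),
    Sample("B", "2020-01-01", "s3"),
]
print({key: s.lims_sample_id for key, s in keep_latest(samples).items()})
```

Because the comparison is strict, a tie on the sequencing date keeps the first sample seen for that key.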
mex.extractors.seq_repo.main module
mex.extractors.seq_repo.model module
- class mex.extractors.seq_repo.model.SeqRepoSource(*, project_coordinators: list[str], customer_org_unit_id: str, sequencing_date: str, lims_sample_id: str, sequencing_platform: str, species: str, project_name: str, customer_sample_name: str, project_id: str)
Bases: BaseRawData
Model class for Seq Repo Source.
- customer_org_unit_id: str
- customer_sample_name: str
- get_end_year() → TemporalEntity | None
Return end year from extractor.
- get_identifier_in_primary_source() → str | None
Return identifier in primary source from extractor.
- get_partners() → Sequence[str | None]
Return partners from extractor.
- get_start_year() → TemporalEntity | None
Return start year from extractor.
- get_units() → Sequence[str | None]
Return units from extractor.
- lims_sample_id: str
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model, which should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- project_coordinators: list[str]
- project_id: str
- project_name: str
- sequencing_date: str
- sequencing_platform: str
- species: str
mex.extractors.seq_repo.settings module
- class mex.extractors.seq_repo.settings.SeqRepoSettings(*, mapping_path: AssetsPath = AssetsPath('mappings/seq-repo'))
Bases: BaseModel
Settings submodel for the SeqRepo extractor.
- mapping_path: AssetsPath
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model, which should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
mex.extractors.seq_repo.transform module
- mex.extractors.seq_repo.transform.get_resolved_project_coordinators_and_units(project_coordinators: list[str], seq_repo_ldap_persons_with_query: list[LDAPPersonWithQuery], unit_stable_target_ids_by_synonym: dict[str, list[MergedOrganizationalUnitIdentifier]], seq_repo_merged_person_ids_by_query_string: dict[str, list[MergedPersonIdentifier]]) → tuple[list[MergedPersonIdentifier], list[MergedOrganizationalUnitIdentifier]]
Get the LDAP-resolved IDs of project coordinators and their units.
- Parameters:
project_coordinators – Seq Repo raw project coordinator names
seq_repo_ldap_persons_with_query – LDAP query results for the resolved project coordinators of the Seq Repo sources
unit_stable_target_ids_by_synonym – Unit stable target IDs by synonym
seq_repo_merged_person_ids_by_query_string – Merged person IDs of the resolved project coordinators of the Seq Repo sources, by query string
- Returns:
Tuple of resolved project coordinator IDs and unit IDs
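How raw coordinator names could be turned into person IDs and unit IDs is sketched below. This is a hypothetical illustration only: identifiers are plain strings instead of `MergedPersonIdentifier` and `MergedOrganizationalUnitIdentifier`, `LdapHit` is a stand-in for `LDAPPersonWithQuery`, and its `department` field (the unit synonym used for the unit lookup) is an assumption, not the real LDAP attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LdapHit:
    """Hypothetical stand-in for LDAPPersonWithQuery."""

    query: str  # the raw coordinator name that was searched for
    department: str  # hypothetical: unit synonym recorded for the found person


def resolve_coordinators_and_units(
    project_coordinators: list[str],
    ldap_hits: list[LdapHit],
    unit_ids_by_synonym: dict[str, list[str]],
    person_ids_by_query: dict[str, list[str]],
) -> tuple[list[str], list[str]]:
    """Map raw coordinator names to merged person IDs and their unit IDs."""
    person_ids: list[str] = []
    unit_ids: list[str] = []
    for name in project_coordinators:
        # Names without a merged person ID are simply skipped.
        person_ids.extend(person_ids_by_query.get(name, []))
        for hit in ldap_hits:
            if hit.query == name:
                unit_ids.extend(unit_ids_by_synonym.get(hit.department, []))
    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(person_ids)), list(dict.fromkeys(unit_ids))


print(
    resolve_coordinators_and_units(
        ["Jane Doe", "Unknown Person"],
        [LdapHit(query="Jane Doe", department="FG 99")],
        {"FG 99": ["unit-1"]},
        {"Jane Doe": ["person-1"]},
    )
)
```

Looking up by the original query string is what makes the `...by_query_string` mapping useful: the raw name from the source is the only key shared by the source record and the LDAP result.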
- mex.extractors.seq_repo.transform.transform_seq_repo_access_platform_to_extracted_access_platform(seq_repo_access_platform: AccessPlatformMapping, unit_stable_target_ids_by_synonym: dict[str, list[MergedOrganizationalUnitIdentifier]]) → ExtractedAccessPlatform
Transform the Seq Repo access platform mapping to an ExtractedAccessPlatform.
- Parameters:
seq_repo_access_platform – Seq Repo access platform mapping model
unit_stable_target_ids_by_synonym – Unit stable target IDs by synonym
- Returns:
ExtractedAccessPlatform
- mex.extractors.seq_repo.transform.transform_seq_repo_activities_to_extracted_activities(seq_repo_sources: dict[str, SeqRepoSource], seq_repo_activity: ActivityMapping, seq_repo_ldap_persons_with_query: list[LDAPPersonWithQuery], unit_stable_target_ids_by_synonym: dict[str, list[MergedOrganizationalUnitIdentifier]], seq_repo_merged_person_ids_by_query_string: dict[str, list[MergedPersonIdentifier]]) → list[ExtractedActivity]
Transform Seq Repo activities to a list of unique ExtractedActivity items.
- Parameters:
seq_repo_sources – Seq Repo extracted sources
seq_repo_activity – Seq Repo activity mapping model with default values
seq_repo_ldap_persons_with_query – LDAP query results for the resolved project coordinators of the Seq Repo sources
unit_stable_target_ids_by_synonym – Unit stable target IDs by synonym
seq_repo_merged_person_ids_by_query_string – Merged person IDs of the resolved project coordinators of the Seq Repo sources, by query string
- Returns:
list of unique ExtractedActivity
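Producing a list of *unique* activities from many per-sample sources amounts to grouping. The sketch below assumes, without confirmation from this reference, that uniqueness is by `project_id`, and that coordinators of all samples in a project are merged. `Source` and `Activity` are hypothetical stand-ins for `SeqRepoSource` and `ExtractedActivity`; mapping defaults and ID resolution are left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for SeqRepoSource with only the fields used here."""

    project_id: str
    project_name: str
    project_coordinators: tuple[str, ...]


@dataclass
class Activity:
    """Hypothetical stand-in for ExtractedActivity."""

    identifier_in_primary_source: str
    title: str
    coordinators: list[str] = field(default_factory=list)


def to_unique_activities(sources: dict[str, Source]) -> list[Activity]:
    """Build one activity per project_id, merging coordinators across its samples."""
    by_project: dict[str, Activity] = {}
    for source in sources.values():
        activity = by_project.get(source.project_id)
        if activity is None:
            # First sample of this project: create its activity.
            activity = Activity(source.project_id, source.project_name)
            by_project[source.project_id] = activity
        for name in source.project_coordinators:
            if name not in activity.coordinators:
                activity.coordinators.append(name)
    return list(by_project.values())


activities = to_unique_activities(
    {
        "a": Source("P1", "Project One", ("X",)),
        "b": Source("P1", "Project One", ("Y", "X")),
        "c": Source("P2", "Project Two", ()),
    }
)
print([(a.identifier_in_primary_source, a.coordinators) for a in activities])
```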
- mex.extractors.seq_repo.transform.transform_seq_repo_resource_to_extracted_resource(seq_repo_sources: dict[str, SeqRepoSource], seq_repo_activities: dict[str, ExtractedActivity], mex_access_platform: ExtractedAccessPlatform, seq_repo_resource: ResourceMapping, seq_repo_ldap_persons_with_query: list[LDAPPersonWithQuery], unit_stable_target_ids_by_synonym: dict[str, list[MergedOrganizationalUnitIdentifier]], seq_repo_merged_person_ids_by_query_string: dict[str, list[MergedPersonIdentifier]], extracted_organization_rki: ExtractedOrganization) → list[ExtractedResource]
Transform Seq Repo resources to a list of ExtractedResource items.
- Parameters:
seq_repo_sources – Seq Repo extracted sources
seq_repo_activities – Seq Repo extracted activities, used for default values from the mapping
mex_access_platform – Extracted access platform
seq_repo_resource – Seq Repo resource mapping model with default values
seq_repo_ldap_persons_with_query – LDAP query results for the resolved project coordinators of the Seq Repo sources
unit_stable_target_ids_by_synonym – Unit stable target IDs by synonym
seq_repo_merged_person_ids_by_query_string – Merged person IDs of the resolved project coordinators of the Seq Repo sources, by query string
extracted_organization_rki – Extracted organization for the RKI, from Wikidata
- Returns:
list of ExtractedResource