mex.extractors.biospecimen package
Subpackages
- mex.extractors.biospecimen.models package
- Submodules
- mex.extractors.biospecimen.models.source module
  - BiospecimenResource
    - BiospecimenResource.alternativer_titel
    - BiospecimenResource.anonymisiert_pseudonymisiert
    - BiospecimenResource.beschreibung
    - BiospecimenResource.externe_partner
    - BiospecimenResource.file_name
    - BiospecimenResource.id_loinc
    - BiospecimenResource.id_mesh_begriff
    - BiospecimenResource.kontakt
    - BiospecimenResource.methoden
    - BiospecimenResource.methodenbeschreibung
    - BiospecimenResource.mitwirkende_fachabteilung
    - BiospecimenResource.mitwirkende_personen
    - BiospecimenResource.model_config
    - BiospecimenResource.offizieller_titel_der_probensammlung
    - BiospecimenResource.raeumlicher_bezug
    - BiospecimenResource.rechte
    - BiospecimenResource.ressourcentyp_allgemein
    - BiospecimenResource.ressourcentyp_speziell
    - BiospecimenResource.schlagworte
    - BiospecimenResource.sheet_name
    - BiospecimenResource.studienbezug
    - BiospecimenResource.thema
    - BiospecimenResource.tools_instrumente_oder_apparate
    - BiospecimenResource.verantwortliche_fachabteilung
    - BiospecimenResource.verwandte_publikation_doi
    - BiospecimenResource.verwandte_publikation_titel
    - BiospecimenResource.vorhandene_anzahl_der_proben
    - BiospecimenResource.weiterfuehrende_dokumentation_titel
    - BiospecimenResource.weiterfuehrende_dokumentation_url_oder_dateipfad
    - BiospecimenResource.zeitlicher_bezug
    - BiospecimenResource.zugriffsbeschraenkung
- Module contents
Submodules
mex.extractors.biospecimen.extract module
- mex.extractors.biospecimen.extract.extract_biospecimen_contacts_by_email(biospecimen_resource: Iterable[BiospecimenResource]) → Generator[LDAPPerson, None, None]
Extract LDAP persons for Biospecimen contacts.
- Parameters:
biospecimen_resource – Biospecimen resources
- Returns:
Generator for LDAP persons
- mex.extractors.biospecimen.extract.extract_biospecimen_organizations(biospecimen_resources: list[BiospecimenResource]) → dict[str, MergedOrganizationIdentifier]
Search for and extract organizations from Wikidata.
- Parameters:
biospecimen_resources – List of Biospecimen resources
- Returns:
Dict of merged organization identifiers keyed by externe partner label
- mex.extractors.biospecimen.extract.extract_biospecimen_resource(resource: DataFrame, sheet_name: str, file_name: str) → BiospecimenResource | None
Extract one Biospecimen resource from an xlsx file.
- Parameters:
resource – DataFrame containing resource information
sheet_name – Name of the Excel sheet the data came from
file_name – Name of the Excel file
- Settings:
- key_col: column in the file with keys
- val_col: column in the file with values
- Returns:
Biospecimen resource
- mex.extractors.biospecimen.extract.extract_biospecimen_resources() → Generator[BiospecimenResource, None, None]
Extract Biospecimen resources by loading data from MS-Excel file.
- Settings:
- dir_path: Path to the biospecimen directory, absolute or relative to assets_dir
- Returns:
Generator for Biospecimen resources
- mex.extractors.biospecimen.extract.get_clean_file_name(file_name: str) → str
Clean file name string.
- Parameters:
file_name – file_name string
- Returns:
cleaned file name string
- mex.extractors.biospecimen.extract.get_clean_string(series: Series[Any]) → str
Clean the strings in a Series and concatenate them into one string.
- Parameters:
series – series of related field
- Returns:
string of extracted field
- mex.extractors.biospecimen.extract.get_values(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None
Extract the values of a resource that correspond to the given field name (Feldname).
- Parameters:
resource – Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – column name of extracted field
- Returns:
string of extracted field
- mex.extractors.biospecimen.extract.get_year_from_zeitlicher_bezug(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None
Extract the first four connected digits of the string as year.
- Parameters:
resource – Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – column name of extracted field
- Returns:
string with first four digits treated as zeitlicher_bezug year
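The key/value lookup and the year extraction described above can be illustrated with a minimal sketch. It is not the package's implementation: the real functions operate on a pandas DataFrame, while this sketch uses a plain list of dicts to stay dependency-free. The helper names `lookup_value` and `first_year`, the sample rows, and the space used to join multiple values are all assumptions made for illustration. Only the two column names are taken from the `BiospecimenSettings` defaults.

```python
import re

# Column names taken from the BiospecimenSettings defaults.
KEY_COL = "Feldname"
VAL_COL = "zu extrahierender Wert (maschinenlesbar)"


def lookup_value(rows, key_col, val_col, field_name):
    """Hypothetical stand-in for get_values, working on a list of dicts.

    Collects the values of all rows whose key column equals field_name and
    joins them with a space (the separator is an assumption).
    """
    values = [
        str(row[val_col]).strip()
        for row in rows
        if row.get(key_col) == field_name and row.get(val_col)
    ]
    return " ".join(values) if values else None


def first_year(text):
    """Hypothetical helper: return the first run of four digits, or None."""
    if text is None:
        return None
    match = re.search(r"\d{4}", text)
    return match.group(0) if match else None


# Made-up sample rows; real sheets may use different field names.
rows = [
    {KEY_COL: "zeitlicher_bezug", VAL_COL: "Proben seit 2015 gesammelt"},
    {KEY_COL: "thema", VAL_COL: "Influenza"},
]

raw = lookup_value(rows, KEY_COL, VAL_COL, "zeitlicher_bezug")
year = first_year(raw)
```

Returning `None` when a field is absent or contains no four-digit run mirrors the `str | None` return type in the signatures above.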
mex.extractors.biospecimen.main module
mex.extractors.biospecimen.settings module
- class mex.extractors.biospecimen.settings.BiospecimenSettings(*, raw_data_path: AssetsPath = AssetsPath('raw-data/biospecimen'), key_col: str = 'Feldname', val_col: str = 'zu extrahierender Wert (maschinenlesbar)', mapping_path: AssetsPath = AssetsPath('mappings/biospecimen'))
Bases: BaseModel
Settings submodel for the Biospecimen extractor.
- key_col: str
- mapping_path: AssetsPath
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- raw_data_path: AssetsPath
- val_col: str
mex.extractors.biospecimen.transform module
- mex.extractors.biospecimen.transform.get_or_create_externe_partner(externe_partner: str, extracted_organizations: dict[str, MergedOrganizationIdentifier]) → MergedOrganizationIdentifier
Get extracted organization for label or create new organization.
- Parameters:
externe_partner – externe partner label
extracted_organizations – merged organization identifier extracted from wikidata
- Returns:
matched or created merged organization identifier
- mex.extractors.biospecimen.transform.transform_biospecimen_resource_to_mex_resource(biospecimen_resources: Iterable[BiospecimenResource], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], mex_persons: Iterable[ExtractedPerson], extracted_organization_rki: ExtractedOrganization, synopse_extracted_activities: Iterable[ExtractedActivity], resource_mapping: ResourceMapping, extracted_organizations: dict[str, MergedOrganizationIdentifier]) → Generator[ExtractedResource, None, None]
Transform Biospecimen resources to extracted resources.
- Parameters:
biospecimen_resources – Biospecimen resources
unit_stable_target_ids_by_synonym – Unit stable target ids by synonym
mex_persons – Iterable of ExtractedPersons
synopse_extracted_activities – extracted synopse activities
extracted_organization_rki – extracted organization
resource_mapping – resource mapping model with default values
extracted_organizations – extracted organizations by label
- Returns:
Generator for ExtractedResource instances
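The get-or-create behaviour documented for `get_or_create_externe_partner` follows a common lookup-with-fallback pattern, sketched below under stated assumptions. Identifiers are plain strings here rather than `MergedOrganizationIdentifier` instances, and `get_or_create_partner` and `make_identifier` are hypothetical names. How the real function creates a new organization is not described in this reference.

```python
from collections.abc import Callable


def get_or_create_partner(
    label: str,
    known: dict[str, str],
    create: Callable[[str], str],
) -> str:
    """Return the identifier stored for label, or create one if it is unknown."""
    if label in known:
        return known[label]
    # Fallback: delegate creation to the caller-supplied factory (assumption).
    return create(label)


def make_identifier(label: str) -> str:
    """Hypothetical factory standing in for creating a new organization."""
    return f"new-{label}"


# Made-up lookup table, as might be produced by a prior organization search.
known_organizations = {"Charité": "org-1"}

matched = get_or_create_partner("Charité", known_organizations, make_identifier)
created = get_or_create_partner("Unknown Lab", known_organizations, make_identifier)
```

Passing the factory in as a parameter keeps the sketch free of side effects. Whether the real function caches newly created identifiers in the lookup dict is not stated above, so the sketch leaves the dict unchanged.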