mex.extractors.biospecimen package
Subpackages
- mex.extractors.biospecimen.models package
  - Submodules
    - mex.extractors.biospecimen.models.source module
      - BiospecimenResource
        - BiospecimenResource.alternativer_titel
        - BiospecimenResource.anonymisiert_pseudonymisiert
        - BiospecimenResource.beschreibung
        - BiospecimenResource.externe_partner
        - BiospecimenResource.file_name
        - BiospecimenResource.id_loinc
        - BiospecimenResource.id_mesh_begriff
        - BiospecimenResource.kontakt
        - BiospecimenResource.methoden
        - BiospecimenResource.methodenbeschreibung
        - BiospecimenResource.mitwirkende_fachabteilung
        - BiospecimenResource.mitwirkende_personen
        - BiospecimenResource.model_computed_fields
        - BiospecimenResource.model_config
        - BiospecimenResource.model_fields
        - BiospecimenResource.offizieller_titel_der_probensammlung
        - BiospecimenResource.raeumlicher_bezug
        - BiospecimenResource.rechte
        - BiospecimenResource.ressourcentyp_allgemein
        - BiospecimenResource.ressourcentyp_speziell
        - BiospecimenResource.schlagworte
        - BiospecimenResource.sheet_name
        - BiospecimenResource.studienbezug
        - BiospecimenResource.thema
        - BiospecimenResource.tools_instrumente_oder_apparate
        - BiospecimenResource.verantwortliche_fachabteilung
        - BiospecimenResource.verwandte_publikation_doi
        - BiospecimenResource.verwandte_publikation_titel
        - BiospecimenResource.vorhandene_anzahl_der_proben
        - BiospecimenResource.weiterfuehrende_dokumentation_titel
        - BiospecimenResource.weiterfuehrende_dokumentation_url_oder_dateipfad
        - BiospecimenResource.zeitlicher_bezug
        - BiospecimenResource.zugriffsbeschraenkung
  - Module contents
Submodules
mex.extractors.biospecimen.extract module
- mex.extractors.biospecimen.extract.extract_biospecimen_contacts_by_email(biospecimen_resource: Iterable[BiospecimenResource]) → Generator[LDAPPerson, None, None]
Extract the LDAP persons matching the contact email addresses of Biospecimen resources.
- Parameters:
biospecimen_resource – Biospecimen resources
- Returns:
Generator for LDAP persons
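The generator above looks up LDAP persons by the contact emails of the resources. A minimal sketch of that shape follows, assuming each resource exposes its contacts as a list of email strings in `kontakt`; `ResourceStub`, `contacts_by_email`, and the `lookup` callable (a stand-in for the real LDAP query) are hypothetical. Deduplicating case-insensitively is also an assumption, not documented behavior.

```python
from collections.abc import Callable, Generator, Iterable
from dataclasses import dataclass, field


@dataclass
class ResourceStub:
    """Hypothetical stand-in for BiospecimenResource; only `kontakt` is modelled."""

    kontakt: list[str] = field(default_factory=list)


def contacts_by_email(
    resources: Iterable[ResourceStub],
    lookup: Callable[[str], Iterable[str]],
) -> Generator[str, None, None]:
    """Yield the persons found for every distinct contact email (hypothetical sketch)."""
    seen: set[str] = set()
    for resource in resources:
        for email in resource.kontakt:
            # Assumption: emails are compared case-insensitively, ignoring whitespace.
            key = email.strip().lower()
            if key and key not in seen:
                seen.add(key)
                # `lookup` stands in for an LDAP query by email address.
                yield from lookup(key)
```

Because the function is a generator, the lookups only run as the caller consumes the results.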
- mex.extractors.biospecimen.extract.extract_biospecimen_organizations(biospecimen_resources: list[BiospecimenResource]) → dict[str, MergedOrganizationIdentifier]
Search for and extract organizations from Wikidata.
- Parameters:
biospecimen_resources – List of Biospecimen resources
- Returns:
dict mapping externe partner labels to the merged identifiers of the matched Wikidata organizations
- mex.extractors.biospecimen.extract.extract_biospecimen_resource(resource: DataFrame, sheet_name: str, file_name: str) → BiospecimenResource | None
Extract one Biospecimen resource from an xlsx file.
- Parameters:
resource – DataFrame containing resource information
sheet_name – Name of the Excel sheet the data came from
file_name – Name of the Excel file
- Settings:
key_col: column in the file with keys
val_col: column in the file with values
- Returns:
Biospecimen resource, or None
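Each sheet is read as key/value rows: `key_col` holds the field names and `val_col` the values. A minimal sketch of collecting such a sheet into a dict follows; `sheet_to_fields` is hypothetical, the column names are the `BiospecimenSettings` defaults, and the sample rows in any usage are invented.

```python
import pandas as pd

# Default column names taken from BiospecimenSettings.
KEY_COL = "Feldname"
VAL_COL = "zu extrahierender Wert (maschinenlesbar)"


def sheet_to_fields(
    resource: pd.DataFrame, key_col: str = KEY_COL, val_col: str = VAL_COL
) -> dict[str, list[str]]:
    """Collect a key/value sheet into {field name: [values]} (hypothetical sketch)."""
    fields: dict[str, list[str]] = {}
    for key, value in zip(resource[key_col], resource[val_col]):
        if pd.isna(key) or pd.isna(value):
            continue  # skip rows that lack a key or a value
        # Repeated keys accumulate, so multi-valued fields keep every entry.
        fields.setdefault(str(key).strip(), []).append(str(value).strip())
    return fields
```

Keeping a list per key lets multi-valued fields (for example keywords) be represented without losing rows.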
- mex.extractors.biospecimen.extract.extract_biospecimen_resources() → Generator[BiospecimenResource, None, None]
Extract Biospecimen resources by loading data from MS Excel files.
- Settings:
- raw_data_path: Path to the biospecimen directory, absolute or relative to assets_dir
- Returns:
Generator for Biospecimen resources
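Loading every sheet of every workbook in the directory can be sketched with `pandas.read_excel(path, sheet_name=None)`, which returns a dict of sheet name to DataFrame. `iter_sheets` is hypothetical; it assumes a flat directory of `.xlsx` files and takes the reader as a parameter so it can be exercised without real workbooks.

```python
from collections.abc import Callable, Iterator
from pathlib import Path

import pandas as pd

SheetReader = Callable[[Path], dict[str, pd.DataFrame]]


def read_all_sheets(path: Path) -> dict[str, pd.DataFrame]:
    # sheet_name=None makes pandas return every sheet as {sheet name: DataFrame}.
    return pd.read_excel(path, sheet_name=None)


def iter_sheets(
    dir_path: Path, reader: SheetReader = read_all_sheets
) -> Iterator[tuple[str, str, pd.DataFrame]]:
    """Yield (file name, sheet name, DataFrame) for each sheet (hypothetical sketch)."""
    # Assumption: only top-level *.xlsx files are read, in sorted order.
    for path in sorted(dir_path.glob("*.xlsx")):
        for sheet_name, frame in reader(path).items():
            yield path.name, sheet_name, frame
```

Each yielded triple matches the `resource`, `sheet_name`, and `file_name` arguments of `extract_biospecimen_resource`.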
- mex.extractors.biospecimen.extract.get_clean_file_name(file_name: str) → str
Clean a file name string.
- Parameters:
file_name – file name string
- Returns:
cleaned file name string
- mex.extractors.biospecimen.extract.get_clean_string(series: Series[Any]) → str
Clean the string values of a Series and concatenate them into one string.
- Parameters:
series – series of values of the related field
- Returns:
string of extracted field
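A minimal sketch of cleaning a Series and joining it into one string follows. `clean_and_join` is hypothetical, and both the cleaning rules (drop missing values, strip whitespace, drop empty strings) and the `", "` separator are assumptions rather than the documented behavior of `get_clean_string`.

```python
import pandas as pd


def clean_and_join(series: pd.Series, sep: str = ", ") -> str:
    """Drop missing entries, strip whitespace, drop empties, join (hypothetical sketch)."""
    parts = (str(value).strip() for value in series.dropna())
    return sep.join(part for part in parts if part)
```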
- mex.extractors.biospecimen.extract.get_values(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None
Extract the values of the resource whose key in key_col (by default Feldname) matches field_name.
- Parameters:
resource – DataFrame containing the Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – name of the field to extract, as it appears in key_col
- Returns:
string of extracted field, or None
- mex.extractors.biospecimen.extract.get_year_from_zeitlicher_bezug(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None
Extract the first four consecutive digits of the string as the year.
- Parameters:
resource – DataFrame containing the Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – name of the field to extract, as it appears in key_col
- Returns:
string of the first four consecutive digits, used as the zeitlicher_bezug year, or None
mex.extractors.biospecimen.main module
mex.extractors.biospecimen.settings module
- class mex.extractors.biospecimen.settings.BiospecimenSettings(*, raw_data_path: AssetsPath = AssetsPath('raw-data/biospecimen'), key_col: str = 'Feldname', val_col: str = 'zu extrahierender Wert (maschinenlesbar)', mapping_path: AssetsPath = AssetsPath('mappings/__final__/biospecimen'))
Bases: BaseModel
Settings submodel for the Biospecimen extractor.
- key_col: str
- mapping_path: AssetsPath
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'key_col': FieldInfo(annotation=str, required=False, default='Feldname', description='column name of the biospecimen metadata keys'), 'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/__final__/biospecimen"), description='Path to the directory with the biospecimen mapping files containing the default values, absolute path or relative to `assets_dir`.'), 'raw_data_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/biospecimen"), description='Path to the directory with the biospecimen excel files, absolute path or relative to `assets_dir`.'), 'val_col': FieldInfo(annotation=str, required=False, default='zu extrahierender Wert (maschinenlesbar)', description='column name of the biospecimen metadata values')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- raw_data_path: AssetsPath
- val_col: str
mex.extractors.biospecimen.transform module
- mex.extractors.biospecimen.transform.get_or_create_externe_partner(externe_partner: str, extracted_organizations: dict[str, MergedOrganizationIdentifier], extracted_primary_source_biospecimen: ExtractedPrimarySource) → MergedOrganizationIdentifier
Get the extracted organization for a label, or create a new organization.
- Parameters:
externe_partner – externe partner label
extracted_organizations – merged organization identifiers extracted from Wikidata, keyed by label
extracted_primary_source_biospecimen – extracted primary source
- Returns:
matched or created merged organization identifier
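The get-or-create step can be sketched as a dict lookup with a fallback. `get_or_create_partner` and the `create` callable are hypothetical, plain strings stand in for `MergedOrganizationIdentifier`, and the sketch does not cache newly created identifiers because the source does not say whether the real function does.

```python
from collections.abc import Callable


def get_or_create_partner(
    label: str, known: dict[str, str], create: Callable[[str], str]
) -> str:
    """Return the known identifier for `label`, else create one (hypothetical sketch)."""
    identifier = known.get(label)
    if identifier is None:
        # `create` stands in for building a new organization for this label.
        identifier = create(label)
    return identifier
```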
- mex.extractors.biospecimen.transform.transform_biospecimen_resource_to_mex_resource(biospecimen_resources: Iterable[BiospecimenResource], extracted_primary_source_biospecimen: ExtractedPrimarySource, unit_stable_target_ids_by_synonym: dict[str, Identifier], mex_persons: Iterable[ExtractedPerson], extracted_organization_rki: ExtractedOrganization, extracted_synopse_activities: Iterable[ExtractedActivity], resource_mapping: Any, extracted_organizations: dict[str, MergedOrganizationIdentifier]) → Generator[ExtractedResource, None, None]
Transform Biospecimen resources to extracted resources.
- Parameters:
biospecimen_resources – Biospecimen resources
extracted_primary_source_biospecimen – Extracted primary source for Biospecimen
unit_stable_target_ids_by_synonym – Unit stable target IDs by synonym
mex_persons – Iterable of extracted persons
extracted_organization_rki – extracted organization for the RKI
extracted_synopse_activities – extracted Synopse activities
resource_mapping – resource mapping model with default values
extracted_organizations – merged organization identifiers by label
- Returns:
Generator for ExtractedResource instances