mex.extractors.biospecimen package

Subpackages

mex.extractors.biospecimen.models package

Submodules

mex.extractors.biospecimen.extract module

mex.extractors.biospecimen.extract.extract_biospecimen_contacts_by_email(biospecimen_resource: Iterable[BiospecimenResource]) → list[LDAPPerson]

Extract the LDAP persons for the contacts of the Biospecimen resources, looked up by email address.

Parameters:

biospecimen_resource – Biospecimen resources

Returns:

List of LDAP persons

mex.extractors.biospecimen.extract.extract_biospecimen_organizations(biospecimen_resources: Iterable[BiospecimenResource]) → dict[str, MergedOrganizationIdentifier]

Search Wikidata for the external partner organizations of the Biospecimen resources and extract them.

Parameters:

biospecimen_resources – Iterable of Biospecimen resources

Returns:

Dict mapping each external partner (externe Partner) label to its merged organization identifier

mex.extractors.biospecimen.extract.extract_biospecimen_resource(resource: DataFrame, sheet_name: str, file_name: str) → BiospecimenResource | None

Extract one Biospecimen resource from a sheet of an xlsx file.

Parameters:

resource – DataFrame containing resource information
sheet_name – Name of the Excel sheet the data came from
file_name – Name of the Excel file

Settings:

key_col: column in the file with keys
val_col: column in the file with values

Returns:

Biospecimen resource, or None

mex.extractors.biospecimen.extract.extract_biospecimen_resources() → list[BiospecimenResource]

Extract Biospecimen resources by loading data from MS Excel files.

Settings:

dir_path: path to the Biospecimen directory, absolute or relative to assets_dir

Returns:

List of Biospecimen resources
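A minimal sketch of the file discovery step only, assuming the directory is resolved against an assets directory when it is relative, as the setting describes. The helper name `find_excel_files` is hypothetical, and reading the sheets themselves is left out.

```python
from pathlib import Path


def find_excel_files(dir_path: Path, assets_dir: Path) -> list[Path]:
    """Return the .xlsx files in dir_path, resolving relative paths against assets_dir."""
    base = dir_path if dir_path.is_absolute() else assets_dir / dir_path
    # sorted so the extraction order is deterministic across runs
    return sorted(p for p in base.glob("*.xlsx") if p.is_file())
```

Each file found this way could then be read sheet by sheet; for instance, `pandas.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by sheet name, which matches the per-sheet signature of extract_biospecimen_resource.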

mex.extractors.biospecimen.extract.get_clean_file_name(file_name: str) → str

Clean a file name string.

Parameters:

file_name – file name string

Returns:

Cleaned file name string
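The cleaning rules are not spelled out here, so the following is only an illustration under assumed rules: drop the directory and extension, turn underscores into spaces, and collapse whitespace. The helper `clean_file_name` is hypothetical and may differ from the real behavior.

```python
import re
from pathlib import PurePath


def clean_file_name(file_name: str) -> str:
    """Assumed cleaning rules: drop directory and extension, underscores to spaces."""
    stem = PurePath(file_name).stem  # "raw/My_File.xlsx" -> "My_File"
    text = stem.replace("_", " ")
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
```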

mex.extractors.biospecimen.extract.get_clean_string(series: Series[Any]) → str

Clean the string values of a Series and concatenate them into one string.

Parameters:

series – Series of values of the related field

Returns:

String of the extracted field
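A minimal sketch of the clean-and-concatenate idea, assuming that cleaning means skipping missing values, stripping whitespace, and dropping empty strings. An iterable stands in for the pandas Series, and both the `", "` separator and the helper name `clean_string` are assumptions.

```python
import math
from collections.abc import Iterable
from typing import Any


def clean_string(values: Iterable[Any]) -> str:
    """Strip each value, drop missing or empty ones, and join the rest into one string."""
    parts: list[str] = []
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # pandas reads empty Excel cells as NaN
        text = str(value).strip()
        if text:
            parts.append(text)
    return ", ".join(parts)  # separator is an assumption
```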

mex.extractors.biospecimen.extract.get_values(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None

Extract the values of the resource whose key (Feldname, i.e. field name) matches field_name.

Parameters:

resource – Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – name of the field to extract, as listed in the key column

Returns:

String of the extracted field, or None

mex.extractors.biospecimen.extract.get_year_from_zeitlicher_bezug(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None

Extract the first four consecutive digits of the field value as the year.

Parameters:

resource – Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – name of the field to extract, as listed in the key column

Returns:

String with the first four consecutive digits of the zeitlicher_bezug (temporal reference) field, treated as the year, or None

mex.extractors.biospecimen.main module

mex.extractors.biospecimen.settings module

class mex.extractors.biospecimen.settings.BiospecimenSettings(*, raw_data_path: AssetsPath = AssetsPath('raw-data/biospecimen'), key_col: str = 'Feldname', val_col: str = 'zu extrahierender Wert (maschinenlesbar)', mapping_path: AssetsPath = AssetsPath('mappings/biospecimen'))

Bases: BaseModel

Settings submodel for the Biospecimen extractor.

key_col: str

mapping_path: AssetsPath

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

raw_data_path: AssetsPath

val_col: str

mex.extractors.biospecimen.transform module

mex.extractors.biospecimen.transform.get_or_create_externe_partner(externe_partner: str, extracted_organizations: dict[str, MergedOrganizationIdentifier]) → MergedOrganizationIdentifier

Get the extracted organization for the given label, or create a new organization if none was extracted.

Parameters:

externe_partner – external partner label
extracted_organizations – merged organization identifiers extracted from Wikidata, keyed by label

Returns:

Matched or created merged organization identifier
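A minimal sketch of the get-or-create pattern described above, with plain strings as identifiers. The `create_organization` factory is hypothetical and stands in for however a new organization is actually built. The sketch deliberately leaves the input dict unchanged.

```python
from collections.abc import Callable


def get_or_create_partner(
    label: str,
    organizations: dict[str, str],
    create_organization: Callable[[str], str],
) -> str:
    """Return the identifier matched for label, or create a new organization for it."""
    identifier = organizations.get(label)
    if identifier is not None:
        return identifier  # organization was already matched, e.g. via Wikidata
    return create_organization(label)  # hypothetical fallback factory
```

Keeping the lookup dict read-only means repeated calls for an unmatched label invoke the factory again, so the factory should derive its identifier deterministically from the label if duplicates must be avoided.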

mex.extractors.biospecimen.transform.transform_biospecimen_resource_to_mex_resource(biospecimen_resources: Iterable[BiospecimenResource], unit_stable_target_ids_by_synonym: dict[str, list[MergedOrganizationalUnitIdentifier]], mex_persons: Iterable[ExtractedPerson], extracted_organization_rki: ExtractedOrganization, synopse_extracted_activities: Iterable[ExtractedActivity], resource_mapping: ResourceMapping, extracted_organizations: dict[str, MergedOrganizationIdentifier]) → list[ExtractedResource]

Transform Biospecimen resources to extracted resources.

Parameters:

biospecimen_resources – Biospecimen resources
unit_stable_target_ids_by_synonym – unit stable target IDs by synonym
mex_persons – Iterable of ExtractedPersons
extracted_organization_rki – extracted organization of the RKI
synopse_extracted_activities – extracted Synopse activities
resource_mapping – resource mapping model with default values
extracted_organizations – extracted organizations by label

Returns:

List of ExtractedResource instances
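One step such a transform needs is resolving unit names from the sheet through unit_stable_target_ids_by_synonym, where several synonyms can point at the same unit. The sketch below shows only that step, with plain strings as identifiers. The helper `resolve_units` and the example unit labels are hypothetical, and silently dropping unknown labels is an assumption.

```python
from collections.abc import Iterable


def resolve_units(
    unit_labels: Iterable[str],
    unit_ids_by_synonym: dict[str, list[str]],
) -> list[str]:
    """Collect stable target IDs for all unit labels, dropping unknowns and duplicates."""
    resolved: list[str] = []
    for label in unit_labels:
        for unit_id in unit_ids_by_synonym.get(label.strip(), []):
            if unit_id not in resolved:  # keep first-seen order without repeats
                resolved.append(unit_id)
    return resolved
```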
