mex.biospecimen package¶
Subpackages¶
- mex.biospecimen.models package
- Submodules
- mex.biospecimen.models.source module
BiospecimenResource
BiospecimenResource.alternativer_titel
BiospecimenResource.anonymisiert_pseudonymisiert
BiospecimenResource.beschreibung
BiospecimenResource.externe_partner
BiospecimenResource.id_loinc
BiospecimenResource.id_mesh_begriff
BiospecimenResource.kontakt
BiospecimenResource.methoden
BiospecimenResource.methodenbeschreibung
BiospecimenResource.mitwirkende_fachabteilung
BiospecimenResource.mitwirkende_personen
BiospecimenResource.model_computed_fields
BiospecimenResource.model_config
BiospecimenResource.model_fields
BiospecimenResource.offizieller_titel_der_probensammlung
BiospecimenResource.raeumlicher_bezug
BiospecimenResource.rechte
BiospecimenResource.ressourcentyp_allgemein
BiospecimenResource.ressourcentyp_speziell
BiospecimenResource.schlagworte
BiospecimenResource.sheet_name
BiospecimenResource.studienbezug
BiospecimenResource.thema
BiospecimenResource.tools_instrumente_oder_apparate
BiospecimenResource.verantwortliche_fachabteilung
BiospecimenResource.verwandte_publikation_doi
BiospecimenResource.verwandte_publikation_titel
BiospecimenResource.vorhandene_anzahl_der_proben
BiospecimenResource.weiterfuehrende_dokumentation_titel
BiospecimenResource.weiterfuehrende_dokumentation_url_oder_dateipfad
BiospecimenResource.zeitlicher_bezug
BiospecimenResource.zugriffsbeschraenkung
- Module contents
Submodules¶
mex.biospecimen.extract module¶
- mex.biospecimen.extract.extract_biospecimen_contacts_by_email(biospecimen_resource: Iterable[BiospecimenResource]) Generator[LDAPPerson, None, None] ¶
Extract LDAP persons for Biospecimen contacts.
- Parameters:
biospecimen_resource – Biospecimen resources
- Returns:
Generator for LDAP persons
- mex.biospecimen.extract.extract_biospecimen_resource(resource: DataFrame, sheet_name: str) BiospecimenResource | None ¶
Extract one Biospecimen resource from an xlsx file.
- Parameters:
resource – DataFrame containing resource information
sheet_name – Name of the Excel sheet the data came from
- Settings:
key_col: column in the file with keys
val_col: column in the file with values
- Returns:
Biospecimen resource
- mex.biospecimen.extract.extract_biospecimen_resources() Generator[BiospecimenResource, None, None] ¶
Extract Biospecimen resources by loading data from MS Excel files.
- Settings:
- dir_path: Path to the biospecimen directory,
absolute or relative to assets_dir
- Returns:
Generator for Biospecimen resources
- mex.biospecimen.extract.get_clean_file_name(file_name: str) str ¶
Clean file name string.
- Parameters:
file_name – file name string
- Returns:
cleaned file name string
- mex.biospecimen.extract.get_clean_string(series: Series[Any]) str ¶
Clean the strings of a Series and concatenate them into one string.
- Parameters:
series – series of related field
- Returns:
string of extracted field
- mex.biospecimen.extract.get_values(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) str | None ¶
Extract the values of a resource that correspond to the given field name (Feldname).
- Parameters:
resource – Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – column name of extracted field
- Returns:
string of extracted field
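The key/value lookup described above can be sketched as follows. This is a minimal illustration, not the package's implementation: rows are modeled as plain dictionaries instead of a pandas DataFrame so the sketch runs without third-party packages. The helper name lookup_field, the separator, and the sample rows are all assumptions; only the two column names come from the documented default settings.

```python
from typing import Mapping, Optional, Sequence

# Default column names taken from the documented BiospecimenSettings defaults.
KEY_COL = "Feldname"
VAL_COL = "zu extrahierender Wert (maschinenlesbar)"


def lookup_field(
    rows: Optional[Sequence[Mapping[str, object]]],
    key_col: str,
    val_col: str,
    field_name: str,
    separator: str = ", ",  # assumption: the real separator is not documented
) -> Optional[str]:
    """Hypothetical stand-in for a key/value column lookup.

    Collects the values of all rows whose key column equals field_name,
    strips whitespace, and joins them into one string.
    """
    if rows is None:
        return None
    values = []
    for row in rows:
        if row.get(key_col) != field_name:
            continue
        value = row.get(val_col)
        if value is None:
            continue
        text = str(value).strip()
        if text:
            values.append(text)
    return separator.join(values) if values else None


# Made-up sample data for illustration only.
rows = [
    {KEY_COL: "offizieller_titel_der_probensammlung", VAL_COL: " Beispielsammlung "},
    {KEY_COL: "schlagworte", VAL_COL: "Serum"},
    {KEY_COL: "schlagworte", VAL_COL: "Plasma"},
]

print(lookup_field(rows, KEY_COL, VAL_COL, "schlagworte"))
```

A field that does not occur in the key column yields None in this sketch, matching the documented str | None return type.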
- mex.biospecimen.extract.get_year_from_zeitlicher_bezug(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) str | None ¶
Extract the first four connected digits of the string as year.
- Parameters:
resource – Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – column name of extracted field
- Returns:
string with first four digits treated as zeitlicher_bezug year
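The rule "first four connected digits" can be illustrated with a regular expression. This is a sketch under the assumption that the field value is already available as a plain string; the helper name first_four_digit_run is hypothetical and not part of the package.

```python
import re
from typing import Optional


def first_four_digit_run(text: Optional[str]) -> Optional[str]:
    """Return the first run of four consecutive ASCII digits, or None."""
    if not text:
        return None
    match = re.search(r"[0-9]{4}", text)
    return match.group(0) if match else None


print(first_four_digit_run("Erhebung 2014 bis 2017"))
```

Note that a longer digit run such as "123456" yields "1234" here, because the pattern does not require the four digits to stand alone.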
mex.biospecimen.main module¶
mex.biospecimen.settings module¶
- class mex.biospecimen.settings.BiospecimenSettings(*, dir_path: AssetsPath = AssetsPath('raw-data/biospecimen'), key_col: str = 'Feldname', val_col: str = 'zu extrahierender Wert (maschinenlesbar)')¶
Bases:
BaseModel
Settings submodel for the Biospecimen extractor.
- dir_path: AssetsPath¶
- key_col: str¶
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'dir_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/biospecimen"), description='Path to the directory with the biospecimen excel files, absolute path or relative to `assets_dir`.'), 'key_col': FieldInfo(annotation=str, required=False, default='Feldname', description='column name of the biospecimen metadata keys'), 'val_col': FieldInfo(annotation=str, required=False, default='zu extrahierender Wert (maschinenlesbar)', description='column name of the biospecimen metadata values')}¶
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- val_col: str¶
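To make the defaults and the "absolute path or relative to assets_dir" rule concrete, here is a rough stand-in built from a standard-library dataclass. It is not the pydantic model itself: the class name BiospecimenSettingsSketch, the helper resolve_dir, and the example directories are assumptions; only the three default values are taken from the signature above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class BiospecimenSettingsSketch:
    """Hypothetical stand-in mirroring the documented defaults."""

    dir_path: PurePosixPath = PurePosixPath("raw-data/biospecimen")
    key_col: str = "Feldname"
    val_col: str = "zu extrahierender Wert (maschinenlesbar)"


def resolve_dir(
    settings: BiospecimenSettingsSketch, assets_dir: PurePosixPath
) -> PurePosixPath:
    """Keep absolute paths as-is, resolve relative ones against assets_dir."""
    if settings.dir_path.is_absolute():
        return settings.dir_path
    return assets_dir / settings.dir_path


# Example directories are made up for illustration.
default_settings = BiospecimenSettingsSketch()
print(resolve_dir(default_settings, PurePosixPath("/srv/assets")))

custom_settings = BiospecimenSettingsSketch(dir_path=PurePosixPath("/data/bio"))
print(resolve_dir(custom_settings, PurePosixPath("/srv/assets")))
```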
mex.biospecimen.transform module¶
- mex.biospecimen.transform.transform_biospecimen_resource_to_mex_resource(biospecimen_resources: Iterable[BiospecimenResource], extracted_platform_biospecimen: ExtractedAccessPlatform, unit_stable_target_ids_by_synonym: dict[str, Identifier], mex_persons: Iterable[ExtractedPerson], extracted_organization_rki: ExtractedOrganization, extracted_synopse_activities: Iterable[ExtractedActivity]) Generator[ExtractedResource, None, None] ¶
Transform Biospecimen resources to extracted resources.
- Parameters:
biospecimen_resources – Biospecimen resources
extracted_platform_biospecimen – Extracted platform for Biospecimen
unit_stable_target_ids_by_synonym – Unit stable target ids by synonym
mex_persons – Generator for ExtractedPerson
extracted_synopse_activities – extracted synopse activities
extracted_organization_rki – extracted organization
- Returns:
Generator for ExtractedResource instances