mex.biospecimen package¶

Subpackages¶

mex.biospecimen.models package

Submodules¶

mex.biospecimen.extract module¶

mex.biospecimen.extract.extract_biospecimen_contacts_by_email(biospecimen_resource: Iterable[BiospecimenResource]) → Generator[LDAPPerson, None, None]¶

Extract LDAP persons for Biospecimen contacts.

Parameters:

biospecimen_resource – Biospecimen resources

Returns:

Generator for LDAP persons

mex.biospecimen.extract.extract_biospecimen_resource(resource: DataFrame, sheet_name: str) → BiospecimenResource | None¶

Extract one Biospecimen resource from an xlsx file.

Parameters:

resource – DataFrame containing resource information
sheet_name – Name of the Excel sheet the data came from

Settings:

key_col – column in the file with keys
val_col – column in the file with values

Returns:

Biospecimen resource

mex.biospecimen.extract.extract_biospecimen_resources() → Generator[BiospecimenResource, None, None]¶

Extract Biospecimen resources by loading data from an MS Excel file.

Settings:

dir_path – Path to the biospecimen directory, absolute or relative to assets_dir

Returns:

Generator for Biospecimen resources
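The "absolute or relative to assets_dir" rule for dir_path can be illustrated with a minimal sketch. The helper name `resolve_dir_path` and the example directories are hypothetical; in the package itself this resolution is presumably handled by the AssetsPath type, whose implementation is not shown here. `PurePosixPath` is used only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def resolve_dir_path(dir_path: str, assets_dir: str) -> PurePosixPath:
    """Return dir_path as-is if absolute, otherwise join it onto assets_dir.

    Hypothetical helper for illustration; not part of mex.biospecimen.
    """
    path = PurePosixPath(dir_path)
    if path.is_absolute():
        return path
    return PurePosixPath(assets_dir) / path


# A relative path is resolved against the assets directory.
relative = resolve_dir_path("raw-data/biospecimen", "/srv/assets")
# An absolute path is left untouched.
absolute = resolve_dir_path("/data/biospecimen", "/srv/assets")
```

With the default dir_path of 'raw-data/biospecimen', the extractor would therefore look inside the configured assets directory unless an absolute path is given.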

mex.biospecimen.extract.get_clean_file_name(file_name: str) → str¶

Clean file name string.

Parameters:

file_name – file name string

Returns:

cleaned file name string

mex.biospecimen.extract.get_clean_string(series: Series[Any]) → str¶

Clean the strings in a Series and concatenate them into one string.

Parameters:

series – series of the related field

Returns:

string of extracted field

mex.biospecimen.extract.get_values(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None¶

Extract the values of a resource that correspond to a given field name (Feldname).

Parameters:

resource – Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – column name of extracted field

Returns:

string of extracted field

mex.biospecimen.extract.get_year_from_zeitlicher_bezug(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None¶

Extract the first four connected digits of the string as year.

Parameters:

resource – Biospecimen resource
key_col – column in the file with keys
val_col – column in the file with values
field_name – column name of extracted field

Returns:

string with first four digits treated as zeitlicher_bezug year

mex.biospecimen.main module¶

mex.biospecimen.settings module¶

class mex.biospecimen.settings.BiospecimenSettings(*, dir_path: AssetsPath = AssetsPath('raw-data/biospecimen'), key_col: str = 'Feldname', val_col: str = 'zu extrahierender Wert (maschinenlesbar)')¶

Bases: BaseModel

Settings submodel for the Biospecimen extractor.

dir_path: AssetsPath¶

key_col: str¶

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}¶

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}¶

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'dir_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/biospecimen"), description='Path to the directory with the biospecimen excel files, absolute path or relative to `assets_dir`.'), 'key_col': FieldInfo(annotation=str, required=False, default='Feldname', description='column name of the biospecimen metadata keys'), 'val_col': FieldInfo(annotation=str, required=False, default='zu extrahierender Wert (maschinenlesbar)', description='column name of the biospecimen metadata values')}¶

Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo.

This replaces Model.__fields__ from Pydantic V1.

val_col: str¶

mex.biospecimen.transform module¶

mex.biospecimen.transform.transform_biospecimen_resource_to_mex_resource(biospecimen_resources: Iterable[BiospecimenResource], extracted_platform_biospecimen: ExtractedAccessPlatform, unit_stable_target_ids_by_synonym: dict[str, Identifier], mex_persons: Iterable[ExtractedPerson], extracted_organization_rki: ExtractedOrganization, extracted_synopse_activities: Iterable[ExtractedActivity]) → Generator[ExtractedResource, None, None]¶

Transform Biospecimen resources to extracted resources.

Parameters:

biospecimen_resources – Biospecimen resources
extracted_platform_biospecimen – Extracted platform for Biospecimen
unit_stable_target_ids_by_synonym – Unit stable target ids by synonym
mex_persons – Generator for ExtractedPerson
extracted_synopse_activities – extracted synopse activities
extracted_organization_rki – extracted organization

Returns:

Generator for ExtractedResource instances

Module contents¶
