mex.biospecimen package

Subpackages

Submodules

mex.biospecimen.extract module

mex.biospecimen.extract.extract_biospecimen_contacts_by_email(biospecimen_resource: Iterable[BiospecimenResource]) → Generator[LDAPPerson, None, None]

Extract LDAP persons for Biospecimen contacts.

Parameters:

biospecimen_resource – Biospecimen resources

Returns:

Generator for LDAP persons

mex.biospecimen.extract.extract_biospecimen_resource(resource: DataFrame, sheet_name: str) → BiospecimenResource | None

Extract one Biospecimen resource from an xlsx file.

Parameters:
  • resource – DataFrame containing resource information

  • sheet_name – Name of the Excel sheet the data came from

Settings:
  • key_col – column in the file with keys

  • val_col – column in the file with values

Returns:

Biospecimen resource

mex.biospecimen.extract.extract_biospecimen_resources() → Generator[BiospecimenResource, None, None]

Extract Biospecimen resources by loading data from an MS Excel file.

Settings:

dir_path – Path to the biospecimen directory, absolute or relative to assets_dir

Returns:

Generator for Biospecimen resources
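
The dir_path setting is documented as absolute or relative to assets_dir. A minimal sketch of that resolution rule follows. The helper resolve_dir_path and the example directories are hypothetical and not part of mex.biospecimen, and the actual AssetsPath type may behave differently.

```python
from pathlib import PurePosixPath


def resolve_dir_path(dir_path: str, assets_dir: str) -> PurePosixPath:
    """Return dir_path unchanged if absolute, else joined onto assets_dir.

    Hypothetical helper for illustration only; not the mex API.
    """
    path = PurePosixPath(dir_path)
    if path.is_absolute():
        # Absolute paths are used as given.
        return path
    # Relative paths are interpreted relative to the assets directory.
    return PurePosixPath(assets_dir) / path


# Assumed example values; "/srv/assets" is not taken from the documentation.
relative = resolve_dir_path("raw-data/biospecimen", "/srv/assets")
absolute = resolve_dir_path("/data/biospecimen", "/srv/assets")
```

PurePosixPath keeps the sketch independent of the operating system it runs on; real code would more likely use pathlib.Path.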

mex.biospecimen.extract.get_clean_file_name(file_name: str) → str

Clean file name string.

Parameters:

file_name – file_name string

Returns:

cleaned file name string

mex.biospecimen.extract.get_clean_string(series: Series[Any]) → str

Clean the strings in a Series and concatenate them into one string.

Parameters:

series – Series holding the values of the related field

Returns:

string of extracted field

mex.biospecimen.extract.get_values(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None

Extract the values of a resource that correspond to a field name in the key column (by default Feldname).

Parameters:
  • resource – Biospecimen resource

  • key_col – column in the file with keys

  • val_col – column in the file with values

  • field_name – column name of extracted field

Returns:

string of extracted field
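
The key_col / val_col layout described above can be illustrated with a small stand-alone sketch. It uses a plain list of dicts in place of the DataFrame, the helper name lookup_values is hypothetical, and joining several matches with ", " is an assumption rather than confirmed behaviour of get_values.

```python
from typing import Optional

# Default column names taken from BiospecimenSettings.
KEY_COL = "Feldname"
VAL_COL = "zu extrahierender Wert (maschinenlesbar)"


def lookup_values(
    rows: Optional[list],
    key_col: str,
    val_col: str,
    field_name: str,
) -> Optional[str]:
    """Collect the values of all rows whose key column equals field_name.

    Hypothetical stand-in for illustration; rows is a list of dicts,
    not the DataFrame the real function receives.
    """
    if rows is None:
        return None
    values = [
        str(row[val_col]).strip()
        for row in rows
        if row.get(key_col) == field_name and row.get(val_col) not in (None, "")
    ]
    # Assumption: multiple matches are joined into one string.
    return ", ".join(values) if values else None


# Invented example rows; the field names are not from a real sheet.
rows = [
    {KEY_COL: "titel", VAL_COL: " Studie A "},
    {KEY_COL: "kontakt", VAL_COL: "a@example.org"},
]
title = lookup_values(rows, KEY_COL, VAL_COL, "titel")
```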

mex.biospecimen.extract.get_year_from_zeitlicher_bezug(resource: DataFrame | None, key_col: str, val_col: str, field_name: str) → str | None

Extract the first four connected digits of the string as year.

Parameters:
  • resource – Biospecimen resource

  • key_col – column in the file with keys

  • val_col – column in the file with values

  • field_name – column name of extracted field

Returns:

string with first four digits treated as zeitlicher_bezug year
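
Taking "the first four connected digits" as the year can be sketched with a regular expression. This only illustrates the rule on a plain string; the helper name first_four_digit_year is hypothetical, and the real function additionally looks the value up in the resource DataFrame first.

```python
import re
from typing import Optional


def first_four_digit_year(text: Optional[str]) -> Optional[str]:
    """Return the first run of four consecutive digits in text, or None.

    Hypothetical helper for illustration only; not the mex API.
    """
    if not text:
        return None
    # \d{4} matches the leftmost four adjacent digits, even inside a longer run.
    match = re.search(r"\d{4}", text)
    return match.group(0) if match else None


# Invented example value for a zeitlicher_bezug field.
year = first_four_digit_year("2008-2011")
```

Note that the pattern does not check whether the digits form a plausible year, so a value such as "12345" would yield "1234" under this sketch.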

mex.biospecimen.main module

mex.biospecimen.settings module

class mex.biospecimen.settings.BiospecimenSettings(*, dir_path: AssetsPath = AssetsPath('raw-data/biospecimen'), key_col: str = 'Feldname', val_col: str = 'zu extrahierender Wert (maschinenlesbar)')

Bases: BaseModel

Settings submodel for the Biospecimen extractor.

dir_path: AssetsPath
key_col: str
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'dir_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/biospecimen"), description='Path to the directory with the biospecimen excel files, absolute path or relative to `assets_dir`.'), 'key_col': FieldInfo(annotation=str, required=False, default='Feldname', description='column name of the biospecimen metadata keys'), 'val_col': FieldInfo(annotation=str, required=False, default='zu extrahierender Wert (maschinenlesbar)', description='column name of the biospecimen metadata values')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.

This replaces Model.__fields__ from Pydantic V1.

val_col: str

mex.biospecimen.transform module

mex.biospecimen.transform.transform_biospecimen_resource_to_mex_resource(biospecimen_resources: Iterable[BiospecimenResource], extracted_platform_biospecimen: ExtractedAccessPlatform, unit_stable_target_ids_by_synonym: dict[str, Identifier], mex_persons: Iterable[ExtractedPerson], extracted_organization_rki: ExtractedOrganization, extracted_synopse_activities: Iterable[ExtractedActivity]) → Generator[ExtractedResource, None, None]

Transform Biospecimen resources to extracted resources.

Parameters:
  • biospecimen_resources – Biospecimen resources

  • extracted_platform_biospecimen – Extracted platform for Biospecimen

  • unit_stable_target_ids_by_synonym – Unit stable target ids by synonym

  • mex_persons – Generator for ExtractedPerson

  • extracted_synopse_activities – extracted synopse activities

  • extracted_organization_rki – extracted organization

Returns:

Generator for ExtractedResource instances

Module contents