mex.extractors.odk package

Submodules

mex.extractors.odk.extract module

mex.extractors.odk.extract.extract_odk_raw_data() list[ODKData]

Extract odk raw data by loading data from MS-Excel file.

Settings:
odk.raw_data_path: Path to the odk raw data, absolute or relative to assets_dir

Returns:

list of ODK data.
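The "absolute or relative to assets_dir" rule for odk.raw_data_path can be sketched as follows. This is a minimal illustration using only pathlib, not the AssetsPath implementation; the helper name resolve_raw_data_path and the example directory /srv/assets are hypothetical.

```python
from pathlib import Path


def resolve_raw_data_path(raw_data_path: str, assets_dir: str) -> Path:
    """Return an absolute path as-is; join a relative path onto assets_dir.

    Hypothetical helper: it only illustrates the documented rule and is
    not how AssetsPath is actually implemented.
    """
    path = Path(raw_data_path)
    if path.is_absolute():
        return path
    return Path(assets_dir) / path


# The documented default 'raw-data/odk' is relative, so it lands under assets_dir.
resolved = resolve_raw_data_path("raw-data/odk", "/srv/assets")
```

An absolute value passes through unchanged, so a deployment can point the extractor at a directory outside assets_dir.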

mex.extractors.odk.extract.get_column_dict_by_pattern(sheet: DataFrame, pattern: str) dict[str, list[str | float]]

Get a dict of columns by matching pattern.

Parameters:
  • sheet – sheet to extract columns from

  • pattern – pattern to match column names

Returns:

dictionary of matching columns by column names
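A rough sketch of the column selection, under two assumptions: the pattern is treated as a regular expression matched at the start of the column name, and a plain dict of columns stands in for the DataFrame. The function name columns_by_pattern and the sample sheet are hypothetical.

```python
from __future__ import annotations

import re


def columns_by_pattern(
    sheet: dict[str, list[str | float]], pattern: str
) -> dict[str, list[str | float]]:
    """Keep only the columns whose name matches the pattern (assumed regex)."""
    return {
        name: values
        for name, values in sheet.items()
        if re.match(pattern, name)
    }


# Stand-in for a survey sheet with labels in two languages.
survey_sheet = {
    "type": ["text", "select_one yesno"],
    "name": ["patient_id", "vaccinated"],
    "label::English (en)": ["Patient ID", "Vaccinated?"],
    "label::French (fr)": ["ID du patient", "Vacciné ?"],
}
labels = columns_by_pattern(survey_sheet, r"label")
```

Here labels holds the two label columns keyed by their full column names, which is the shape of the label_survey and label_choices fields on ODKData.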

mex.extractors.odk.extract.get_external_partner_and_publisher_by_label(odk_resource_mappings: list[ResourceMapping]) dict[str, MergedOrganizationIdentifier]

Search for partner and publisher organizations in wikidata and extract them.

Parameters:

odk_resource_mappings – list of resource mapping models

Returns:

Dict mapping organization label to MergedOrganizationIdentifier

mex.extractors.odk.filter module

mex.extractors.odk.filter.is_invalid_odk_variable(type_row: str | float) bool

Check whether type row is an invalid odk variable.

Parameters:

type_row – row in a type column

Returns:

True if type_row corresponds to invalid variable else False
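The exact filtering criteria are not documented here. The sketch below shows one plausible rule, assuming that empty spreadsheet cells arrive as float NaN (which would explain the str | float type) and that structural rows opening or closing a group or repeat do not describe variables. The name looks_invalid and both criteria are assumptions, not the module's actual logic.

```python
from __future__ import annotations


def looks_invalid(type_row: str | float) -> bool:
    """Hypothetical check: non-string cells and structural rows are invalid."""
    # Assumption: an empty cell is read as float NaN, so any float is invalid.
    if not isinstance(type_row, str):
        return True
    # Assumption: rows that open or close a group or repeat are not variables.
    return type_row.startswith(("begin", "end"))


type_column = ["begin_group", "text", float("nan"), "select_one yesno", "end_group"]
variables = [cell for cell in type_column if not looks_invalid(cell)]
```

Applied to the sample column, only the text and select_one yesno rows are kept.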

mex.extractors.odk.main module

mex.extractors.odk.model module

class mex.extractors.odk.model.ODKData(*, file_name: str, label_choices: dict[str, list[str | float]], label_survey: dict[str, list[str | float]], list_name_choices: list[str | float], name_choices: list[str | float], name_survey: list[str | float], type_survey: list[str | float])

Bases: BaseModel

Model class for odk data.

file_name: str
label_choices: dict[str, list[str | float]]
label_survey: dict[str, list[str | float]]
list_name_choices: list[str | float]
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name_choices: list[str | float]
name_survey: list[str | float]
type_survey: list[str | float]
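To illustrate the shape of these fields, here is a dataclass stand-in populated with sample values. It assumes the field names follow a column_sheet pattern (for example, name_survey holding the "name" column of a "survey" sheet), which is an interpretation and not stated in this reference. ODKDataSketch and all sample values are hypothetical; the real class is a pydantic BaseModel.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# Assumption: float is allowed because empty cells are read as NaN.
Cell = Union[str, float]


@dataclass
class ODKDataSketch:
    """Stand-in mirroring the documented ODKData fields (not the real model)."""

    file_name: str
    label_choices: dict[str, list[Cell]]
    label_survey: dict[str, list[Cell]]
    list_name_choices: list[Cell]
    name_choices: list[Cell]
    name_survey: list[Cell]
    type_survey: list[Cell]


example = ODKDataSketch(
    file_name="example.xlsx",
    label_choices={"label::English (en)": ["Yes", "No"]},
    label_survey={"label::English (en)": ["Patient ID", "Vaccinated?"]},
    list_name_choices=["yesno", "yesno"],
    name_choices=["yes", "no"],
    name_survey=["patient_id", "vaccinated"],
    type_survey=["text", "select_one yesno"],
)
```

Under this reading, the survey lists are aligned row by row, as are the choices lists, so index i in name_survey and type_survey describes the same question.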

mex.extractors.odk.settings module

class mex.extractors.odk.settings.ODKSettings(*, raw_data_path: AssetsPath = AssetsPath('raw-data/odk'), mapping_path: AssetsPath = AssetsPath('mappings/odk'))

Bases: BaseModel

Settings submodel definition for odk data extraction.

mapping_path: AssetsPath
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

raw_data_path: AssetsPath

mex.extractors.odk.transform module

mex.extractors.odk.transform.assign_resource_relations_and_load(resources_tuple: tuple[list[ExtractedResource], list[ExtractedResource]]) list[ExtractedResource]

Assign resources related to each other.

Parameters:

resources_tuple – tuple of lists of mex resources

Returns:

list of mex resources

mex.extractors.odk.transform.get_value_set(type_cell: str, file: ODKData) list[str]

Get value sets for type cells that start with select_one or select_multiple.

Parameters:
  • type_cell – one type cell

  • file – choice sheet corresponding to type cell

Returns:

list of value sets matched to type cell
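A minimal sketch of the lookup, assuming a type cell such as select_one yesno names a choice list after the first space and that the value set consists of the choice names whose list name matches. The function value_set_for takes the two choice columns directly instead of an ODKData instance; its name and signature are hypothetical.

```python
from __future__ import annotations


def value_set_for(
    type_cell: str,
    list_name_choices: list[str | float],
    name_choices: list[str | float],
) -> list[str]:
    """Collect the choice names whose list name matches the type cell."""
    parts = type_cell.split()
    if len(parts) < 2:
        # No list name given, so there is nothing to look up.
        return []
    wanted = parts[1]
    return [
        str(name)
        for list_name, name in zip(list_name_choices, name_choices)
        if list_name == wanted
    ]


values = value_set_for(
    "select_one yesno",
    ["yesno", "yesno", "region"],
    ["yes", "no", "north"],
)
```

With the sample choices, values contains "yes" and "no"; the "north" entry belongs to a different list and is skipped.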

mex.extractors.odk.transform.transform_odk_data_to_extracted_variables(odk_extracted_resources: list[ExtractedResource], odk_raw_data: list[ODKData], variable_mapping: VariableMapping) list[ExtractedVariable]

Transform odk variables to mex variables.

Parameters:
  • odk_extracted_resources – extracted mex resources

  • odk_raw_data – raw data extracted from Excel files

  • variable_mapping – variable mapping default values

Returns:

list of mex variables

mex.extractors.odk.transform.transform_odk_resources_to_mex_resources(odk_resource_mappings: list[ResourceMapping], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], odk_merged_organization_ids_by_str: dict[str, MergedOrganizationIdentifier], international_projects_extracted_activities: list[ExtractedActivity]) tuple[list[ExtractedResource], list[ExtractedResource]]

Transform odk resources to mex resources.

Parameters:
  • odk_resource_mappings – list of resource mapping models

  • unit_stable_target_ids_by_synonym – dict of OrganizationalUnitIds

  • odk_merged_organization_ids_by_str – dict of wikidata OrganizationIDs

  • international_projects_extracted_activities – list of extracted international projects activities

Returns:

tuple of lists of mex child and non-child resources

Module contents