mex.extractors.odk package

Submodules

mex.extractors.odk.extract module

mex.extractors.odk.extract.extract_odk_raw_data() → list[ODKData]

Extract odk raw data by loading it from MS Excel files.

Settings:
odk.raw_data_path: path to the odk raw data, absolute or relative to assets_dir

Returns:

list of ODK data.

mex.extractors.odk.extract.get_column_dict_by_pattern(sheet: DataFrame, pattern: str) → dict[str, list[str | float]]

Get a dict of the columns whose names match the pattern.

Parameters:
  • sheet – sheet to extract columns from

  • pattern – pattern to match column names

Returns:

dictionary of matching column values, keyed by column name

mex.extractors.odk.extract.get_external_partner_and_publisher_by_label(odk_resource_mappings: list[Any]) → dict[str, WikidataOrganization]

Search Wikidata for external partner and publisher organizations and extract them.

Parameters:

odk_resource_mappings – list of resource mapping models

Returns:

dict mapping organization labels to WikidataOrganization objects

mex.extractors.odk.main module

mex.extractors.odk.model module

class mex.extractors.odk.model.ODKData(*, file_name: str, hint: dict[str, list[str | float]], label_choices: dict[str, list[str | float]], label_survey: dict[str, list[str | float]], list_name: list[str | float], name: list[str | float], type: list[str | float])

Bases: BaseModel

Model class for odk data.

file_name: str
hint: dict[str, list[str | float]]
label_choices: dict[str, list[str | float]]
label_survey: dict[str, list[str | float]]
list_name: list[str | float]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'file_name': FieldInfo(annotation=str, required=True), 'hint': FieldInfo(annotation=dict[str, list[Union[str, float]]], required=True), 'label_choices': FieldInfo(annotation=dict[str, list[Union[str, float]]], required=True), 'label_survey': FieldInfo(annotation=dict[str, list[Union[str, float]]], required=True), 'list_name': FieldInfo(annotation=list[Union[str, float]], required=True), 'name': FieldInfo(annotation=list[Union[str, float]], required=True), 'type': FieldInfo(annotation=list[Union[str, float]], required=True)}

Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

name: list[str | float]
type: list[str | float]

mex.extractors.odk.settings module

class mex.extractors.odk.settings.ODKSettings(*, raw_data_path: AssetsPath = AssetsPath('raw-data/odk'), mapping_path: AssetsPath = AssetsPath('mappings/__final__/odk'))

Bases: BaseModel

Settings submodel definition for odk data extraction.

mapping_path: AssetsPath
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/__final__/odk"), description='Path to the directory with the odk mapping files containing the default values, absolute path or relative to `assets_dir`.'), 'raw_data_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/odk"), description='Path to the directory with the odk excel files, absolute path or relative to `assets_dir`.')}

Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

raw_data_path: AssetsPath
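How `AssetsPath` itself resolves is not shown here. The documented rule for both settings (an absolute path is used as-is, a relative path is taken relative to `assets_dir`) can be sketched with a hypothetical helper `resolve_assets_path` and an invented `assets_dir`; `PurePosixPath` keeps the example independent of the host OS.

```python
from pathlib import PurePosixPath


def resolve_assets_path(path: PurePosixPath, assets_dir: PurePosixPath) -> PurePosixPath:
    """Return path unchanged if absolute, otherwise anchor it at assets_dir."""
    # Hypothetical helper mimicking the "absolute or relative to assets_dir" rule.
    if path.is_absolute():
        return path
    return assets_dir / path


assets_dir = PurePosixPath("/opt/mex/assets")  # invented location
raw_data_path = resolve_assets_path(PurePosixPath("raw-data/odk"), assets_dir)
mapping_path = resolve_assets_path(PurePosixPath("mappings/__final__/odk"), assets_dir)
```

With the defaults, this yields `/opt/mex/assets/raw-data/odk` and `/opt/mex/assets/mappings/__final__/odk`.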

mex.extractors.odk.transform module

mex.extractors.odk.transform.assign_resource_relations(resources: dict[str, ExtractedResource], is_part_of_list: list[str]) → list[ExtractedResource]

Link related resources to each other.

Parameters:
  • resources – dict of mex resources

  • is_part_of_list – list of resources which are part of another

Returns:

list of mex resources

mex.extractors.odk.transform.get_value_set(type_cell: str, choice_sheet: ODKData) → list[str]

Get value sets for type cells that start with select_one or select_multiple.

Parameters:
  • type_cell – one type cell

  • choice_sheet – choice sheet corresponding to type cell

Returns:

list of value sets matched to type cell

mex.extractors.odk.transform.get_variable_groups_from_raw_data(odk_raw_data: list[ODKData]) → dict[str, list[dict[str, str]]]

Get variable groups from raw data by parsing for begin_group and end_group.

Parameters:

odk_raw_data – raw data extracted from Excel files

Returns:

dictionary of odk groups by group name

mex.extractors.odk.transform.transform_odk_data_to_extracted_variables(extracted_resources_odk: list[ExtractedResource], extracted_variable_groups_odk: list[ExtractedVariableGroup], odk_variable_groups: dict[str, list[dict[str, str]]], odk_raw_data: list[ODKData], extracted_primary_source_odk: ExtractedPrimarySource) → list[ExtractedVariable]

Transform odk variables to mex variables.

Parameters:
  • extracted_resources_odk – extracted mex resources

  • extracted_variable_groups_odk – extracted mex variable groups

  • odk_variable_groups – dictionary of odk groups by group name

  • odk_raw_data – raw data extracted from Excel files

  • extracted_primary_source_odk – odk primary source

Returns:

list of mex variables

mex.extractors.odk.transform.transform_odk_resources_to_mex_resources(odk_resource_mappings: list[Any], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], external_partner_and_publisher_by_label: dict[str, MergedOrganizationIdentifier], extracted_international_projects_activities: list[ExtractedActivity], extracted_primary_source_mex: ExtractedPrimarySource) → tuple[dict[str, ExtractedResource], list[str]]

Transform odk resources to mex resources.

Parameters:
  • odk_resource_mappings – list of resource mapping models

  • unit_stable_target_ids_by_synonym – dict of merged organizational unit identifiers by synonym

  • external_partner_and_publisher_by_label – dict of wikidata OrganizationIDs

  • extracted_international_projects_activities – list of extracted international projects activities

  • extracted_primary_source_mex – mex primary source

Returns:

tuple of a dict of mex resources and a list of resources which are part of another

mex.extractors.odk.transform.transform_odk_variable_groups_to_extracted_variable_groups(odk_variable_groups: dict[str, list[dict[str, str]]], extracted_resources_odk: list[ExtractedResource], extracted_primary_source_odk: ExtractedPrimarySource) → list[ExtractedVariableGroup]

Transform odk variable groups to mex variable groups.

Parameters:
  • odk_variable_groups – dictionary of odk groups by group name

  • extracted_resources_odk – extracted mex resources

  • extracted_primary_source_odk – odk primary source

Returns:

list of mex variable groups

Module contents