mex.extractors.odk package
Submodules
mex.extractors.odk.extract module
- mex.extractors.odk.extract.extract_odk_raw_data() → list[ODKData]
Extract ODK raw data by loading it from MS Excel files.
- Settings:
- odk.raw_data_path: Path to the ODK raw data,
either absolute or relative to assets_dir
- Returns:
list of ODK data.
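The setting above accepts either an absolute path or one relative to assets_dir. A minimal sketch of that resolution rule follows; resolve_raw_data_path and the directory /srv/mex/assets are hypothetical, and the real settings type (AssetsPath) may resolve paths differently. PurePosixPath keeps the example platform-independent.

```python
from pathlib import PurePosixPath


def resolve_raw_data_path(raw_data_path: str, assets_dir: PurePosixPath) -> PurePosixPath:
    """Hypothetical helper: keep absolute paths, anchor relative ones at assets_dir."""
    path = PurePosixPath(raw_data_path)
    # An absolute setting is used unchanged; a relative one is interpreted under assets_dir.
    return path if path.is_absolute() else assets_dir / path


assets_dir = PurePosixPath("/srv/mex/assets")  # hypothetical assets directory
print(resolve_raw_data_path("raw-data/odk", assets_dir))  # /srv/mex/assets/raw-data/odk
print(resolve_raw_data_path("/data/odk", assets_dir))  # /data/odk
```

Each workbook in the resolved directory could then be read with pandas.read_excel(path, sheet_name=None), which returns a dict of DataFrames keyed by sheet name; whether the extractor reads the files this way is not stated here.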
- mex.extractors.odk.extract.get_column_dict_by_pattern(sheet: DataFrame, pattern: str) → dict[str, list[str | float]]
Get a dict of the columns whose names match a pattern.
- Parameters:
sheet – sheet to extract columns from
pattern – pattern to match column names
- Returns:
dictionary mapping each matching column name to its column values
- mex.extractors.odk.extract.get_external_partner_and_publisher_by_label(odk_resource_mappings: list[ResourceMapping]) → dict[str, MergedOrganizationIdentifier]
Search Wikidata for the external partner and publisher organizations named in the resource mappings and extract them.
- Parameters:
odk_resource_mappings – list of resource mapping models
- Returns:
dict mapping organization labels to merged organization identifiers
mex.extractors.odk.filter module
- mex.extractors.odk.filter.is_invalid_odk_variable(type_row: str | float) → bool
Check whether a type row is an invalid ODK variable.
- Parameters:
type_row – row in a type column
- Returns:
True if type_row corresponds to an invalid variable, else False
mex.extractors.odk.main module
mex.extractors.odk.model module
- class mex.extractors.odk.model.ODKData(*, file_name: str, label_choices: dict[str, list[str | float]], label_survey: dict[str, list[str | float]], list_name_choices: list[str | float], name_choices: list[str | float], name_survey: list[str | float], type_survey: list[str | float])
Bases: BaseModel
Model class for ODK data.
- file_name: str
- label_choices: dict[str, list[str | float]]
- label_survey: dict[str, list[str | float]]
- list_name_choices: list[str | float]
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- name_choices: list[str | float]
- name_survey: list[str | float]
- type_survey: list[str | float]
mex.extractors.odk.settings module
- class mex.extractors.odk.settings.ODKSettings(*, raw_data_path: AssetsPath = AssetsPath('raw-data/odk'), mapping_path: AssetsPath = AssetsPath('mappings/odk'))
Bases: BaseModel
Settings submodel definition for ODK data extraction.
- mapping_path: AssetsPath
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- raw_data_path: AssetsPath
mex.extractors.odk.transform module
- mex.extractors.odk.transform.assign_resource_relations_and_load(resources_tuple: tuple[list[ExtractedResource], list[ExtractedResource]]) → list[ExtractedResource]
Assign relations between the resources and load them.
- Parameters:
resources_tuple – tuple of two lists of MEx resources
- Returns:
list of MEx resources
- mex.extractors.odk.transform.get_value_set(type_cell: str, file: ODKData) → list[str]
Get value sets for type cells that start with select_one or select_multiple.
- Parameters:
type_cell – one type cell
file – ODK data whose choices sheet corresponds to the type cell
- Returns:
list of value sets matched to type cell
- mex.extractors.odk.transform.transform_odk_data_to_extracted_variables(odk_extracted_resources: list[ExtractedResource], odk_raw_data: list[ODKData], variable_mapping: VariableMapping) → list[ExtractedVariable]
Transform ODK variables to MEx variables.
- Parameters:
odk_extracted_resources – extracted MEx resources
odk_raw_data – raw data extracted from Excel files
variable_mapping – variable mapping with default values
- Returns:
list of MEx variables
- mex.extractors.odk.transform.transform_odk_resources_to_mex_resources(odk_resource_mappings: list[ResourceMapping], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], odk_merged_organization_ids_by_str: dict[str, MergedOrganizationIdentifier], international_projects_extracted_activities: list[ExtractedActivity]) → tuple[list[ExtractedResource], list[ExtractedResource]]
Transform ODK resources to MEx resources.
- Parameters:
odk_resource_mappings – list of resource mapping models
unit_stable_target_ids_by_synonym – dict of organizational unit identifiers by synonym
odk_merged_organization_ids_by_str – dict of Wikidata organization identifiers
international_projects_extracted_activities – list of extracted international projects activities
- Returns:
tuple of lists of MEx child and non-child resources