mex.extractors.odk package¶
Submodules¶
mex.extractors.odk.extract module¶
- mex.extractors.odk.extract.extract_odk_raw_data() list[ODKData] ¶
Extract odk raw data by loading data from MS Excel files.
- Settings:
- odk.raw_data_path: Path to the odk raw data,
absolute or relative to assets_dir
- Returns:
list of ODK data.
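The "absolute or relative to assets_dir" rule for odk.raw_data_path can be illustrated with a minimal sketch. The helper name `resolve_against_assets_dir` and the `/srv/assets` directory are hypothetical and chosen only for illustration; the package itself uses its `AssetsPath` type for this, whose internals are not shown here.

```python
from pathlib import PurePosixPath


def resolve_against_assets_dir(raw_data_path: str, assets_dir: str) -> PurePosixPath:
    """Return the path unchanged if absolute, else join it onto assets_dir.

    Hypothetical helper for illustration; not part of mex.extractors.
    """
    path = PurePosixPath(raw_data_path)
    if path.is_absolute():
        return path
    return PurePosixPath(assets_dir) / path


# The default value "raw-data/odk" is relative, so it lands under assets_dir.
print(resolve_against_assets_dir("raw-data/odk", "/srv/assets"))
# An absolute path is returned as given.
print(resolve_against_assets_dir("/data/odk", "/srv/assets"))
```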
- mex.extractors.odk.extract.get_column_dict_by_pattern(sheet: DataFrame, pattern: str) dict[str, list[str | float]] ¶
Get a dict of columns by matching pattern.
- Parameters:
sheet – sheet to extract columns from
pattern – pattern to match column names
- Returns:
dictionary of matching columns by column names
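The idea of selecting columns by a name pattern can be sketched as follows. This is not the library's implementation: a plain dict of column lists stands in for the DataFrame, the function name `columns_matching` is hypothetical, and treating the pattern as a regular expression searched within the column name is an assumption.

```python
from __future__ import annotations

import re


def columns_matching(
    sheet: dict[str, list[str | float]], pattern: str
) -> dict[str, list[str | float]]:
    """Keep only the columns whose name matches the pattern.

    Hypothetical stand-in: `sheet` maps column names to cell values.
    Assumption: the pattern is a regular expression searched in the name.
    """
    compiled = re.compile(pattern)
    return {name: values for name, values in sheet.items() if compiled.search(name)}


# Illustrative sheet; column names and values are made up.
sheet = {
    "type": ["text", "integer"],
    "name": ["village", "age"],
    "label::English": ["Village", "Age"],
    "label::French": ["Village", float("nan")],  # empty cell as a float NaN
}
labels = columns_matching(sheet, r"^label")
print(sorted(labels))
```

The `str | float` value type in the return annotation leaves room for empty cells represented as float NaN values next to text cells.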
- mex.extractors.odk.extract.get_external_partner_and_publisher_by_label(odk_resource_mappings: list[Any]) dict[str, WikidataOrganization] ¶
Search for and extract partner and publisher organizations from Wikidata.
- Parameters:
odk_resource_mappings – list of resource mapping models
- Returns:
dictionary mapping organization labels to WikidataOrganization
mex.extractors.odk.main module¶
mex.extractors.odk.model module¶
- class mex.extractors.odk.model.ODKData(*, file_name: str, hint: dict[str, list[str | float]], label_choices: dict[str, list[str | float]], label_survey: dict[str, list[str | float]], list_name: list[str | float], name: list[str | float], type: list[str | float])¶
Bases: BaseModel
Model class for odk data.
- file_name: str¶
- hint: dict[str, list[str | float]]¶
- label_choices: dict[str, list[str | float]]¶
- label_survey: dict[str, list[str | float]]¶
- list_name: list[str | float]¶
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'file_name': FieldInfo(annotation=str, required=True), 'hint': FieldInfo(annotation=dict[str, list[Union[str, float]]], required=True), 'label_choices': FieldInfo(annotation=dict[str, list[Union[str, float]]], required=True), 'label_survey': FieldInfo(annotation=dict[str, list[Union[str, float]]], required=True), 'list_name': FieldInfo(annotation=list[Union[str, float]], required=True), 'name': FieldInfo(annotation=list[Union[str, float]], required=True), 'type': FieldInfo(annotation=list[Union[str, float]], required=True)}¶
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- name: list[str | float]¶
- type: list[str | float]¶
mex.extractors.odk.settings module¶
- class mex.extractors.odk.settings.ODKSettings(*, raw_data_path: AssetsPath = AssetsPath('raw-data/odk'), mapping_path: AssetsPath = AssetsPath('mappings/__final__/odk'))¶
Bases: BaseModel
Settings submodel definition for odk data extraction.
- mapping_path: AssetsPath¶
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/__final__/odk"), description='Path to the directory with the odk mapping files containing the default values, absolute path or relative to `assets_dir`.'), 'raw_data_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/odk"), description='Path to the directory with the odk excel files, absolute path or relative to `assets_dir`.')}¶
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- raw_data_path: AssetsPath¶
mex.extractors.odk.transform module¶
- mex.extractors.odk.transform.assign_resource_relations(resources: dict[str, ExtractedResource], is_part_of_list: list[str]) list[ExtractedResource] ¶
Assign resources related to each other.
- Parameters:
resources – dictionary of mex resources
is_part_of_list – list of resources which are part of another
- Returns:
list of mex resources
- mex.extractors.odk.transform.get_value_set(type_cell: str, choice_sheet: ODKData) list[str] ¶
Get value sets for type cells that start with select_one or multiple_one.
- Parameters:
type_cell – one type cell
choice_sheet – choice sheet corresponding to type cell
- Returns:
list of value sets matched to type cell
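A minimal sketch of how a value set could be looked up for a single type cell. It assumes the XLSForm convention that a cell such as `select_one yes_no` names a choice list, and that the choice sheet's `list_name` and `name` columns are aligned row by row. The `ChoiceSheet` dataclass and the function `value_set_for` are hypothetical stand-ins, and only the `select_one` case is shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChoiceSheet:
    """Hypothetical stand-in for the two ODKData columns used here."""

    list_name: list[str | float]
    name: list[str | float]


def value_set_for(type_cell: str, choice_sheet: ChoiceSheet) -> list[str]:
    """Return the choice names whose list_name matches the type cell.

    Assumption: the type cell has the form "select_one <list_name>".
    """
    parts = type_cell.split()
    if len(parts) < 2 or parts[0] != "select_one":
        return []
    wanted = parts[1]
    return [
        str(name)
        for list_name, name in zip(choice_sheet.list_name, choice_sheet.name)
        if list_name == wanted
    ]


# Illustrative choice sheet; an empty row appears as float NaN in both columns.
nan = float("nan")
choices = ChoiceSheet(
    list_name=["yes_no", "yes_no", nan, "region"],
    name=["yes", "no", nan, "north"],
)
print(value_set_for("select_one yes_no", choices))  # ['yes', 'no']
```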
- mex.extractors.odk.transform.get_variable_groups_from_raw_data(odk_raw_data: list[ODKData]) dict[str, list[dict[str, str]]] ¶
Get variable groups from raw data by parsing for begin_group and end_group.
- Parameters:
odk_raw_data – raw data extracted from Excel files
- Returns:
dictionary of odk groups by group name
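The begin_group / end_group parsing can be sketched roughly as below. This is an illustration, not the library's code: the function name `group_rows` is hypothetical, it works on plain `type` and `name` lists rather than ODKData objects, it assumes groups are not nested, and the keys of the per-row dicts (`"type"`, `"name"`) are assumptions.

```python
from __future__ import annotations


def group_rows(
    types: list[str | float], names: list[str | float]
) -> dict[str, list[dict[str, str]]]:
    """Collect the rows between begin_group and end_group by group name.

    Assumptions: groups are flat (not nested) and the begin_group row's
    name is the group name. Rows outside any group are skipped.
    """
    groups: dict[str, list[dict[str, str]]] = {}
    current: str | None = None
    for row_type, row_name in zip(types, names):
        if row_type == "begin_group":
            current = str(row_name)
            groups[current] = []
        elif row_type == "end_group":
            current = None
        elif current is not None:
            groups[current].append({"type": str(row_type), "name": str(row_name)})
    return groups


# Illustrative survey rows; values are made up.
types = ["text", "begin_group", "integer", "select_one yes_no", "end_group", "text"]
names = ["id", "household", "age", "consent", float("nan"), "comment"]
print(group_rows(types, names))
```

In this sketch the rows `id` and `comment` fall outside any group and are therefore not part of the result.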
- mex.extractors.odk.transform.transform_odk_data_to_extracted_variables(extracted_resources_odk: list[ExtractedResource], extracted_variable_groups_odk: list[ExtractedVariableGroup], odk_variable_groups: dict[str, list[dict[str, str]]], odk_raw_data: list[ODKData], extracted_primary_source_odk: ExtractedPrimarySource) list[ExtractedVariable] ¶
Transform odk variables to mex variables.
- Parameters:
extracted_resources_odk – extracted mex resources
extracted_variable_groups_odk – extracted mex variable groups
odk_variable_groups – dictionary of odk groups by group name
odk_raw_data – raw data extracted from Excel files
extracted_primary_source_odk – odk primary source
- Returns:
list of mex variables
- mex.extractors.odk.transform.transform_odk_resources_to_mex_resources(odk_resource_mappings: list[Any], unit_stable_target_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], external_partner_and_publisher_by_label: dict[str, MergedOrganizationIdentifier], extracted_international_projects_activities: list[ExtractedActivity], extracted_primary_source_mex: ExtractedPrimarySource) tuple[dict[str, ExtractedResource], list[str]] ¶
Transform odk resources to mex resources.
- Parameters:
odk_resource_mappings – list of resource mapping models
unit_stable_target_ids_by_synonym – dict of OrganizationalUnitIds
external_partner_and_publisher_by_label – dict of wikidata OrganizationIDs
extracted_international_projects_activities – list of extracted international projects activities
extracted_primary_source_mex – mex primary source
- Returns:
tuple of dictionary of mex resources and list of resources which are part of another
- mex.extractors.odk.transform.transform_odk_variable_groups_to_extracted_variable_groups(odk_variable_groups: dict[str, list[dict[str, str]]], extracted_resources_odk: list[ExtractedResource], extracted_primary_source_odk: ExtractedPrimarySource) list[ExtractedVariableGroup] ¶
Transform odk variable groups to mex variable groups.
- Parameters:
odk_variable_groups – dictionary of odk groups by group name
extracted_resources_odk – extracted mex resources
extracted_primary_source_odk – odk primary source
- Returns:
list of mex variable groups