mex.extractors.international_projects package

Subpackages

Submodules

mex.extractors.international_projects.extract module

mex.extractors.international_projects.extract.extract_international_projects_funding_sources(international_projects_sources: Iterable[InternationalProjectsSource]) → dict[str, WikidataOrganization]

Search Wikidata for the funding organizations of the given sources and extract them.

Parameters:

international_projects_sources – Iterable of international-project sources

Returns:

Dict mapping each organization label to its WikidataOrganization

mex.extractors.international_projects.extract.extract_international_projects_partner_organizations(international_projects_sources: Iterable[InternationalProjectsSource]) → dict[str, WikidataOrganization]

Search Wikidata for the partner organizations of the given sources and extract them.

Parameters:

international_projects_sources – Iterable of international-project sources

Returns:

Dict mapping each organization label to its WikidataOrganization

mex.extractors.international_projects.extract.extract_international_projects_project_leaders(international_projects_sources: Iterable[InternationalProjectsSource]) → Generator[LDAPPersonWithQuery, None, None]

Extract LDAP persons with their query string for project leaders.

Parameters:

international_projects_sources – international projects sources

Returns:

Generator for LDAP persons with query

mex.extractors.international_projects.extract.extract_international_projects_source(row: pd.Series[Any]) → InternationalProjectsSource | None

Extract one international projects source from a pandas row.

Parameters:

row – pandas Series representing one row of the MS-Excel file

Returns:

international projects source, or None

mex.extractors.international_projects.extract.extract_international_projects_sources() → Generator[InternationalProjectsSource, None, None]

Extract international projects sources by loading data from MS-Excel file.

Returns:

Generator for international projects sources

mex.extractors.international_projects.extract.get_clean_organizations_names(organizations_str: str) → list[str]

Get clean names for partner organizations.

Parameters:

organizations_str (str) – string containing all organization names

Returns:

list of clean organization names
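A minimal sketch of what "cleaning" a combined organizations cell can look like. The separators (commas, semicolons, line breaks), whitespace collapsing, and order-preserving deduplication are all assumptions for illustration; the real function's rules are not shown in this reference, and `clean_organization_names` is a hypothetical name.

```python
import re


def clean_organization_names(organizations_str: str) -> list[str]:
    """Split a combined cell into trimmed, deduplicated organization names."""
    # Assumed separators: commas, semicolons and line breaks.
    parts = re.split(r"[,;\n]", organizations_str)
    names: list[str] = []
    for part in parts:
        name = " ".join(part.split())  # trim and collapse internal whitespace
        if name and name not in names:  # drop empty fragments and repeats, keep order
            names.append(name)
    return names
```

Splitting on commas breaks names that themselves contain commas (for example "University of X, Department Y"), so a real cleaner may need a narrower separator set.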

mex.extractors.international_projects.extract.get_temporal_entity_from_cell(cell_value: Any) → TemporalEntity | YearMonthDay | None

Try to extract a temporal entity from a cell.

Parameters:

cell_value – Value of a cell, could be int, string or datetime

Returns:

TemporalEntity or YearMonthDay, or None if no temporal value could be extracted

mex.extractors.international_projects.main module

mex.extractors.international_projects.settings module

class mex.extractors.international_projects.settings.InternationalProjectsSettings(*, file_path: AssetsPath = AssetsPath('raw-data/international-projects/international_projects.xlsx'), mapping_path: AssetsPath = AssetsPath('mappings/__final__/international-projects'))

Bases: BaseModel

Settings submodel definition for the international projects extractor.

file_path: AssetsPath
mapping_path: AssetsPath
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_fields: ClassVar[Dict[str, FieldInfo]] = {'file_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/international-projects/international_projects.xlsx"), description='Path to the international projects excel file, absolute path or relative to `assets_dir`.'), 'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/__final__/international-projects"), description='Path to the directory with the international-projects mapping files containing the default values, absolute path or relative to `assets_dir`.')}

Metadata about the fields defined on the model, mapping of field names to FieldInfo (pydantic.fields.FieldInfo) objects.

This replaces Model.__fields__ from Pydantic V1.

mex.extractors.international_projects.transform module

mex.extractors.international_projects.transform.get_or_create_partner_organization(partner_organization: list[str], extracted_organizations: dict[str, MergedOrganizationIdentifier], extracted_primary_source: ExtractedPrimarySource) → list[MergedOrganizationIdentifier]

Get the merged identifiers of partner organizations.

Parameters:
  • partner_organization – partner organizations from the source

  • extracted_organizations – mapping to merged organization identifiers extracted from Wikidata

  • extracted_primary_source – Extracted primary_source for international projects

Returns:

list of matched or created merged organization identifiers
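The "matched or created" return value describes a get-or-create lookup. A minimal sketch under stated assumptions: plain strings replace `MergedOrganizationIdentifier`, and the hypothetical `create` callback stands in for whatever the real function does to build a new organization, presumably using `extracted_primary_source`. The sketch deliberately leaves the caller's mapping unmodified.

```python
from collections.abc import Callable, Iterable, Mapping


def get_or_create_ids(
    names: Iterable[str],
    known_ids: Mapping[str, str],
    create: Callable[[str], str],
) -> list[str]:
    """Return one identifier per name, creating each unknown name at most once."""
    created: dict[str, str] = {}  # local cache; known_ids is not mutated
    ids: list[str] = []
    for name in names:
        if name in known_ids:
            ids.append(known_ids[name])  # matched, e.g. via Wikidata
        else:
            if name not in created:
                created[name] = create(name)  # hypothetical creation step
            ids.append(created[name])
    return ids
```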

mex.extractors.international_projects.transform.get_theme_for_activity_or_topic(theme: list[GenericField], activity1: str | None, activity2: str | None, topic1: str | None, topic2: str | None) → list[Theme]

Get the themes for the given activities and topics.

Parameters:
  • theme – theme extracted from mapping

  • activity1 – activity 1 from the international-projects raw data file

  • activity2 – activity 2 from the international-projects raw data file

  • topic1 – topic 1 from the international-projects raw data file

  • topic2 – topic 2 from the international-projects raw data file

Returns:

Sorted list of Theme

mex.extractors.international_projects.transform.transform_international_projects_source_to_extracted_activity(source: InternationalProjectsSource, international_projects_activity: Any, extracted_primary_source: ExtractedPrimarySource, person_stable_target_ids_by_query_string: dict[Hashable, list[MergedPersonIdentifier]], unit_stable_target_id_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], funding_sources_stable_target_id_by_query: dict[str, MergedOrganizationIdentifier], partner_organizations_stable_target_id_by_query: dict[str, MergedOrganizationIdentifier]) → ExtractedActivity | None

Transform an international projects source to an extracted activity.

Parameters:
  • source – international projects source

  • international_projects_activity – activity mapping model with default values

  • extracted_primary_source – Extracted primary_source for international projects

  • person_stable_target_ids_by_query_string – Mapping from person query string to person stable target IDs

  • unit_stable_target_id_by_synonym – Mapping from unit acronyms and labels to unit stable target ID

  • funding_sources_stable_target_id_by_query – Mapping from funding sources to organization stable target ID

  • partner_organizations_stable_target_id_by_query – Mapping from partner orgs to their stable target ID

Returns:

ExtractedActivity or None if it was filtered out

mex.extractors.international_projects.transform.transform_international_projects_sources_to_extracted_activities(international_projects_sources: Iterable[InternationalProjectsSource], international_projects_activity: Any, extracted_primary_source: ExtractedPrimarySource, person_stable_target_ids_by_query_string: dict[Hashable, list[MergedPersonIdentifier]], unit_stable_target_id_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], funding_sources_stable_target_id_by_query: dict[str, MergedOrganizationIdentifier], partner_organizations_stable_target_id_by_query: dict[str, MergedOrganizationIdentifier]) → Generator[ExtractedActivity, None, None]

Transform international projects sources to extracted activities.

Parameters:
  • international_projects_sources – international projects sources

  • international_projects_activity – activity mapping model with default values

  • extracted_primary_source – Extracted primary_source for international projects

  • person_stable_target_ids_by_query_string – Mapping from person query string to person stable target IDs

  • unit_stable_target_id_by_synonym – Mapping from unit acronyms and labels to unit stable target ID

  • funding_sources_stable_target_id_by_query – Mapping from funding sources to organization stable target ID

  • partner_organizations_stable_target_id_by_query – Mapping from partner orgs to their stable target ID

Returns:

Generator for ExtractedActivity instances
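The single-source transform returns None for filtered-out sources, while the plural variant yields only ExtractedActivity instances. A minimal generic sketch of that relationship, assuming the plural function simply applies the single-source transform and drops None results; `transform_all` and `transform_one` are hypothetical names, and the real function may do more.

```python
from collections.abc import Callable, Generator, Iterable
from typing import Optional, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def transform_all(
    sources: Iterable[S],
    transform_one: Callable[[S], Optional[T]],
) -> Generator[T, None, None]:
    """Lazily apply `transform_one` and drop sources it filters out (returns None)."""
    for source in sources:
        result = transform_one(source)
        if result is not None:  # None means "filtered out" in this sketch
            yield result
```

Being a generator, nothing is transformed until the caller iterates, so large spreadsheets can be streamed without building a full list first.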

Module contents