mex.extractors.international_projects package
Subpackages
- mex.extractors.international_projects.models package
- Submodules
- mex.extractors.international_projects.models.source module
InternationalProjectsSource
InternationalProjectsSource.activity1
InternationalProjectsSource.activity2
InternationalProjectsSource.additional_rki_units
InternationalProjectsSource.end_date
InternationalProjectsSource.full_project_name
InternationalProjectsSource.funding_program
InternationalProjectsSource.funding_source
InternationalProjectsSource.funding_type
InternationalProjectsSource.get_end_year()
InternationalProjectsSource.get_funding_sources()
InternationalProjectsSource.get_identifier_in_primary_source()
InternationalProjectsSource.get_partners()
InternationalProjectsSource.get_project_lead_persons()
InternationalProjectsSource.get_project_lead_rki_units()
InternationalProjectsSource.get_start_year()
InternationalProjectsSource.get_units()
InternationalProjectsSource.model_computed_fields
InternationalProjectsSource.model_config
InternationalProjectsSource.model_fields
InternationalProjectsSource.partner_organization
InternationalProjectsSource.project_abbreviation
InternationalProjectsSource.project_lead_person
InternationalProjectsSource.project_lead_rki_unit
InternationalProjectsSource.rki_internal_project_number
InternationalProjectsSource.start_date
InternationalProjectsSource.topic1
InternationalProjectsSource.topic2
InternationalProjectsSource.website
- Module contents
Submodules
mex.extractors.international_projects.extract module
- mex.extractors.international_projects.extract.extract_international_projects_funding_sources(international_projects_sources: Iterable[InternationalProjectsSource]) → dict[str, MergedOrganizationIdentifier]
Search for funding organizations in Wikidata and extract them.
- Parameters:
international_projects_sources – Iterable of international projects sources
- Returns:
Dict mapping organization label to MergedOrganizationIdentifier
- mex.extractors.international_projects.extract.extract_international_projects_partner_organizations(international_projects_sources: Iterable[InternationalProjectsSource]) → dict[str, MergedOrganizationIdentifier]
Search for partner organizations in Wikidata and extract them.
- Parameters:
international_projects_sources – Iterable of international projects sources
- Returns:
Dict mapping organization label to MergedOrganizationIdentifier
- mex.extractors.international_projects.extract.extract_international_projects_project_leaders(international_projects_sources: Iterable[InternationalProjectsSource]) → Generator[LDAPPersonWithQuery, None, None]
Extract LDAP persons with their query string for project leaders.
- Parameters:
international_projects_sources – international projects sources
- Returns:
Generator for LDAP persons with query
- mex.extractors.international_projects.extract.extract_international_projects_source(row: pd.Series[Any]) → InternationalProjectsSource | None
Extract one international projects source from a pandas row.
- Parameters:
row – pandas Series representing one source
- Returns:
international projects source, or None
- mex.extractors.international_projects.extract.extract_international_projects_sources() → Generator[InternationalProjectsSource, None, None]
Extract international projects sources by loading data from an MS Excel file.
- Returns:
Generator for international projects sources
- mex.extractors.international_projects.extract.get_clean_organizations_names(organizations_str: str) → list[str]
Get clean names for partner organizations.
- Parameters:
organizations_str (str) – string containing all organization names
- Returns:
list of clean organization names
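The cleaning rules themselves are not stated on this page. As a minimal sketch, assuming the raw string separates names with commas, semicolons, or line breaks (an assumption, not confirmed here), a helper of the same shape could look like this. The name `split_organization_names` is hypothetical and is not part of the package.

```python
import re


def split_organization_names(organizations_str: str) -> list[str]:
    """Split a raw cell string into trimmed, non-empty organization names.

    Assumption: names are separated by commas, semicolons or newlines.
    The real extractor may apply different rules.
    """
    parts = re.split(r"[,;\n]", organizations_str)
    # Collapse runs of whitespace inside each fragment and trim the ends.
    cleaned = [" ".join(part.split()) for part in parts]
    # Drop fragments that are empty after cleaning.
    return [name for name in cleaned if name]
```

Dropping empty fragments keeps trailing separators in the spreadsheet cell from producing blank organization names.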
- mex.extractors.international_projects.extract.get_temporal_entity_from_cell(cell_value: Any) → TemporalEntity | YearMonthDay | None
Try to extract a temporal entity from a cell.
- Parameters:
cell_value – Value of a cell, could be int, string or datetime
- Returns:
TemporalEntity, YearMonthDay, or None
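To illustrate the "int, string or datetime" handling described above, here is a rough standard-library sketch. It returns a plain `datetime.date` rather than the package's `TemporalEntity` or `YearMonthDay` types, and both the helper name `parse_cell_date` and the accepted string formats are assumptions made for illustration only.

```python
from datetime import date, datetime
from typing import Any, Optional

# Assumed input formats; the real extractor may accept others.
ASSUMED_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%Y")


def parse_cell_date(cell_value: Any) -> Optional[date]:
    """Return a date for a cell value, or None if it cannot be parsed."""
    # datetime is checked first because it is a subclass of date.
    if isinstance(cell_value, datetime):
        return cell_value.date()
    if isinstance(cell_value, date):
        return cell_value
    # Reject bools (a subclass of int) and any other unsupported type.
    if isinstance(cell_value, bool) or not isinstance(cell_value, (int, str)):
        return None
    text = str(cell_value).strip()
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

Returning None instead of raising mirrors the documented "or None" result, so a malformed cell does not abort the whole extraction.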
mex.extractors.international_projects.main module
mex.extractors.international_projects.settings module
- class mex.extractors.international_projects.settings.InternationalProjectsSettings(*, file_path: AssetsPath = AssetsPath('raw-data/international-projects/international_projects.xlsx'), mapping_path: AssetsPath = AssetsPath('mappings/__final__/international-projects'))
Bases: BaseModel
Settings submodel definition for the international projects extractor.
- file_path: AssetsPath
- mapping_path: AssetsPath
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'file_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/international-projects/international_projects.xlsx"), description='Path to the international projects excel file, absolute path or relative to `assets_dir`.'), 'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/__final__/international-projects"), description='Path to the directory with the international-projects mapping files containing the default values, absolute path or relative to `assets_dir`.')}
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
mex.extractors.international_projects.transform module
- mex.extractors.international_projects.transform.get_or_create_partner_organization(partner_organization: list[str], extracted_organizations: dict[str, MergedOrganizationIdentifier], extracted_primary_source: ExtractedPrimarySource) → list[MergedOrganizationIdentifier]
Get merged identifiers for partner organizations.
- Parameters:
partner_organization – partner organizations from the source
extracted_organizations – merged organization identifiers extracted from Wikidata
extracted_primary_source – Extracted primary source for international projects
- Returns:
list of matched or created merged organization identifiers
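The "matched or created" behaviour can be pictured as a lookup with a fallback. The sketch below uses plain strings in place of `MergedOrganizationIdentifier`, and the `create_id` callback is a hypothetical stand-in for whatever the real function does when no Wikidata match exists; neither is taken from the package.

```python
from typing import Callable


def match_or_create_ids(
    names: list[str],
    known_ids: dict[str, str],
    create_id: Callable[[str], str],
) -> list[str]:
    """Return one identifier per name, reusing known ones where possible."""
    result = []
    for name in names:
        identifier = known_ids.get(name)
        if identifier is None:
            # Hypothetical fallback: the real function presumably creates a
            # new organization tied to the extracted primary source.
            identifier = create_id(name)
        result.append(identifier)
    return result
```

For example, `match_or_create_ids(["WHO", "Unknown Org"], {"WHO": "id-who"}, lambda n: f"new-{n}")` reuses the known identifier for "WHO" and calls the fallback for the second name.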
- mex.extractors.international_projects.transform.get_theme_for_activity_or_topic(theme: list[GenericField], activity1: str | None, activity2: str | None, topic1: str | None, topic2: str | None) → list[Theme]
Get theme identifiers for activities and topics.
- Parameters:
theme – theme extracted from mapping
activity1 – activity 1 from the international-projects raw data file
activity2 – activity 2 from the international-projects raw data file
topic1 – topic 1 from the international-projects raw data file
topic2 – topic 2 from the international-projects raw data file
- Returns:
Sorted list of Theme
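One way to read this function is as a lookup of each non-empty activity or topic in the mapping, followed by deduplication and sorting. The sketch below assumes the mapping has already been flattened into a plain `dict` from raw value to theme label; that flattening, the helper name `themes_for_values`, and the absence of a default theme are all assumptions rather than documented behaviour.

```python
from typing import Optional


def themes_for_values(
    theme_by_value: dict[str, str],
    *values: Optional[str],
) -> list[str]:
    """Collect the themes mapped to the given values as a sorted list."""
    # A set removes duplicates when several values map to the same theme.
    found = {
        theme_by_value[value]
        for value in values
        if value is not None and value in theme_by_value
    }
    return sorted(found)
```

Sorting the result gives a stable order regardless of which column (activity1, activity2, topic1, topic2) a value came from.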
- mex.extractors.international_projects.transform.transform_international_projects_source_to_extracted_activity(source: InternationalProjectsSource, international_projects_activity: Any, extracted_primary_source: ExtractedPrimarySource, person_stable_target_ids_by_query_string: dict[Hashable, list[MergedPersonIdentifier]], unit_stable_target_id_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], funding_sources_stable_target_id_by_query: dict[str, MergedOrganizationIdentifier], partner_organizations_stable_target_id_by_query: dict[str, MergedOrganizationIdentifier]) → ExtractedActivity | None
Transform one international projects source to an extracted activity.
- Parameters:
source – international projects source
international_projects_activity – activity mapping model with default values
extracted_primary_source – Extracted primary source for international projects
person_stable_target_ids_by_query_string – Mapping from person query string to person stable target IDs
unit_stable_target_id_by_synonym – Mapping from unit acronyms and labels to unit stable target ID
funding_sources_stable_target_id_by_query – Mapping from funding sources to organization stable target ID
partner_organizations_stable_target_id_by_query – Mapping from partner orgs to their stable target ID
- Returns:
ExtractedActivity or None if it was filtered out
- mex.extractors.international_projects.transform.transform_international_projects_sources_to_extracted_activities(international_projects_sources: Iterable[InternationalProjectsSource], international_projects_activity: Any, extracted_primary_source: ExtractedPrimarySource, person_stable_target_ids_by_query_string: dict[Hashable, list[MergedPersonIdentifier]], unit_stable_target_id_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], funding_sources_stable_target_id_by_query: dict[str, MergedOrganizationIdentifier], partner_organizations_stable_target_id_by_query: dict[str, MergedOrganizationIdentifier]) → Generator[ExtractedActivity, None, None]
Transform international projects sources to extracted activities.
- Parameters:
international_projects_sources – international projects sources
international_projects_activity – activity mapping model with default values
extracted_primary_source – Extracted primary source for international projects
person_stable_target_ids_by_query_string – Mapping from person query string to person stable target IDs
unit_stable_target_id_by_synonym – Mapping from unit acronyms and labels to unit stable target ID
funding_sources_stable_target_id_by_query – Mapping from funding sources to organization stable target ID
partner_organizations_stable_target_id_by_query – Mapping from partner orgs to their stable target ID
- Returns:
Generator for ExtractedActivity instances
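Because the single-source transform may return None for filtered-out rows while the plural variant yields only ExtractedActivity instances, the plural function presumably skips those None results. A generic sketch of that pattern follows; `transform_all` and `transform_one` are hypothetical names used for illustration, not functions from the package.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def transform_all(
    sources: Iterable[S],
    transform_one: Callable[[S], Optional[T]],
) -> Iterator[T]:
    """Lazily transform each source and skip results that are None."""
    for source in sources:
        result = transform_one(source)
        if result is not None:
            yield result
```

Being a generator, this processes sources one at a time, which matches the Generator return types used throughout this module.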