mex.extractors.ff_projects package
Subpackages
- mex.extractors.ff_projects.models package
- Submodules
- mex.extractors.ff_projects.models.source module
  - FFProjectsSource
    - FFProjectsSource.foerderprogr
    - FFProjectsSource.get_end_year()
    - FFProjectsSource.get_identifier_in_primary_source()
    - FFProjectsSource.get_partners()
    - FFProjectsSource.get_start_year()
    - FFProjectsSource.get_units()
    - FFProjectsSource.kategorie
    - FFProjectsSource.laufzeit_bis
    - FFProjectsSource.laufzeit_cells
    - FFProjectsSource.laufzeit_von
    - FFProjectsSource.lfd_nr
    - FFProjectsSource.model_config
    - FFProjectsSource.projektleiter
    - FFProjectsSource.rki_az
    - FFProjectsSource.rki_oe
    - FFProjectsSource.thema_des_projekts
    - FFProjectsSource.zuwendungs_oder_auftraggeber
- Module contents
Submodules
mex.extractors.ff_projects.extract module
- mex.extractors.ff_projects.extract.extract_ff_project_authors(ff_projects_sources: Iterable[FFProjectsSource]) -> list[LDAPPersonWithQuery]
Extract LDAP persons with their query string for FF Projects authors.
- Parameters:
ff_projects_sources – FF Projects sources
- Returns:
List of LDAP persons with query
- mex.extractors.ff_projects.extract.extract_ff_projects_organizations(ff_projects_sources: Iterable[FFProjectsSource]) -> dict[str, MergedOrganizationIdentifier]
Search for and extract organizations from Wikidata.
- Parameters:
ff_projects_sources – Iterable of ff-project sources
- Returns:
Dict with organization label and WikidataOrganization ID
- mex.extractors.ff_projects.extract.extract_ff_projects_source(row: pd.Series[Any]) -> FFProjectsSource | None
Extract one FF Projects source from a single pandas series row.
- Parameters:
row – pandas df series row representing one source
- Returns:
FF Projects source
- mex.extractors.ff_projects.extract.extract_ff_projects_sources() -> list[FFProjectsSource]
Extract FF Projects sources by loading data from an MS Excel file.
- Settings:
- ff_projects.file_path: Path to the ff-projects list, absolute or relative to assets_dir
- Returns:
List of FF Projects sources
- mex.extractors.ff_projects.extract.get_clean_names(name: str) -> str
Clean name from unwanted characters and numerals.
- Parameters:
name – Name of the person
- Returns:
Cleaned name
- Return type:
str
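The page does not say which characters `get_clean_names` treats as unwanted. As a rough illustration only, the sketch below assumes that letters, whitespace, commas, periods, and hyphens are kept and everything else (including digits) is removed. The function name `clean_name_sketch` and the character set are hypothetical and not taken from the package.

```python
import re


def clean_name_sketch(name: str) -> str:
    """Hypothetical stand-in for get_clean_names; not the package's code."""
    # Assumption: keep letters, whitespace, commas, periods and hyphens.
    # The first alternative drops other punctuation, the second drops
    # digits and underscores (which \w would otherwise let through).
    without_noise = re.sub(r"[^\w\s,.\-]|[\d_]", "", name)
    # Collapse whitespace runs left behind by the removals.
    return re.sub(r"\s+", " ", without_noise).strip()


print(clean_name_sketch("  Max   Mustermann*2 "))
```

The real function may keep or drop a different set of characters; check the module source before relying on this behavior.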
- mex.extractors.ff_projects.extract.get_optional_string_from_cell(cell_value: Any) -> str | None
Try to extract the string value from a cell by truncating floats.
- Parameters:
cell_value – Value of a cell, could be string, int or datetime
- Returns:
String or None
- mex.extractors.ff_projects.extract.get_string_from_cell(cell_value: Any) -> str
Try to extract the string value from a cell by truncating floats.
- Parameters:
cell_value – Value of a cell, could be string, int or datetime
- Returns:
String
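To make "truncating floats" concrete: spreadsheet readers often return whole numbers as floats such as `12.0`, and a plain `str()` would yield `"12.0"`. The sketch below shows one way the two cell helpers could behave. The function names are hypothetical, and treating `None` and NaN as "empty" in the optional variant is an assumption, not something this page states.

```python
from __future__ import annotations

import math
from typing import Any


def string_from_cell_sketch(cell_value: Any) -> str:
    """Hypothetical stand-in for get_string_from_cell."""
    # Truncate floats so 12.0 becomes "12" instead of "12.0".
    # Note: int() raises on NaN, so NaN must be handled by the caller.
    if isinstance(cell_value, float):
        return str(int(cell_value))
    return str(cell_value)


def optional_string_from_cell_sketch(cell_value: Any) -> str | None:
    """Hypothetical stand-in for get_optional_string_from_cell."""
    # Assumption: empty cells arrive as None or as float NaN.
    if cell_value is None:
        return None
    if isinstance(cell_value, float) and math.isnan(cell_value):
        return None
    return string_from_cell_sketch(cell_value)


print(string_from_cell_sketch(12.0))
print(optional_string_from_cell_sketch(float("nan")))
```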
- mex.extractors.ff_projects.extract.get_temporal_entity_from_cell(cell_value: Any) -> TemporalEntity | None
Try to extract a temporal_entity from a cell.
- Parameters:
cell_value – Value of a cell, could be int, string or datetime
- Returns:
TemporalEntity or None
mex.extractors.ff_projects.filter module
- mex.extractors.ff_projects.filter.filter_and_log_ff_projects_source(source: FFProjectsSource, unit_stable_target_ids_by_synonym: dict[str, list[MergedOrganizationalUnitIdentifier]]) -> bool
Filter an FFProjectsSource according to settings and log the filtering.
- Parameters:
source – FFProjectsSource
unit_stable_target_ids_by_synonym – Unit IDs grouped by synonyms
- Settings:
- ff_projects.skip_funding: Skip sources with this funding
- ff_projects.skip_topics: Skip sources with these topics
- ff_projects.skip_years_strings: Skip sources with these years
- ff_projects.skip_clients: Skip sources with these clients
- Returns:
False if source is filtered out, else True
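The settings-driven filter can be pictured roughly as follows. This is a minimal sketch, not the package's implementation: the stub classes are hypothetical, the skip-list defaults are copied from the `FFProjectsSettings` signature on this page, and the pairing of each setting with a particular source field is an assumption. The sketch also ignores the `unit_stable_target_ids_by_synonym` argument, whose role is not described here.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class SkipSettingsStub:
    """Hypothetical stand-in for FFProjectsSettings (defaults from this page)."""

    skip_funding: list[str] = field(default_factory=lambda: ["Sonstige"])
    skip_topics: list[str] = field(default_factory=lambda: ["Sonstige"])
    skip_years_strings: list[str] = field(
        default_factory=lambda: ["fehlt", "keine", "offen"]
    )
    skip_clients: list[str] = field(default_factory=lambda: ["Sonstige"])


@dataclass
class FilterSourceStub:
    """Hypothetical stand-in for FFProjectsSource; field types are assumed."""

    lfd_nr: str
    foerderprogr: str | None
    thema_des_projekts: str | None
    laufzeit_cells: tuple[str | None, str | None]
    zuwendungs_oder_auftraggeber: str | None


def keep_source_sketch(source: FilterSourceStub, settings: SkipSettingsStub) -> bool:
    """Return False if any skip list matches the source, else True."""
    # Assumption: each setting is compared against the field paired with it here.
    checks = [
        ("funding", source.foerderprogr in settings.skip_funding),
        ("topic", source.thema_des_projekts in settings.skip_topics),
        (
            "years",
            any(cell in settings.skip_years_strings for cell in source.laufzeit_cells),
        ),
        ("client", source.zuwendungs_oder_auftraggeber in settings.skip_clients),
    ]
    for reason, matched in checks:
        if matched:
            # Log why the source was dropped, mirroring "filter and log".
            logger.info("skipping source %s: %s is filtered", source.lfd_nr, reason)
            return False
    return True
```

A list-level filter such as `filter_and_log_ff_projects_sources` could then be a comprehension over this predicate, for example `[s for s in sources if keep_source_sketch(s, settings)]`.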
- mex.extractors.ff_projects.filter.filter_and_log_ff_projects_sources(sources: Iterable[FFProjectsSource], unit_stable_target_ids_by_synonym: dict[str, list[MergedOrganizationalUnitIdentifier]]) -> list[FFProjectsSource]
Filter FF Projects sources and log filtered sources.
- Parameters:
sources – Iterable of FFProjectsSource items
unit_stable_target_ids_by_synonym – Unit IDs grouped by synonyms
- Returns:
List of filtered FF Projects sources
- mex.extractors.ff_projects.filter.filter_out_duplicate_source_ids(sources: Collection[FFProjectsSource]) -> list[FFProjectsSource]
Remove duplicate `lfd_nr`s from the given sources.
- Parameters:
sources – Collection of FF Projects sources
- Returns:
Filtered FF Projects sources
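The description does not say whether one source per duplicated `lfd_nr` is kept or whether all of them are dropped. The sketch below assumes the stricter reading and discards every source whose `lfd_nr` occurs more than once. `NumberedSourceStub` and `drop_duplicate_ids_sketch` are hypothetical names used for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class NumberedSourceStub:
    """Hypothetical stand-in for FFProjectsSource with only the ID field."""

    lfd_nr: str


def drop_duplicate_ids_sketch(
    sources: list[NumberedSourceStub],
) -> list[NumberedSourceStub]:
    # Count how often each lfd_nr occurs across all sources.
    counts = Counter(source.lfd_nr for source in sources)
    # Assumption: an ambiguous ID disqualifies every source carrying it.
    return [source for source in sources if counts[source.lfd_nr] == 1]


sources = [NumberedSourceStub("1"), NumberedSourceStub("2"), NumberedSourceStub("1")]
print([source.lfd_nr for source in drop_duplicate_ids_sketch(sources)])
```

If the package instead keeps the first occurrence, the comprehension would be replaced by a loop that tracks a set of already seen IDs.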
mex.extractors.ff_projects.main module
mex.extractors.ff_projects.settings module
- class mex.extractors.ff_projects.settings.FFProjectsSettings(*, file_path: AssetsPath = AssetsPath('raw-data/ff-projects/ff-projects.xlsx'), skip_funding: list[str] = ['Sonstige'], skip_topics: list[str] = ['Sonstige'], skip_years_strings: list[str] = ['fehlt', 'keine', 'offen'], skip_clients: list[str] = ['Sonstige'], mapping_path: AssetsPath = AssetsPath('mappings/ff-projects'))
Bases: BaseModel
Settings submodel for the FF Projects extractor.
- file_path: AssetsPath
- mapping_path: AssetsPath
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- skip_clients: list[str]
- skip_funding: list[str]
- skip_topics: list[str]
- skip_years_strings: list[str]
mex.extractors.ff_projects.transform module
- mex.extractors.ff_projects.transform.transform_ff_projects_source_to_extracted_activity(ff_projects_source: FFProjectsSource, person_stable_target_ids_by_query_string: dict[str, list[MergedPersonIdentifier]], unit_stable_target_id_by_synonym: dict[str, list[MergedOrganizationalUnitIdentifier]], organization_stable_target_id_by_synonyms: dict[str, MergedOrganizationIdentifier], ff_projects_activity: ActivityMapping) -> ExtractedActivity
Transform FF Projects source to an extracted activity.
- Parameters:
ff_projects_source – FF Projects source
person_stable_target_ids_by_query_string – Mapping from author query to person stable target ID
unit_stable_target_id_by_synonym – Mapping from unit acronyms and labels to unit stable target ID
organization_stable_target_id_by_synonyms – Mapping from organization synonyms to organization stable target ID
ff_projects_activity – activity mapping model with default values
- Returns:
Extracted activity for the given projects source