mex.extractors.synopse package

Subpackages

Submodules

mex.extractors.synopse.connector module

class mex.extractors.synopse.connector.ReportServerConnector

Bases: HTTPConnector

Connector to handle authentication and requests to the Power BI Report Server.

_check_availability() None

Send a GET request to verify the host is available.

_set_authentication() None

Authenticate to the host.

_set_url() None

Set the URL of the host.

mex.extractors.synopse.extract module

mex.extractors.synopse.extract.extract_projects() Generator[SynopseProject, None, None]

Extract projects from projekt_und_studienverwaltung report.

Settings:
synopse.projekt_und_studienverwaltung_path: Path to the projekt_und_studienverwaltung file, absolute or relative to assets_dir

Returns:

Generator for Synopse Projects
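
The "absolute or relative to assets_dir" rule and the generator return type can be illustrated with a minimal sketch. It is not the package's implementation: `resolve_asset_path`, `iter_rows`, and the CSV column names are hypothetical stand-ins, and the real extractor yields SynopseProject models rather than plain dictionaries.

```python
import csv
import tempfile
from collections.abc import Generator
from pathlib import Path


def resolve_asset_path(path: str, assets_dir: Path) -> Path:
    """Return absolute paths unchanged; resolve relative ones against assets_dir."""
    candidate = Path(path)
    return candidate if candidate.is_absolute() else assets_dir / candidate


def iter_rows(path: Path) -> Generator[dict[str, str], None, None]:
    """Lazily yield one dictionary per CSV row (hypothetical stand-in)."""
    with path.open(newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


# Demo with a temporary assets directory and made-up column names.
with tempfile.TemporaryDirectory() as tmp:
    assets_dir = Path(tmp)
    report = resolve_asset_path(
        "raw-data/synopse/projekt_und_studienverwaltung.csv", assets_dir
    )
    report.parent.mkdir(parents=True)
    report.write_text("project_id,title\n1,Example\n", encoding="utf-8")
    rows = list(iter_rows(report))
```

Because the function is a generator, rows are only read as the caller iterates; wrapping the call in `list(...)` forces the whole file to be read at once.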

mex.extractors.synopse.extract.extract_study_data() Generator[SynopseStudy, None, None]

Extract study data from metadaten_zu_datensaetzen report.

Settings:
synopse.metadaten_zu_datensaetzen_path: Path to the metadaten_zu_datensaetzen file, absolute or relative to assets_dir

Returns:

Generator for Synopse Studies

mex.extractors.synopse.extract.extract_study_overviews() Generator[SynopseStudyOverview, None, None]

Extract study overviews from datensatzuebersicht report.

Settings:
synopse.datensatzuebersicht_path: Path to the datensatzuebersicht file, absolute or relative to assets_dir

Returns:

Generator for Synopse Overviews

mex.extractors.synopse.extract.extract_synopse_contact(access_platform_mapping: AccessPlatformMapping) list[LDAPFunctionalAccount]

Extract LDAP functional accounts for the Synopse project contact.

Parameters:

access_platform_mapping – Synopse access platform default values

Returns:

contact LDAP functional accounts

mex.extractors.synopse.extract.extract_synopse_organizations(synopse_projects: list[SynopseProject]) dict[str, MergedOrganizationIdentifier]

Search and extract organizations from Wikidata.

Parameters:

synopse_projects – list of synopse projects

Returns:

Dict mapping organization label to merged organization identifier

mex.extractors.synopse.extract.extract_synopse_project_contributors(synopse_projects: Iterable[SynopseProject]) Generator[LDAPPersonWithQuery, None, None]

Extract LDAP persons for Synopse project contributors.

Parameters:

synopse_projects – Synopse projects

Returns:

Generator for LDAP persons

mex.extractors.synopse.extract.extract_variables() list[SynopseVariable]

Extract variables from variablenuebersicht report.

Settings:
synopse.variablenuebersicht_path: Path to the variablenuebersicht file, absolute or relative to assets_dir

Returns:

list of Synopse Variables

mex.extractors.synopse.filter module

mex.extractors.synopse.filter.filter_and_log_synopse_variables(synopse_variables: list[SynopseVariable]) list[SynopseVariable]

Filter out and log variables used for internal context.

Parameters:

synopse_variables – list of synopse variables

Returns:

list of filtered synopse variables
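
A filter-and-log step of this kind might look like the sketch below. The `Variable` class, the `internal_only` flag, and the log message are assumptions made for illustration; the criterion the package actually uses to detect internal-context variables is not stated here.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Variable:
    """Hypothetical stand-in for SynopseVariable."""

    name: str
    internal_only: bool  # assumed flag; the real filter criterion may differ


def filter_and_log(variables: list[Variable]) -> list[Variable]:
    """Keep variables not marked internal and log how many were dropped."""
    kept = [v for v in variables if not v.internal_only]
    logger.info(
        "filtered out %s of %s variables",
        len(variables) - len(kept),
        len(variables),
    )
    return kept


result = filter_and_log([Variable("age", False), Variable("row_id", True)])
```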

mex.extractors.synopse.main module

mex.extractors.synopse.settings module

class mex.extractors.synopse.settings.SynopseSettings(*, report_server_url: str = 'https://report-server/', report_server_username: SecretStr = SecretStr('**********'), report_server_password: SecretStr = SecretStr('**********'), variablenuebersicht_path: AssetsPath = AssetsPath('raw-data/synopse/variablenuebersicht.csv'), projekt_und_studienverwaltung_path: AssetsPath = AssetsPath('raw-data/synopse/projekt_und_studienverwaltung.csv'), metadaten_zu_datensaetzen_path: AssetsPath = AssetsPath('raw-data/synopse/metadaten_zu_datensaetzen.csv'), datensatzuebersicht_path: AssetsPath = AssetsPath('raw-data/synopse/datensatzuebersicht.csv'), mapping_path: AssetsPath = AssetsPath('mappings/synopse'))

Bases: BaseModel

Synopse settings submodel definition for the Synopse extractor.

datensatzuebersicht_path: AssetsPath
mapping_path: AssetsPath
metadaten_zu_datensaetzen_path: AssetsPath
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

projekt_und_studienverwaltung_path: AssetsPath
report_server_password: SecretStr
report_server_url: str
report_server_username: SecretStr
variablenuebersicht_path: AssetsPath

mex.extractors.synopse.transform module

mex.extractors.synopse.transform.transform_overviews_to_resource_lookup(study_overviews: list[SynopseStudyOverview], study_resources: list[ExtractedResource]) dict[str, ExtractedResource]

Transform overviews and resources into an identifier in primary source lookup.

Parameters:
  • study_overviews – list of Synopse Overviews

  • study_resources – list of Study Resources

Returns:

Map from identifier in primary source to extracted resource

mex.extractors.synopse.transform.transform_synopse_data_to_mex_resources(synopse_studies: Iterable[SynopseStudy], synopse_projects: Iterable[SynopseProject], synopse_variables_by_study_id: dict[int, list[SynopseVariable]], extracted_activities: Iterable[ExtractedActivity], unit_merged_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], extracted_organization: ExtractedOrganization, synopse_resource: ResourceMapping, synopse_access_platform_id: MergedAccessPlatformIdentifier, synopse_merged_person_ids_by_str: dict[str, list[MergedPersonIdentifier]]) list[ExtractedResource]

Transform Synopse Studies to MEx resources.

Parameters:
  • synopse_studies – Iterable of Synopse Studies

  • synopse_projects – Iterable of synopse projects

  • synopse_variables_by_study_id – mapping from synopse study id to the variables with this study id

  • extracted_activities – Iterable of extracted activities

  • unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID

  • extracted_organization – extracted organization

  • synopse_resource – resource default values

  • synopse_merged_contact_point_ids_by_query_string – contact person lookup by email

  • synopse_access_platform_id – synopse access platform id

  • synopse_merged_person_ids_by_str – person ids by name

Returns:

list of extracted resources

mex.extractors.synopse.transform.transform_synopse_project_to_activity(synopse_project: SynopseProject, contributor_merged_ids_by_name: dict[str, list[MergedPersonIdentifier]], unit_merged_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], synopse_activity: ActivityMapping, synopse_merged_organization_ids_by_query_string: dict[str, MergedOrganizationIdentifier]) ExtractedActivity | None

Transform a synopse project into a MEx activity.

Parameters:
  • synopse_project – a synopse project

  • contact_merged_ids_by_emails – Mapping from LDAP emails to contact IDs

  • contributor_merged_ids_by_name – Mapping from person names to contributor IDs

  • unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID

  • synopse_activity – synopse activity default values

  • synopse_merged_organization_ids_by_query_string – merged organization ids by org name

Returns:

extracted activity

mex.extractors.synopse.transform.transform_synopse_projects_to_mex_activities(synopse_projects: Iterable[SynopseProject], contributor_merged_ids_by_name: dict[str, list[MergedPersonIdentifier]], unit_merged_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], synopse_activity: ActivityMapping, synopse_merged_organization_ids_by_query_string: dict[str, MergedOrganizationIdentifier]) tuple[list[ExtractedActivity], list[ExtractedActivity]]

Transform synopse projects into MEx activities.

Parameters:
  • synopse_projects – Iterable of synopse projects

  • contact_merged_ids_by_emails – Mapping from LDAP emails to contact IDs

  • contributor_merged_ids_by_name – Mapping from person names to contributor IDs

  • unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID

  • synopse_activity – synopse activity default values

  • synopse_merged_organization_ids_by_query_string – merged organization ids by org name

Returns:

tuple of non-child and child extracted activities
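
The shape of the returned tuple can be sketched as a simple partition. This is only an illustration: `Activity`, its `parent_id` field, and `split_activities` are hypothetical, and how the package decides that an activity is a child is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    """Hypothetical stand-in for ExtractedActivity."""

    identifier: str
    parent_id: Optional[str] = None  # assumed marker for child activities


def split_activities(
    activities: list[Activity],
) -> tuple[list[Activity], list[Activity]]:
    """Return (non-child, child) activities, preserving input order."""
    non_child = [a for a in activities if a.parent_id is None]
    child = [a for a in activities if a.parent_id is not None]
    return non_child, child


non_child, child = split_activities(
    [Activity("a"), Activity("b", parent_id="a"), Activity("c")]
)
```

Callers can unpack the result directly, as in the last statement, and handle the two lists separately.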

mex.extractors.synopse.transform.transform_synopse_studies_into_access_platforms(unit_merged_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], synopse_merged_contact_point_ids_by_query_string: dict[str, MergedContactPointIdentifier], access_platform_mapping: AccessPlatformMapping) ExtractedAccessPlatform

Transform synopse studies into access platforms.

Parameters:
  • unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID

  • synopse_merged_contact_point_ids_by_query_string – contact person lookup by email

  • access_platform_mapping – mapping default values for access platform

Returns:

extracted access platform

mex.extractors.synopse.transform.transform_synopse_variables_belonging_to_same_variable_group_to_mex_variables(variables: Iterable[SynopseVariable], synopse_variable_groups_by_identifier_in_primary_source: dict[str, ExtractedVariableGroup], synopse_extracted_resources_by_identifier_in_primary_source: dict[str, ExtractedResource], study_overviews: list[SynopseStudyOverview]) Generator[ExtractedVariable, None, None]

Transform Synopse variables to extracted variables.

Parameters:
  • variables – Iterable of Synopse Variables

  • synopse_variable_groups_by_identifier_in_primary_source – extracted variable groups by identifier in primary source

  • synopse_extracted_resources_by_identifier_in_primary_source – Map from synopse ID to study resource

  • study_overviews – list of synopse study overviews

Returns:

Generator for ExtractedVariable

mex.extractors.synopse.transform.transform_synopse_variables_to_mex_variable_groups(synopse_variables_by_thema: dict[str, list[SynopseVariable]], synopse_extracted_resources_by_identifier_in_primary_source: dict[str, ExtractedResource], study_overviews: list[SynopseStudyOverview]) list[ExtractedVariableGroup]

Transform Synopse Variable Sets to MEx Variable Groups.

Parameters:
  • synopse_variables_by_thema – mapping from “Thema und Fragebogenausschnitt” to the variables having this value

  • synopse_extracted_resources_by_identifier_in_primary_source – Map from synopse ID to study resource

  • study_overviews – list of Synopse Overviews

Returns:

list of extracted variable groups
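
The synopse_variables_by_thema argument is a mapping from a "Thema und Fragebogenausschnitt" value to the variables sharing it. One way such a mapping could be built is sketched below; `Variable`, its `thema` attribute, and `group_by_thema` are hypothetical names, not part of the package.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for SynopseVariable."""

    name: str
    thema: str  # assumed attribute holding "Thema und Fragebogenausschnitt"


def group_by_thema(variables: list[Variable]) -> dict[str, list[Variable]]:
    """Group variables by their thema value, preserving input order."""
    grouped: defaultdict[str, list[Variable]] = defaultdict(list)
    for variable in variables:
        grouped[variable.thema].append(variable)
    return dict(grouped)


by_thema = group_by_thema(
    [
        Variable("height", "Body measures"),
        Variable("weight", "Body measures"),
        Variable("smoker", "Health behaviour"),
    ]
)
```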

mex.extractors.synopse.transform.transform_synopse_variables_to_mex_variables(synopse_variables_by_thema: dict[str, list[SynopseVariable]], synopse_variable_groups_by_identifier_in_primary_source: dict[str, ExtractedVariableGroup], synopse_extracted_resources_by_identifier_in_primary_source: dict[str, ExtractedResource], study_overviews: list[SynopseStudyOverview]) Generator[ExtractedVariable, None, None]

Transform Synopse Variable Sets to MEx variables.

Parameters:
  • synopse_variables_by_thema – mapping from “Thema und Fragebogenausschnitt” to the variables having this value

  • synopse_variable_groups_by_identifier_in_primary_source – extracted variable groups by identifier in primary source

  • synopse_extracted_resources_by_identifier_in_primary_source – Map from identifier in primary source to study resource

  • study_overviews – list of synopse study overviews

Returns:

Generator for ExtractedVariable

Module contents