mex.extractors.synopse package
Subpackages
- mex.extractors.synopse.models package
- Submodules
- mex.extractors.synopse.models.project module
  - SynopseProject
    - SynopseProject.akronym_des_studientitels
    - SynopseProject.anschlussprojekt
    - SynopseProject.beitragende
    - SynopseProject.beschreibung_der_studie
    - SynopseProject.externe_partner
    - SynopseProject.foerderinstitution_oder_auftraggeber
    - SynopseProject.get_end_year()
    - SynopseProject.get_identifier_in_primary_source()
    - SynopseProject.get_partners()
    - SynopseProject.get_start_year()
    - SynopseProject.get_units()
    - SynopseProject.interne_partner
    - SynopseProject.model_config
    - SynopseProject.project_studientitel
    - SynopseProject.projektbeginn
    - SynopseProject.projektdokumentation
    - SynopseProject.projektende
    - SynopseProject.studien_id
    - SynopseProject.studienart_studientyp
    - SynopseProject.verantwortliche_oe
- mex.extractors.synopse.models.study module
  - SynopseStudy
    - SynopseStudy.beschreibung
    - SynopseStudy.bevoelkerungsabdeckung
    - SynopseStudy.dateiformat
    - SynopseStudy.datum_der_letzten_aenderung
    - SynopseStudy.dokumentation
    - SynopseStudy.ds_typ_id
    - SynopseStudy.erstellungs_datum
    - SynopseStudy.feld_ende
    - SynopseStudy.feld_start
    - SynopseStudy.herkunft_der_daten
    - SynopseStudy.lizenz
    - SynopseStudy.model_config
    - SynopseStudy.plattform
    - SynopseStudy.plattform_adresse
    - SynopseStudy.raeumlicher_bezug
    - SynopseStudy.rechte
    - SynopseStudy.schlagworte_themen
    - SynopseStudy.studie
    - SynopseStudy.studien_id
    - SynopseStudy.titel_datenset
    - SynopseStudy.typisches_alter_max
    - SynopseStudy.typisches_alter_min
    - SynopseStudy.version
    - SynopseStudy.zugangsbeschraenkung
    - SynopseStudy.zweck
- mex.extractors.synopse.models.study_overview module
- mex.extractors.synopse.models.variable module
  - SynopseVariable
    - SynopseVariable.auspraegungen
    - SynopseVariable.datentyp
    - SynopseVariable.int_var
    - SynopseVariable.keep_varname
    - SynopseVariable.model_config
    - SynopseVariable.originalfrage
    - SynopseVariable.studie
    - SynopseVariable.studie_id
    - SynopseVariable.synopse_id
    - SynopseVariable.text_dt
    - SynopseVariable.thema_und_fragebogenausschnitt
    - SynopseVariable.unterthema
    - SynopseVariable.val_instrument
    - SynopseVariable.varlabel
    - SynopseVariable.varname
- Module contents
Submodules
mex.extractors.synopse.connector module
- class mex.extractors.synopse.connector.ReportServerConnector
Bases: HTTPConnector
Connector that handles authentication and requests to the Power BI Report Server.
- _check_availability() -> None
Send a GET request to verify the host is available.
- _set_authentication() -> None
Authenticate to the host.
- _set_url() -> None
Set the URL of the host.
mex.extractors.synopse.extract module
- mex.extractors.synopse.extract.extract_projects() -> Generator[SynopseProject, None, None]
Extract projects from projekt_und_studienverwaltung report.
- Settings:
- synopse.projekt_und_studienverwaltung_path: Path to the
projekt_und_studienverwaltung file, absolute or relative to assets_dir
- Returns:
Generator for Synopse Projects
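As a rough illustration of the pattern described above (a settings path that may be absolute or relative to assets_dir, and a generator that yields one model per report row), here is a minimal sketch. It is not the package's implementation: ProjectRow, resolve_asset_path, iter_projects and the CSV column names are hypothetical stand-ins, and only the standard library is used.

```python
import csv
import io
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ProjectRow:
    """Hypothetical stand-in for SynopseProject with two of its fields."""

    studien_id: str
    project_studientitel: str


def resolve_asset_path(path: str, assets_dir: Path) -> Path:
    """Return absolute paths unchanged and anchor relative ones at assets_dir."""
    candidate = Path(path)
    return candidate if candidate.is_absolute() else assets_dir / candidate


def iter_projects(lines: Iterable[str]) -> Iterator[ProjectRow]:
    """Lazily yield one ProjectRow per CSV record."""
    # The column names below are assumptions made for this sketch.
    for record in csv.DictReader(lines):
        yield ProjectRow(
            studien_id=record["StudienID"],
            project_studientitel=record["ProjektStudientitel"],
        )


# In-memory sample instead of a real report file.
sample = io.StringIO("StudienID,ProjektStudientitel\n1,Alpha\n2,Beta\n")
projects = list(iter_projects(sample))

relative = resolve_asset_path(
    "raw-data/synopse/projekt_und_studienverwaltung.csv", Path("/assets")
)
```

Because iter_projects is a generator, rows are parsed only when the caller iterates, which matches the Generator return type documented for extract_projects.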
- mex.extractors.synopse.extract.extract_study_data() -> Generator[SynopseStudy, None, None]
Extract study data from metadaten_zu_datensaetzen report.
- Settings:
- synopse.metadaten_zu_datensaetzen_path: Path to the metadaten_zu_datensaetzen
file, absolute or relative to assets_dir
- Returns:
Generator for Synopse Studies
- mex.extractors.synopse.extract.extract_study_overviews() -> Generator[SynopseStudyOverview, None, None]
Extract study overviews from datensatzuebersicht report.
- Settings:
- synopse.datensatzuebersicht_path: Path to the datensatzuebersicht file,
absolute or relative to assets_dir
- Returns:
Generator for Synopse Overviews
- mex.extractors.synopse.extract.extract_synopse_contact(access_platform_mapping: AccessPlatformMapping) -> list[LDAPFunctionalAccount]
Extract LDAP functional accounts for the Synopse project contact.
- Parameters:
access_platform_mapping – Synopse access platform default values
- Returns:
list of contact LDAP functional accounts
- mex.extractors.synopse.extract.extract_synopse_organizations(synopse_projects: list[SynopseProject]) -> dict[str, MergedOrganizationIdentifier]
Search for and extract organizations from Wikidata.
- Parameters:
synopse_projects – list of synopse projects
- Returns:
Dict from organization label to merged organization identifier
- mex.extractors.synopse.extract.extract_synopse_project_contributors(synopse_projects: Iterable[SynopseProject]) -> Generator[LDAPPersonWithQuery, None, None]
Extract LDAP persons for Synopse project contributors.
- Parameters:
synopse_projects – Synopse projects
- Returns:
Generator for LDAP persons
- mex.extractors.synopse.extract.extract_variables() -> list[SynopseVariable]
Extract variables from variablenuebersicht report.
- Settings:
- synopse.variablenuebersicht_path: Path to the variablenuebersicht file,
absolute or relative to assets_dir
- Returns:
list of Synopse Variables
mex.extractors.synopse.filter module
- mex.extractors.synopse.filter.filter_and_log_synopse_variables(synopse_variables: list[SynopseVariable]) -> list[SynopseVariable]
Filter out and log variables used for internal context.
- Parameters:
synopse_variables – list of synopse variables
- Returns:
list of filtered synopse variables
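A minimal sketch of a filter-and-log step of this kind follows. It assumes, for illustration only, that the int_var field marks a variable as internal; the criterion the package actually applies is not stated here. Variable and filter_and_log are hypothetical names.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("synopse-sketch")


@dataclass
class Variable:
    """Hypothetical stand-in for SynopseVariable."""

    synopse_id: str
    int_var: bool  # assumption: True means "used for internal context"


def filter_and_log(variables: list[Variable]) -> list[Variable]:
    """Drop internal variables and log how many were removed."""
    kept = [variable for variable in variables if not variable.int_var]
    logger.info(
        "filtered out %s of %s variables", len(variables) - len(kept), len(variables)
    )
    return kept


variables = [Variable("1", False), Variable("2", True), Variable("3", False)]
kept = filter_and_log(variables)
```

The input list is left untouched and a new list is returned, so callers can still inspect what was dropped.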
mex.extractors.synopse.main module
mex.extractors.synopse.settings module
- class mex.extractors.synopse.settings.SynopseSettings(*, report_server_url: str = 'https://report-server/', report_server_username: SecretStr = SecretStr('**********'), report_server_password: SecretStr = SecretStr('**********'), variablenuebersicht_path: AssetsPath = AssetsPath('raw-data/synopse/variablenuebersicht.csv'), projekt_und_studienverwaltung_path: AssetsPath = AssetsPath('raw-data/synopse/projekt_und_studienverwaltung.csv'), metadaten_zu_datensaetzen_path: AssetsPath = AssetsPath('raw-data/synopse/metadaten_zu_datensaetzen.csv'), datensatzuebersicht_path: AssetsPath = AssetsPath('raw-data/synopse/datensatzuebersicht.csv'), mapping_path: AssetsPath = AssetsPath('mappings/synopse'))
Bases: BaseModel
Synopse settings submodel definition for the Synopse extractor.
- datensatzuebersicht_path: AssetsPath
- mapping_path: AssetsPath
- metadaten_zu_datensaetzen_path: AssetsPath
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- projekt_und_studienverwaltung_path: AssetsPath
- report_server_password: SecretStr
- report_server_url: str
- report_server_username: SecretStr
- variablenuebersicht_path: AssetsPath
mex.extractors.synopse.transform module
- mex.extractors.synopse.transform.transform_overviews_to_resource_lookup(study_overviews: list[SynopseStudyOverview], study_resources: list[ExtractedResource]) -> dict[str, ExtractedResource]
Transform overviews and resources into an identifier in primary source lookup.
- Parameters:
study_overviews – list of Synopse Overviews
study_resources – list of Study Resources
- Returns:
Map from identifier in primary source to the matching extracted resource
- mex.extractors.synopse.transform.transform_synopse_data_to_mex_resources(synopse_studies: Iterable[SynopseStudy], synopse_projects: Iterable[SynopseProject], synopse_variables_by_study_id: dict[int, list[SynopseVariable]], extracted_activities: Iterable[ExtractedActivity], unit_merged_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], extracted_organization: ExtractedOrganization, synopse_resource: ResourceMapping, synopse_access_platform_id: MergedAccessPlatformIdentifier, synopse_merged_person_ids_by_str: dict[str, list[MergedPersonIdentifier]]) -> list[ExtractedResource]
Transform Synopse Studies to MEx resources.
- Parameters:
synopse_studies – Iterable of Synopse Studies
synopse_projects – Iterable of synopse projects
synopse_variables_by_study_id – mapping from synopse study ID to the variables with this study ID
extracted_activities – Iterable of extracted activities
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
extracted_organization – extracted organization
synopse_resource – resource default values
synopse_access_platform_id – synopse access platform id
synopse_merged_person_ids_by_str – person ids by name
- Returns:
list of extracted resources
- mex.extractors.synopse.transform.transform_synopse_project_to_activity(synopse_project: SynopseProject, contributor_merged_ids_by_name: dict[str, list[MergedPersonIdentifier]], unit_merged_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], synopse_activity: ActivityMapping, synopse_merged_organization_ids_by_query_string: dict[str, MergedOrganizationIdentifier]) -> ExtractedActivity | None
Transform a synopse project into a MEx activity.
- Parameters:
synopse_project – a synopse project
contributor_merged_ids_by_name – Mapping from person names to contributor IDs
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
synopse_activity – synopse activity default values
synopse_merged_organization_ids_by_query_string – merged organization ids by org name
- Returns:
extracted activity
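The lookup parameters above are plain dictionaries. The sketch below shows one way a name-to-IDs mapping such as contributor_merged_ids_by_name could be applied. It assumes contributor names arrive as a list of strings and uses plain strings in place of MergedPersonIdentifier; resolve_contributors is a hypothetical helper, not part of the package.

```python
def resolve_contributors(
    names: list[str], ids_by_name: dict[str, list[str]]
) -> list[str]:
    """Collect the IDs for all known names, in order and without duplicates."""
    resolved: list[str] = []
    for name in names:
        # Names missing from the lookup contribute nothing.
        for identifier in ids_by_name.get(name, []):
            if identifier not in resolved:
                resolved.append(identifier)
    return resolved


# Hypothetical lookup: one name may map to several merged person IDs.
ids_by_name = {"Muster, Max": ["id-1"], "Beispiel, Erika": ["id-2", "id-1"]}
contributors = resolve_contributors(
    ["Muster, Max", "Unbekannt, U.", "Beispiel, Erika"], ids_by_name
)
```

Skipping unknown names rather than raising keeps a single unresolved contributor from discarding the whole activity; whether the package makes the same choice is not documented here.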
- mex.extractors.synopse.transform.transform_synopse_projects_to_mex_activities(synopse_projects: Iterable[SynopseProject], contributor_merged_ids_by_name: dict[str, list[MergedPersonIdentifier]], unit_merged_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], synopse_activity: ActivityMapping, synopse_merged_organization_ids_by_query_string: dict[str, MergedOrganizationIdentifier]) -> tuple[list[ExtractedActivity], list[ExtractedActivity]]
Transform synopse projects into MEx activities.
- Parameters:
synopse_projects – Iterable of synopse projects
contributor_merged_ids_by_name – Mapping from person names to contributor IDs
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
synopse_activity – synopse activity default values
synopse_merged_organization_ids_by_query_string – merged organization ids by org name
- Returns:
tuple of non-child and child extracted activities
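The return value is a pair of lists. One way to produce such a pair is a single-pass partition. The sketch below assumes, purely for illustration, that a project counts as a child when some other project names it as its anschlussprojekt. The rule the package actually applies is not stated here, and Project and partition_projects are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    """Hypothetical stand-in for SynopseProject."""

    studien_id: str
    anschlussprojekt: Optional[str] = None


def partition_projects(
    projects: list[Project],
) -> tuple[list[Project], list[Project]]:
    """Split projects into (non_child, child) while preserving input order."""
    # Assumption: the IDs referenced as follow-up projects identify the children.
    follow_up_ids = {p.anschlussprojekt for p in projects if p.anschlussprojekt}
    non_child: list[Project] = []
    child: list[Project] = []
    for project in projects:
        target = child if project.studien_id in follow_up_ids else non_child
        target.append(project)
    return non_child, child


projects = [Project("A", anschlussprojekt="B"), Project("B"), Project("C")]
non_child, child = partition_projects(projects)
```

Returning the two groups separately lets a caller load the non-child activities first, so that child activities can refer to them afterwards.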
- mex.extractors.synopse.transform.transform_synopse_studies_into_access_platforms(unit_merged_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], synopse_merged_contact_point_ids_by_query_string: dict[str, MergedContactPointIdentifier], access_platform_mapping: AccessPlatformMapping) -> ExtractedAccessPlatform
Transform synopse studies into access platforms.
- Parameters:
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
synopse_merged_contact_point_ids_by_query_string – contact person lookup by email
access_platform_mapping – mapping default values for access platform
- Returns:
extracted access platform
- mex.extractors.synopse.transform.transform_synopse_variables_belonging_to_same_variable_group_to_mex_variables(variables: Iterable[SynopseVariable], synopse_variable_groups_by_identifier_in_primary_source: dict[str, ExtractedVariableGroup], synopse_extracted_resources_by_identifier_in_primary_source: dict[str, ExtractedResource], study_overviews: list[SynopseStudyOverview]) -> Generator[ExtractedVariable, None, None]
Transform Synopse variables to extracted variables.
- Parameters:
variables – Iterable of Synopse Variables
synopse_variable_groups_by_identifier_in_primary_source – extracted variable groups by identifier in primary source
synopse_extracted_resources_by_identifier_in_primary_source – Map from synopse ID to study resource
study_overviews – list of synopse study overviews
- Returns:
Generator for ExtractedVariable
- mex.extractors.synopse.transform.transform_synopse_variables_to_mex_variable_groups(synopse_variables_by_thema: dict[str, list[SynopseVariable]], synopse_extracted_resources_by_identifier_in_primary_source: dict[str, ExtractedResource], study_overviews: list[SynopseStudyOverview]) -> list[ExtractedVariableGroup]
Transform Synopse Variable Sets to MEx Variable Groups.
- Parameters:
synopse_variables_by_thema – mapping from “Thema und Fragebogenausschnitt” to the variables having this value
synopse_extracted_resources_by_identifier_in_primary_source – Map from identifier in primary source to study resource
study_overviews – list of Synopse Overviews
- Returns:
list of extracted variable groups
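The synopse_variables_by_thema argument is described as a mapping from "Thema und Fragebogenausschnitt" to the variables sharing that value. A minimal sketch of how such a mapping can be built is shown below; Variable and group_by_thema are hypothetical names, and the package may construct the mapping differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Variable:
    """Hypothetical stand-in for SynopseVariable."""

    synopse_id: str
    thema_und_fragebogenausschnitt: str


def group_by_thema(variables: list[Variable]) -> dict[str, list[Variable]]:
    """Group variables by their thema_und_fragebogenausschnitt value."""
    grouped: defaultdict[str, list[Variable]] = defaultdict(list)
    for variable in variables:
        grouped[variable.thema_und_fragebogenausschnitt].append(variable)
    # Return a plain dict so missing keys raise KeyError instead of being created.
    return dict(grouped)


variables = [
    Variable("1", "Gesundheit"),
    Variable("2", "Soziodemografie"),
    Variable("3", "Gesundheit"),
]
variables_by_thema = group_by_thema(variables)
```

The same grouping idea applies to synopse_variables_by_study_id, with the study ID as the key instead of the topic.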
- mex.extractors.synopse.transform.transform_synopse_variables_to_mex_variables(synopse_variables_by_thema: dict[str, list[SynopseVariable]], synopse_variable_groups_by_identifier_in_primary_source: dict[str, ExtractedVariableGroup], synopse_extracted_resources_by_identifier_in_primary_source: dict[str, ExtractedResource], study_overviews: list[SynopseStudyOverview]) -> Generator[ExtractedVariable, None, None]
Transform Synopse Variable Sets to MEx variables.
- Parameters:
synopse_variables_by_thema – mapping from “Thema und Fragebogenausschnitt” to the variables having this value
synopse_variable_groups_by_identifier_in_primary_source – extracted variable groups by identifier in primary source
synopse_extracted_resources_by_identifier_in_primary_source – Map from identifier in primary source to study resource
study_overviews – list of synopse study overviews
- Returns:
Generator for ExtractedVariable