mex.extractors.synopse package
Subpackages
- mex.extractors.synopse.models package
- Submodules
- mex.extractors.synopse.models.project module
SynopseProject
SynopseProject.akronym_des_studientitels
SynopseProject.anschlussprojekt
SynopseProject.beitragende
SynopseProject.beschreibung_der_studie
SynopseProject.externe_partner
SynopseProject.foerderinstitution_oder_auftraggeber
SynopseProject.get_contacts()
SynopseProject.interne_partner
SynopseProject.kontakt
SynopseProject.model_computed_fields
SynopseProject.model_config
SynopseProject.model_fields
SynopseProject.project_studientitel
SynopseProject.projektbeginn
SynopseProject.projektdokumentation
SynopseProject.projektende
SynopseProject.studien_id
SynopseProject.studienart_studientyp
SynopseProject.verantwortliche_oe
- mex.extractors.synopse.models.study module
SynopseStudy
SynopseStudy.beschreibung
SynopseStudy.dateiformat
SynopseStudy.dokumentation
SynopseStudy.ds_typ_id
SynopseStudy.erstellungs_datum
SynopseStudy.lizenz
SynopseStudy.model_computed_fields
SynopseStudy.model_config
SynopseStudy.model_fields
SynopseStudy.plattform
SynopseStudy.plattform_adresse
SynopseStudy.rechte
SynopseStudy.schlagworte_themen
SynopseStudy.studie
SynopseStudy.studien_id
SynopseStudy.titel_datenset
SynopseStudy.version
SynopseStudy.zugangsbeschraenkung
- mex.extractors.synopse.models.study_overview module
- mex.extractors.synopse.models.variable module
SynopseVariable
SynopseVariable.auspraegungen
SynopseVariable.datentyp
SynopseVariable.int_var
SynopseVariable.keep_varname
SynopseVariable.model_computed_fields
SynopseVariable.model_config
SynopseVariable.model_fields
SynopseVariable.originalfrage
SynopseVariable.studie
SynopseVariable.studie_id
SynopseVariable.synopse_id
SynopseVariable.text_dt
SynopseVariable.thema_und_fragebogenausschnitt
SynopseVariable.unterthema
SynopseVariable.val_instrument
SynopseVariable.varlabel
SynopseVariable.varname
- Module contents
Submodules
mex.extractors.synopse.connector module
- class mex.extractors.synopse.connector.ReportServerConnector
Bases: HTTPConnector
Connector to handle authentication and requesting the Power BI Report Server.
- _check_availability() → None
Send a GET request to verify that the host is available.
- _set_authentication() → None
Authenticate to the host.
- _set_url() → None
Set the URL of the host.
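The three private hooks above split the connector setup into URL, authentication and availability steps. The sketch below shows one way such hooks can fit together. It assumes the base class calls them in that order during initialization, which this page does not confirm. SketchHTTPConnector, SketchReportServerConnector, the injected probe callable and the use of HTTP Basic authentication are all hypothetical; the probe stands in for the real GET request so the sketch runs without network access.

```python
import base64
from typing import Callable

# Hypothetical stand-in for "send a GET request": takes URL and headers,
# returns an HTTP status code. A real connector would use an HTTP client.
Probe = Callable[[str, dict[str, str]], int]


class SketchHTTPConnector:
    """Hypothetical base class that runs the setup hooks in a fixed order."""

    def __init__(self, probe: Probe) -> None:
        self.probe = probe
        self.url = ""
        self.headers: dict[str, str] = {}
        self._set_url()
        self._set_authentication()
        self._check_availability()

    def _set_url(self) -> None:
        raise NotImplementedError

    def _set_authentication(self) -> None:
        raise NotImplementedError

    def _check_availability(self) -> None:
        raise NotImplementedError


class SketchReportServerConnector(SketchHTTPConnector):
    """Illustrative subclass; Basic auth is an assumption, not the real scheme."""

    def __init__(self, url: str, username: str, password: str, probe: Probe) -> None:
        # Store settings before the base class triggers the hooks.
        self._settings = (url, username, password)
        super().__init__(probe)

    def _set_url(self) -> None:
        self.url = self._settings[0]

    def _set_authentication(self) -> None:
        _, username, password = self._settings
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        self.headers["Authorization"] = f"Basic {token}"

    def _check_availability(self) -> None:
        # Probe the host once and fail fast on any non-2xx status.
        status = self.probe(self.url, self.headers)
        if not 200 <= status < 300:
            raise ConnectionError(f"{self.url} answered with HTTP {status}")
```

Injecting the probe keeps the availability check testable; a production connector would issue the GET request with a real HTTP client instead.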
mex.extractors.synopse.extract module
- mex.extractors.synopse.extract.extract_projects() → Generator[SynopseProject, None, None]
Extract projects from the projekt_und_studienverwaltung report.
- Settings:
- synopse.projekt_und_studienverwaltung_path: Path to the projekt_und_studienverwaltung file, absolute or relative to assets_dir
- Returns:
Generator for Synopse Projects
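The settings describe each report as a CSV export, and each extract function returns a generator of model instances. A minimal sketch of that read-and-yield pattern, using csv.DictReader from the standard library. ProjectStub and iter_projects are hypothetical, the column headers studien_id and project_studientitel are assumed to match the model field names (the real export headers are not documented here), and the comma delimiter is likewise an assumption.

```python
import csv
import io
from dataclasses import dataclass
from typing import Generator, TextIO


@dataclass(frozen=True)
class ProjectStub:
    """Hypothetical, trimmed-down stand-in for SynopseProject."""

    studien_id: str
    project_studientitel: str


def iter_projects(stream: TextIO, delimiter: str = ",") -> Generator[ProjectStub, None, None]:
    # DictReader maps each row to {header: value}; rows are parsed lazily,
    # so the whole export never has to be held in memory.
    for row in csv.DictReader(stream, delimiter=delimiter):
        yield ProjectStub(
            studien_id=row["studien_id"],
            project_studientitel=row["project_studientitel"],
        )


# Usage with an in-memory export; a real run would open the resolved CSV path.
sample = io.StringIO("studien_id,project_studientitel\n1,Alpha\n2,Beta\n")
projects = list(iter_projects(sample))
```

The other extract functions on this page follow the same shape, differing only in the settings path and the model they yield.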
- mex.extractors.synopse.extract.extract_study_data() → Generator[SynopseStudy, None, None]
Extract study data from the metadaten_zu_datensaetzen report.
- Settings:
- synopse.metadaten_zu_datensaetzen_path: Path to the metadaten_zu_datensaetzen file, absolute or relative to assets_dir
- Returns:
Generator for Synopse Studies
- mex.extractors.synopse.extract.extract_study_overviews() → Generator[SynopseStudyOverview, None, None]
Extract study overviews from the datensatzuebersicht report.
- Settings:
- synopse.datensatzuebersicht_path: Path to the datensatzuebersicht file, absolute or relative to assets_dir
- Returns:
Generator for Synopse study overviews
- mex.extractors.synopse.extract.extract_synopse_organizations(synopse_projects: list[SynopseProject]) → dict[str, MergedOrganizationIdentifier]
Search for and extract organizations from Wikidata.
- Parameters:
synopse_projects – list of Synopse projects
- Returns:
Dict mapping organization labels to merged organization identifiers
- mex.extractors.synopse.extract.extract_synopse_project_contributors(synopse_projects: Iterable[SynopseProject]) → Generator[LDAPPersonWithQuery, None, None]
Extract LDAP persons for Synopse project contributors.
- Parameters:
synopse_projects – Synopse projects
- Returns:
Generator for LDAP persons
- mex.extractors.synopse.extract.extract_variables() → Generator[SynopseVariable, None, None]
Extract variables from the variablenuebersicht report.
- Settings:
- synopse.variablenuebersicht_path: Path to the variablenuebersicht file, absolute or relative to assets_dir
- Returns:
Generator for Synopse Variables
mex.extractors.synopse.filter module
- mex.extractors.synopse.filter.filter_and_log_access_platforms(synopse_studies: Iterable[SynopseStudy], extracted_primary_source: ExtractedPrimarySource) → Generator[SynopseStudy, None, None]
Filter out and log studies that cannot be accessed via an internal network drive.
- Parameters:
synopse_studies – iterable of synopse studies
extracted_primary_source – primary source for report server platform
- Returns:
Generator for filtered synopse studies
mex.extractors.synopse.main module
mex.extractors.synopse.settings module
- class mex.extractors.synopse.settings.SynopseSettings(*, report_server_url: str = 'https://report-server/', report_server_username: SecretStr = SecretStr('**********'), report_server_password: SecretStr = SecretStr('**********'), variablenuebersicht_path: AssetsPath = AssetsPath('raw-data/synopse/variablenuebersicht.csv'), projekt_und_studienverwaltung_path: AssetsPath = AssetsPath('raw-data/synopse/projekt_und_studienverwaltung.csv'), metadaten_zu_datensaetzen_path: AssetsPath = AssetsPath('raw-data/synopse/metadaten_zu_datensaetzen.csv'), datensatzuebersicht_path: AssetsPath = AssetsPath('raw-data/synopse/datensatzuebersicht.csv'), mapping_path: AssetsPath = AssetsPath('mappings/__final__/synopse'))
Bases: BaseModel
Synopse settings submodel definition for the Synopse extractor.
- datensatzuebersicht_path: AssetsPath
- mapping_path: AssetsPath
- metadaten_zu_datensaetzen_path: AssetsPath
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'datensatzuebersicht_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/synopse/datensatzuebersicht.csv"), description='Path of the export in CSV format, absolute or relative to `assets_dir`'), 'mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/__final__/synopse"), description='Path to the directory with the synopse mapping files, absolute path or relative to `assets_dir`.'), 'metadaten_zu_datensaetzen_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/synopse/metadaten_zu_datensaetzen.csv"), description='Path of the export in CSV format, absolute or relative to `assets_dir`'), 'projekt_und_studienverwaltung_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/synopse/projekt_und_studienverwaltung.csv"), description='Path of the export in CSV format, absolute or relative to `assets_dir`'), 'report_server_password': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='Report Server password'), 'report_server_url': FieldInfo(annotation=str, required=False, default='https://report-server/', description='Report Server instance URL'), 'report_server_username': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='Report Server user name'), 'variablenuebersicht_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/synopse/variablenuebersicht.csv"), description='Path of the export in CSV format, absolute or relative to `assets_dir`')}
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- projekt_und_studienverwaltung_path: AssetsPath
- report_server_password: SecretStr
- report_server_url: str
- report_server_username: SecretStr
- variablenuebersicht_path: AssetsPath
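Every AssetsPath field is described as "absolute or relative to assets_dir". The sketch below mirrors that rule with pathlib, applied to the defaults listed in the signature above. resolve_asset_path, resolved_defaults and the example directory /srv/mex/assets are hypothetical; how the real AssetsPath type performs the resolution is not documented on this page. PurePosixPath is used so the sketch behaves the same on every platform.

```python
from pathlib import PurePosixPath

# Default values copied from the SynopseSettings signature.
DEFAULT_SYNOPSE_PATHS = {
    "variablenuebersicht_path": "raw-data/synopse/variablenuebersicht.csv",
    "projekt_und_studienverwaltung_path": "raw-data/synopse/projekt_und_studienverwaltung.csv",
    "metadaten_zu_datensaetzen_path": "raw-data/synopse/metadaten_zu_datensaetzen.csv",
    "datensatzuebersicht_path": "raw-data/synopse/datensatzuebersicht.csv",
    "mapping_path": "mappings/__final__/synopse",
}


def resolve_asset_path(path: PurePosixPath, assets_dir: PurePosixPath) -> PurePosixPath:
    """Keep absolute paths unchanged; join relative ones onto assets_dir."""
    return path if path.is_absolute() else assets_dir / path


def resolved_defaults(assets_dir: str) -> dict[str, PurePosixPath]:
    base = PurePosixPath(assets_dir)
    return {
        name: resolve_asset_path(PurePosixPath(raw), base)
        for name, raw in DEFAULT_SYNOPSE_PATHS.items()
    }
```

With this rule, overriding a single path with an absolute location leaves the other defaults anchored under assets_dir.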
mex.extractors.synopse.transform module
- mex.extractors.synopse.transform.split_off_extended_data_use_variables(synopse_variables: Iterable[SynopseVariable], synopse_overviews: Iterable[SynopseStudyOverview]) → tuple[Generator[SynopseVariable, None, None], Generator[SynopseVariable, None, None]]
Split off extended data use variables from regular variables, based on the datensatzuebersicht overviews.
- Parameters:
synopse_variables – iterable of Synopse variables
synopse_overviews – iterable of Synopse study overviews
- Returns:
Tuple of two generators of Synopse variables
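Returning two generators from a single input iterable needs care, because a plain generator can only be consumed once. A sketch using itertools.tee, assuming each overview carries a synopse_id and a flag that marks extended data use. VariableStub, OverviewStub, the is_extended_data_use flag and split_variables are hypothetical, and the (regular, extended) order of the returned tuple is this sketch's choice; the page does not state the real order.

```python
from dataclasses import dataclass
from itertools import tee
from typing import Generator, Iterable


@dataclass(frozen=True)
class VariableStub:
    synopse_id: str


@dataclass(frozen=True)
class OverviewStub:
    synopse_id: str
    is_extended_data_use: bool  # hypothetical flag


def split_variables(
    variables: Iterable[VariableStub],
    overviews: Iterable[OverviewStub],
) -> tuple[Generator[VariableStub, None, None], Generator[VariableStub, None, None]]:
    # Build the lookup eagerly so both generators see the same classification.
    extended_ids = {o.synopse_id for o in overviews if o.is_extended_data_use}
    # tee() lets two generators consume a single-pass iterable independently.
    first, second = tee(variables)
    regular = (v for v in first if v.synopse_id not in extended_ids)
    extended = (v for v in second if v.synopse_id in extended_ids)
    return regular, extended
```

tee buffers every item that one branch has read and the other has not, so draining one generator completely before touching the other holds the whole input in memory.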
- mex.extractors.synopse.transform.transform_overviews_to_resource_lookup(study_overviews: Iterable[SynopseStudyOverview], study_resources: Iterable[ExtractedResource]) → dict[str, list[MergedResourceIdentifier]]
Transform overviews and resources into a resource ID lookup.
- Parameters:
study_overviews – Iterable of Synopse Overviews
study_resources – Iterable of Study Resources
- Returns:
Map from synopse variable ID to list of resource stable target IDs
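The lookup maps each Synopse variable ID to the stable target IDs of the resources it appears in. A sketch with collections.defaultdict, assuming each overview row pairs a synopse_id with a studien_id and each resource can be matched to a study through its identifier in the primary source. OverviewStub, ResourceStub, its identifier_in_primary_source field and resource_lookup are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class OverviewStub:
    synopse_id: str
    studien_id: str


@dataclass(frozen=True)
class ResourceStub:
    identifier_in_primary_source: str  # assumed to hold the study ID
    stable_target_id: str


def resource_lookup(
    overviews: Iterable[OverviewStub],
    resources: Iterable[ResourceStub],
) -> dict[str, list[str]]:
    # Index resources by the study they were extracted from.
    stable_id_by_study = {r.identifier_in_primary_source: r.stable_target_id for r in resources}
    lookup: defaultdict[str, list[str]] = defaultdict(list)
    for overview in overviews:
        stable_id = stable_id_by_study.get(overview.studien_id)
        # Skip studies without a resource and avoid duplicate IDs per variable.
        if stable_id is not None and stable_id not in lookup[overview.synopse_id]:
            lookup[overview.synopse_id].append(stable_id)
    return dict(lookup)
```

Variables whose studies have no matching resource get no entry at all, rather than an empty list.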
- mex.extractors.synopse.transform.transform_synopse_data_extended_data_use_to_mex_resources(synopse_studies: Iterable[SynopseStudy], synopse_projects: Iterable[SynopseProject], synopse_variables_by_study_id: dict[int, list[SynopseVariable]], extracted_activities: Iterable[ExtractedActivity], extracted_access_platforms: Iterable[ExtractedAccessPlatform], extracted_primary_source: ExtractedPrimarySource, unit_merged_ids_by_synonym: dict[str, Identifier], extracted_organization: ExtractedOrganization, synopse_resource: Any) → Generator[ExtractedResource, None, None]
Transform Synopse studies with extended data use to MEx resources.
- Parameters:
synopse_studies – Iterable of Synopse Studies
synopse_projects – Iterable of Synopse projects
synopse_variables_by_study_id – mapping from Synopse study ID to the variables with this study ID
extracted_activities – Iterable of extracted activities
extracted_access_platforms – Iterable of extracted access platforms
extracted_primary_source – Extracted report server primary source
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
extracted_organization – extracted organization
synopse_resource – resource extended data use default values
- Returns:
Generator for extracted resources
- mex.extractors.synopse.transform.transform_synopse_data_regular_to_mex_resources(synopse_studies: Iterable[SynopseStudy], synopse_projects: Iterable[SynopseProject], synopse_variables_by_study_id: dict[int, list[SynopseVariable]], extracted_activities: Iterable[ExtractedActivity], extracted_access_platforms: Iterable[ExtractedAccessPlatform], extracted_primary_source: ExtractedPrimarySource, unit_merged_ids_by_synonym: dict[str, Identifier], extracted_organization: ExtractedOrganization, synopse_resource: Any) → Generator[ExtractedResource, None, None]
Transform regular Synopse studies to MEx resources.
- Parameters:
synopse_studies – Iterable of Synopse Studies
synopse_projects – Iterable of Synopse projects
synopse_variables_by_study_id – mapping from Synopse study ID to the variables with this study ID
extracted_activities – Iterable of extracted activities
extracted_access_platforms – Iterable of extracted access platforms
extracted_primary_source – Extracted report server primary source
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
extracted_organization – extracted organization
synopse_resource – resource default values
- Returns:
Generator for extracted resources
- mex.extractors.synopse.transform.transform_synopse_data_to_mex_resources(synopse_studies: Iterable[SynopseStudy], synopse_projects: Iterable[SynopseProject], extracted_activities: Iterable[ExtractedActivity], extracted_access_platforms: Iterable[ExtractedAccessPlatform], extracted_primary_source: ExtractedPrimarySource, unit_merged_ids_by_synonym: dict[str, Identifier], extracted_organization: ExtractedOrganization, created_by_study_id: dict[str, str] | None, description_by_study_id: dict[str, str] | None, documentation_by_study_id: dict[str, Link] | None, keyword_text_by_study_id: dict[str, list[Text]], synopse_resource: Any, identifier_in_primary_source_by_study_id: dict[str, str], title_by_study_id: dict[str, Text]) → Generator[ExtractedResource, None, None]
Transform Synopse Studies to MEx resources.
- Parameters:
synopse_studies – Iterable of Synopse Studies
synopse_projects – Iterable of Synopse projects
extracted_activities – Iterable of extracted activities
extracted_access_platforms – Iterable of extracted access platforms
extracted_primary_source – Extracted report server primary source
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
extracted_organization – extracted organization
created_by_study_id – Creation timestamps by study ID
description_by_study_id – Description Text by study ID
documentation_by_study_id – Documentation Link by study ID
keyword_text_by_study_id – List of keywords by study ID
synopse_resource – resource mapping model with default values
identifier_in_primary_source_by_study_id – identifierInPrimarySource by study ID
title_by_study_id – title by study ID
- Returns:
Generator for extracted resources
- mex.extractors.synopse.transform.transform_synopse_project_to_activity(synopse_project: SynopseProject, extracted_primary_source: ExtractedPrimarySource, contact_merged_ids_by_emails: dict[str, Identifier], contributor_merged_ids_by_name: dict[Hashable, list[Identifier]], unit_merged_ids_by_synonym: dict[str, Identifier], synopse_activity: Any, synopse_organization_ids_by_query_string: dict[str, MergedOrganizationIdentifier]) → ExtractedActivity
Transform a Synopse project into a MEx activity.
- Parameters:
synopse_project – a Synopse project
extracted_primary_source – Extracted report server primary source
contact_merged_ids_by_emails – Mapping from LDAP emails to contact IDs
contributor_merged_ids_by_name – Mapping from person names to contributor IDs
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
synopse_activity – synopse activity default values
synopse_organization_ids_by_query_string – merged organization ids by org name
- Returns:
extracted activity
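Two of the parameters are lookup tables: contact_merged_ids_by_emails maps an email to one ID, while contributor_merged_ids_by_name maps a name to a list of IDs. A sketch of resolving a project's contacts and contributors through such tables, skipping unknown entries and removing duplicates while keeping order. resolve_contact_ids and resolve_contributor_ids are hypothetical helpers, IDs are plain strings, and case-insensitive email matching (with lowercase keys in the table) is an assumption.

```python
from collections.abc import Hashable, Iterable


def resolve_contact_ids(
    emails: Iterable[str],
    contact_ids_by_email: dict[str, str],
) -> list[str]:
    """Map contact emails to merged IDs, skipping unknown addresses."""
    resolved: list[str] = []
    for email in emails:
        # Assumption: emails are compared case-insensitively.
        merged_id = contact_ids_by_email.get(email.strip().lower())
        if merged_id is not None and merged_id not in resolved:
            resolved.append(merged_id)
    return resolved


def resolve_contributor_ids(
    names: Iterable[Hashable],
    contributor_ids_by_name: dict[Hashable, list[str]],
) -> list[str]:
    """Flatten the per-name ID lists into one de-duplicated list."""
    resolved: list[str] = []
    for name in names:
        for merged_id in contributor_ids_by_name.get(name, []):
            if merged_id not in resolved:
                resolved.append(merged_id)
    return resolved
```

The list-valued contributor table allows one name to match several LDAP persons; flattening keeps all of them rather than guessing a single match.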
- mex.extractors.synopse.transform.transform_synopse_projects_to_mex_activities(synopse_projects: Iterable[SynopseProject], extracted_primary_source: ExtractedPrimarySource, contact_merged_ids_by_emails: dict[str, Identifier], contributor_merged_ids_by_name: dict[Hashable, list[Identifier]], unit_merged_ids_by_synonym: dict[str, Identifier], synopse_activity: Any, synopse_organization_ids_by_query_string: dict[str, MergedOrganizationIdentifier]) → Generator[ExtractedActivity, None, None]
Transform Synopse projects into MEx activities.
- Parameters:
synopse_projects – Iterable of Synopse projects
extracted_primary_source – Extracted report server primary source
contact_merged_ids_by_emails – Mapping from LDAP emails to contact IDs
contributor_merged_ids_by_name – Mapping from person names to contributor IDs
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
synopse_activity – synopse activity default values
synopse_organization_ids_by_query_string – merged organization ids by org name
- Returns:
Generator for extracted activities
- mex.extractors.synopse.transform.transform_synopse_studies_into_access_platforms(synopse_studies: Iterable[SynopseStudy], unit_merged_ids_by_synonym: dict[str, MergedOrganizationalUnitIdentifier], extracted_primary_source: ExtractedPrimarySource, synopse_access_platform: Any) → Generator[ExtractedAccessPlatform, None, None]
Transform Synopse studies into access platforms.
- Parameters:
synopse_studies – Iterable of Synopse Studies
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
extracted_primary_source – Extracted report server primary source
synopse_access_platform – access platform mapping model with default values
- Returns:
Generator for extracted access platforms
- mex.extractors.synopse.transform.transform_synopse_variables_belonging_to_same_variable_group_to_mex_variables(variables: Iterable[SynopseVariable], belongs_to: ExtractedVariableGroup, resource_ids_by_synopse_id: dict[str, list[Identifier]], extracted_primary_source: ExtractedPrimarySource) → Generator[ExtractedVariable, None, None]
Transform Synopse variables to MEx variables.
- Parameters:
variables – Iterable of Synopse Variables
belongs_to – extracted variable group that the variables belong to
resource_ids_by_synopse_id – Map from Synopse ID to list of study resource stable target IDs
extracted_primary_source – Extracted report server primary source
- Returns:
Generator for extracted variables
- mex.extractors.synopse.transform.transform_synopse_variables_to_mex_variable_groups(synopse_variables_by_thema: dict[str, list[SynopseVariable]], extracted_primary_source: ExtractedPrimarySource, resource_ids_by_synopse_id: dict[str, list[Identifier]]) → Generator[ExtractedVariableGroup, None, None]
Transform Synopse variable sets to MEx variable groups.
- Parameters:
synopse_variables_by_thema – mapping from “Thema und Fragebogenausschnitt” to the variables having this value
extracted_primary_source – Extracted report server primary source
resource_ids_by_synopse_id – Map from Synopse ID to list of study resource stable target IDs
- Returns:
Generator for extracted variable groups
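The synopse_variables_by_thema input groups variables by their “Thema und Fragebogenausschnitt” value, and one variable group is produced per value. A sketch of building that mapping and emitting one group per key, assuming a group is linked to every resource that any of its variables maps to. VariableStub, VariableGroupStub, its contained_by field, group_by_thema and variable_groups are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Generator, Iterable


@dataclass(frozen=True)
class VariableStub:
    synopse_id: str
    thema_und_fragebogenausschnitt: str


@dataclass(frozen=True)
class VariableGroupStub:
    label: str
    contained_by: tuple[str, ...]  # hypothetical: resource stable target IDs


def group_by_thema(variables: Iterable[VariableStub]) -> dict[str, list[VariableStub]]:
    # dict preserves insertion order, so groups follow first appearance.
    by_thema: defaultdict[str, list[VariableStub]] = defaultdict(list)
    for variable in variables:
        by_thema[variable.thema_und_fragebogenausschnitt].append(variable)
    return dict(by_thema)


def variable_groups(
    by_thema: dict[str, list[VariableStub]],
    resource_ids_by_synopse_id: dict[str, list[str]],
) -> Generator[VariableGroupStub, None, None]:
    for thema, members in by_thema.items():
        # Union of the resources referenced by any member, without duplicates.
        resource_ids: list[str] = []
        for variable in members:
            for rid in resource_ids_by_synopse_id.get(variable.synopse_id, []):
                if rid not in resource_ids:
                    resource_ids.append(rid)
        yield VariableGroupStub(label=thema, contained_by=tuple(resource_ids))
```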
- mex.extractors.synopse.transform.transform_synopse_variables_to_mex_variables(synopse_variables_by_thema: dict[str, list[SynopseVariable]], variable_groups: Iterable[ExtractedVariableGroup], resource_ids_by_synopse_id: dict[str, list[Identifier]], extracted_primary_source: ExtractedPrimarySource) → Generator[ExtractedVariable, None, None]
Transform Synopse variable sets to MEx variables.
- Parameters:
synopse_variables_by_thema – mapping from “Thema und Fragebogenausschnitt” to the variables having this value
variable_groups – extracted variable groups
resource_ids_by_synopse_id – Map from Synopse ID to list of study resource stable target IDs
extracted_primary_source – Extracted report server primary source
- Returns:
Generator for extracted variables
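Given the variable groups, each variable must be attached to the group for its “Thema und Fragebogenausschnitt” value. The per-group helper documented above suggests the work is split group by group. A sketch of that matching step, assuming a group can be found by its label equalling the thema string. GroupStub, VariableOut and variables_with_groups are hypothetical, variables are reduced to plain names, and themes without an extracted group are skipped.

```python
from dataclasses import dataclass
from typing import Generator, Iterable


@dataclass(frozen=True)
class GroupStub:
    label: str
    stable_target_id: str


@dataclass(frozen=True)
class VariableOut:
    label: str
    belongs_to: str  # stable target ID of the owning group


def variables_with_groups(
    by_thema: dict[str, list[str]],
    groups: Iterable[GroupStub],
) -> Generator[VariableOut, None, None]:
    # Index groups by label once instead of scanning them per thema.
    group_id_by_label = {group.label: group.stable_target_id for group in groups}
    for thema, varnames in by_thema.items():
        group_id = group_id_by_label.get(thema)
        if group_id is None:
            continue  # no group was extracted for this thema
        for varname in varnames:
            yield VariableOut(label=varname, belongs_to=group_id)
```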