mex.extractors package
Subpackages
- mex.extractors.artificial package
- mex.extractors.biospecimen package
- mex.extractors.blueant package
  - Subpackages
  - Submodules
  - mex.extractors.blueant.checks module
  - mex.extractors.blueant.connector module
    - BlueAntConnector
      - BlueAntConnector._get_json_from_api()
      - BlueAntConnector._set_authentication()
      - BlueAntConnector._set_url()
      - BlueAntConnector.get_client_name()
      - BlueAntConnector.get_department_name()
      - BlueAntConnector.get_persons()
      - BlueAntConnector.get_projects()
      - BlueAntConnector.get_status_name()
      - BlueAntConnector.get_type_description()
  - mex.extractors.blueant.extract module
  - mex.extractors.blueant.filter module
  - mex.extractors.blueant.main module
  - mex.extractors.blueant.settings module
  - mex.extractors.blueant.transform module
  - Module contents
- mex.extractors.confluence_vvt package
  - Submodules
  - mex.extractors.confluence_vvt.connector module
  - mex.extractors.confluence_vvt.extract module
  - mex.extractors.confluence_vvt.main module
  - mex.extractors.confluence_vvt.models module
  - mex.extractors.confluence_vvt.parse_html module
  - mex.extractors.confluence_vvt.settings module
  - mex.extractors.confluence_vvt.transform module
  - Module contents
- mex.extractors.consent_mailer package
  - Submodules
  - mex.extractors.consent_mailer.extract module
  - mex.extractors.consent_mailer.filter module
  - mex.extractors.consent_mailer.main module
  - mex.extractors.consent_mailer.settings module
    - ConsentMailerSettings
      - ConsentMailerSettings.backend_fetch_chunk_size
      - ConsentMailerSettings.mailpit_api_password
      - ConsentMailerSettings.mailpit_api_url
      - ConsentMailerSettings.mailpit_api_user
      - ConsentMailerSettings.model_config
      - ConsentMailerSettings.schedule
      - ConsentMailerSettings.smtp_server
      - ConsentMailerSettings.template_path
  - mex.extractors.consent_mailer.transform module
  - Module contents
- mex.extractors.contact_point package
- mex.extractors.datenkompass package
- mex.extractors.datscha_web package
- mex.extractors.endnote package
  - Submodules
  - mex.extractors.endnote.checks module
  - mex.extractors.endnote.extract module
  - mex.extractors.endnote.main module
  - mex.extractors.endnote.model module
    - EndnoteRecord
      - EndnoteRecord.abstract
      - EndnoteRecord.authors
      - EndnoteRecord.call_num
      - EndnoteRecord.custom3
      - EndnoteRecord.custom4
      - EndnoteRecord.custom6
      - EndnoteRecord.database
      - EndnoteRecord.electronic_resource_num
      - EndnoteRecord.isbn
      - EndnoteRecord.keyword
      - EndnoteRecord.language
      - EndnoteRecord.model_config
      - EndnoteRecord.number
      - EndnoteRecord.pages
      - EndnoteRecord.periodical
      - EndnoteRecord.pub_dates
      - EndnoteRecord.publisher
      - EndnoteRecord.rec_number
      - EndnoteRecord.ref_type
      - EndnoteRecord.related_urls
      - EndnoteRecord.secondary_authors
      - EndnoteRecord.secondary_title
      - EndnoteRecord.tertiary_authors
      - EndnoteRecord.title
      - EndnoteRecord.volume
      - EndnoteRecord.year
  - mex.extractors.endnote.settings module
  - mex.extractors.endnote.transform module
  - Module contents
- mex.extractors.ff_projects package
- mex.extractors.grippeweb package
  - Submodules
  - mex.extractors.grippeweb.connector module
  - mex.extractors.grippeweb.extract module
  - mex.extractors.grippeweb.main module
  - mex.extractors.grippeweb.settings module
  - mex.extractors.grippeweb.transform module
    - get_or_create_external_partner()
    - transform_grippeweb_access_platform_to_extracted_access_platform()
    - transform_grippeweb_resource_mappings_to_dict()
    - transform_grippeweb_resource_mappings_to_extracted_resources()
    - transform_grippeweb_variable_group_to_extracted_variable_groups()
    - transform_grippeweb_variable_to_extracted_variables()
  - Module contents
- mex.extractors.ifsg package
  - Subpackages
    - mex.extractors.ifsg.models package
      - Submodules
      - mex.extractors.ifsg.models.meta_catalogue2item module
      - mex.extractors.ifsg.models.meta_catalogue2item2schema module
      - mex.extractors.ifsg.models.meta_datatype module
      - mex.extractors.ifsg.models.meta_disease module
      - mex.extractors.ifsg.models.meta_field module
      - mex.extractors.ifsg.models.meta_item module
      - mex.extractors.ifsg.models.meta_schema2field module
      - mex.extractors.ifsg.models.meta_schema2type module
      - mex.extractors.ifsg.models.meta_type module
      - Module contents
  - Submodules
  - mex.extractors.ifsg.connector module
  - mex.extractors.ifsg.extract module
  - mex.extractors.ifsg.filter module
  - mex.extractors.ifsg.main module
  - mex.extractors.ifsg.settings module
  - mex.extractors.ifsg.transform module
  - Module contents
- mex.extractors.igs package
- mex.extractors.international_projects package
- mex.extractors.odk package
- mex.extractors.open_data package
- mex.extractors.pipeline package
- mex.extractors.primary_source package
- mex.extractors.publisher package
- mex.extractors.seq_repo package
  - Submodules
  - mex.extractors.seq_repo.extract module
  - mex.extractors.seq_repo.filter module
  - mex.extractors.seq_repo.main module
  - mex.extractors.seq_repo.model module
    - SeqRepoSource
      - SeqRepoSource.customer_org_unit_id
      - SeqRepoSource.customer_sample_name
      - SeqRepoSource.get_end_year()
      - SeqRepoSource.get_identifier_in_primary_source()
      - SeqRepoSource.get_partners()
      - SeqRepoSource.get_start_year()
      - SeqRepoSource.get_units()
      - SeqRepoSource.lims_sample_id
      - SeqRepoSource.model_config
      - SeqRepoSource.project_coordinators
      - SeqRepoSource.project_id
      - SeqRepoSource.project_name
      - SeqRepoSource.sequencing_date
      - SeqRepoSource.sequencing_platform
      - SeqRepoSource.species
  - mex.extractors.seq_repo.settings module
  - mex.extractors.seq_repo.transform module
  - Module contents
- mex.extractors.sinks package
- mex.extractors.sumo package
  - Subpackages
    - mex.extractors.sumo.models package
      - Submodules
      - mex.extractors.sumo.models.base module
      - mex.extractors.sumo.models.cc1_data_model_nokeda module
      - mex.extractors.sumo.models.cc1_data_valuesets module
      - mex.extractors.sumo.models.cc2_aux_mapping module
      - mex.extractors.sumo.models.cc2_aux_model module
      - mex.extractors.sumo.models.cc2_aux_valuesets module
      - mex.extractors.sumo.models.cc2_feat_projection module
      - Module contents
  - Submodules
  - mex.extractors.sumo.extract module
  - mex.extractors.sumo.filter module
  - mex.extractors.sumo.main module
  - mex.extractors.sumo.settings module
  - mex.extractors.sumo.transform module
    - create_new_organization_with_official_name()
    - get_contact_merged_ids_by_emails()
    - get_contact_merged_ids_by_names()
    - transform_feat_projection_variable_to_mex_variable()
    - transform_feat_variable_to_mex_variable_group()
    - transform_model_nokeda_variable_to_mex_variable_group()
    - transform_nokeda_aux_variable_to_mex_variable()
    - transform_nokeda_aux_variable_to_mex_variable_group()
    - transform_nokeda_model_variable_to_mex_variable()
    - transform_resource_feat_model_to_mex_resource()
    - transform_resource_nokeda_to_mex_resource()
    - transform_sumo_access_platform_to_mex_access_platform()
    - transform_sumo_activity_to_extracted_activity()
  - Module contents
- mex.extractors.synopse package
  - Subpackages
  - Submodules
  - mex.extractors.synopse.connector module
  - mex.extractors.synopse.extract module
  - mex.extractors.synopse.filter module
  - mex.extractors.synopse.main module
  - mex.extractors.synopse.settings module
    - SynopseSettings
      - SynopseSettings.datensatzuebersicht_path
      - SynopseSettings.mapping_path
      - SynopseSettings.metadaten_zu_datensaetzen_path
      - SynopseSettings.model_config
      - SynopseSettings.projekt_und_studienverwaltung_path
      - SynopseSettings.report_server_password
      - SynopseSettings.report_server_url
      - SynopseSettings.report_server_username
      - SynopseSettings.variablenuebersicht_path
  - mex.extractors.synopse.transform module
    - transform_overviews_to_resource_lookup()
    - transform_synopse_data_to_mex_resources()
    - transform_synopse_project_to_activity()
    - transform_synopse_projects_to_mex_activities()
    - transform_synopse_studies_into_access_platforms()
    - transform_synopse_variables_belonging_to_same_variable_group_to_mex_variables()
    - transform_synopse_variables_to_mex_variable_groups()
    - transform_synopse_variables_to_mex_variables()
  - Module contents
- mex.extractors.system package
- mex.extractors.voxco package
- mex.extractors.wikidata package
Submodules
mex.extractors.drop module
- class mex.extractors.drop.DropApiConnector
- Bases: HTTPConnector
- Connector class to handle interaction with the Drop API.
 - API_VERSION = 'v0'
 - _check_availability() → None
- Send a GET request to verify the API is available.
 - _set_authentication() → None
- Set the Drop API key on all session headers.
 - _set_url() → None
- Set the Drop API URL with the version path.
 - get_file(x_system: str, file_id: str) → dict[str, Any]
- Get the content of a JSON file from the x_system.
- Parameters:
  - x_system – name of the x_system
  - file_id – name of the file
- Returns:
  - content of the JSON file
 - get_raw_file(x_system: str, file_id: str) → Response
- Get the raw content of a file from the x_system.
- Parameters:
  - x_system – name of the x_system
  - file_id – name of the file
- Returns:
  - raw content of the file
 - list_files(x_system: str) → list[str]
- Get available files for the x_system.
- Parameters:
  - x_system – name of the x_system to list the files for
- Returns:
  - list of available filenames for the x_system
 
mex.extractors.filters module
- mex.extractors.filters.filter_by_global_rules(primary_source_id: MergedPrimarySourceIdentifier, items: Iterable[RawDataT]) → list[RawDataT]
- Filter out items according to global filter rules, return filtered items.
- Parameters:
  - primary_source_id – identifier of the primary source
  - items – items, source or resource to be filtered
 
 
mex.extractors.logging module
- mex.extractors.logging.log_filter(identifier_in_primary_source: str | None, primary_source_id: MergedPrimarySourceIdentifier, reason: str) → None
- Log filtered sources.
- Parameters:
  - identifier_in_primary_source – optional identifier in the primary source
  - primary_source_id – identifier of the primary source
  - reason – string explaining the reason for filtering
 
 
mex.extractors.main module
mex.extractors.models module
- class mex.extractors.models.BaseRawData
- Bases: BaseModel
- Raw-data base providing standardized access to attributes for filtering.
 - abstractmethod get_end_year() → TemporalEntity | None
- Return end year from extractor.
 - abstractmethod get_identifier_in_primary_source() → str | None
- Return identifier in primary source from extractor.
 - abstractmethod get_partners() → Sequence[str | None]
- Return partners from extractor.
 - abstractmethod get_start_year() → TemporalEntity | None
- Return start year from extractor.
 - abstractmethod get_units() → Sequence[str | None]
- Return units from extractor.
 - model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}
- Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
 
mex.extractors.settings module
- class mex.extractors.settings.Settings(_env_file: Path | str | Sequence[Path | str] | None = PosixPath('.'), _env_file_encoding: str | None = None, _env_nested_delimiter: str | None = None, _secrets_dir: str | Path | None = None, *, pdb: bool = False, MEX_SINK: list[Sink] = [Sink.NDJSON], MEX_ASSETS_DIR: Path = PosixPath('/home/runner/work/mex-extractors/mex-extractors/assets'), MEX_WORK_DIR: Path = PosixPath('/home/runner/work/mex-extractors/mex-extractors'), MEX_IDENTITY_PROVIDER: IdentityProvider = IdentityProvider.MEMORY, MEX_BACKEND_API_URL: HttpUrl = HttpUrl('http://localhost:8080/'), MEX_BACKEND_API_KEY: SecretStr = SecretStr('**********'), MEX_BACKEND_API_PARALLELIZATION: int = 1, MEX_BACKEND_API_CHUNK_SIZE: int = 25, MEX_VERIFY_SESSION: bool | AssetsPath = True, MEX_ORGANIGRAM_PATH: AssetsPath = AssetsPath('raw-data/organigram/organizational_units.json'), MEX_PRIMARY_SOURCES_PATH: AssetsPath = AssetsPath('raw-data/primary-sources/primary-sources.json'), MEX_LDAP_URL: SecretStr = SecretStr('**********'), MEX_LDAP_SEARCH_BASE: str = 'DC=rki,DC=local', MEX_WIKI_API_URL: HttpUrl = HttpUrl('http://wikidata/'), MEX_WEB_USER_AGENT: str = 'rki/mex', MEX_ORCID_API_URL: HttpUrl = HttpUrl('https://orcid/'), all_filter_mapping_path: AssetsPath = AssetsPath('mappings/__all__'), all_checks_path: AssetsPath = AssetsPath('checks/__final__'), MEX_SKIP_EXTRACTORS: list[str] = [], MEX_DROP_API_KEY: SecretStr = SecretStr('**********'), MEX_DROP_API_URL: HttpUrl = HttpUrl('http://localhost:8081/'), MEX_SCHEDULE: str = '0 0 * * *', kerberos_user: str = 'user@domain.tld', kerberos_password: SecretStr = SecretStr('**********'), s3_endpoint_url: HttpUrl = HttpUrl('https://s3/'), s3_access_key_id: SecretStr = SecretStr('**********'), s3_secret_access_key: SecretStr = SecretStr('**********'), s3_bucket_key: str = 's3_bucket', biospecimen: BiospecimenSettings = 
BiospecimenSettings(raw_data_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/biospecimen'), key_col='Feldname', val_col='zu extrahierender Wert (maschinenlesbar)', mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/biospecimen')), blueant: BlueAntSettings = BlueAntSettings(api_key=SecretStr('**********'), url='https://blueant', skip_labels=['test'], delete_prefixes=['_', '1_', '2_', '3_', '4_', '5_', '6_', '7_', '8_', '9_'], mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/blueant')), confluence_vvt: ConfluenceVvtSettings = ConfluenceVvtSettings(url='https://confluence.vvt', username=SecretStr('**********'), password=SecretStr('**********'), overview_page_id='123456', template_v1_mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/confluence-vvt_template_v1'), skip_pages=['123456']), consent_mailer: ConsentMailerSettings = ConsentMailerSettings(backend_fetch_chunk_size=1, mailpit_api_url='localhost:8025', mailpit_api_user=SecretStr('**********'), mailpit_api_password=SecretStr('**********'), schedule=None, smtp_server='localhost:1025', template_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mailings')), contact_point: ContactPointSettings = ContactPointSettings(mex_email='mex@rki.de'), datenkompass: DatenkompassSettings = DatenkompassSettings(unit_filter='e.g. 
unit', organization_filter='Organization', cutoff_number_authors=3, list_delimiter='; ', mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/mapping-to-external-schema/datenkompass')), datscha_web: DatschaWebSettings = DatschaWebSettings(url='https://datscha/', vorname=SecretStr('**********'), nachname=SecretStr('**********'), pw=SecretStr('**********'), organisation='RKI'), endnote: EndnoteSettings = EndnoteSettings(mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/endnote'), cutoff_number_authors=42), ff_projects: FFProjectsSettings = FFProjectsSettings(file_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/ff-projects/ff-projects.xlsx'), skip_funding=['Sonstige'], skip_topics=['Sonstige'], skip_years_strings=['fehlt', 'keine', 'offen'], skip_clients=['Sonstige'], mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/ff-projects')), grippeweb: GrippewebSettings = GrippewebSettings(mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/grippeweb'), mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database'), ifsg: IFSGSettings = IFSGSettings(mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/ifsg'), mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database'), igs: IGSSettings = IGSSettings(url='https://igs', mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/igs')), international_projects: InternationalProjectsSettings = InternationalProjectsSettings(file_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/international-projects/international_projects.xlsx'), mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/international-projects')), odk: ODKSettings = 
ODKSettings(raw_data_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/odk'), mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/odk')), open_data: OpenDataSettings = OpenDataSettings(url='https://zenodo', community_rki='robertkochinstitut', mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/open-data')), publisher: PublisherSettings = PublisherSettings(skip_entity_types=['MergedPrimarySource', 'MergedConsent'], allowed_person_primary_sources=['endnote']), seq_repo: SeqRepoSettings = SeqRepoSettings(mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/seq-repo')), sumo: SumoSettings = SumoSettings(raw_data_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/sumo'), mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/sumo')), synopse: SynopseSettings = SynopseSettings(report_server_url='https://report-server/', report_server_username=SecretStr('**********'), report_server_password=SecretStr('**********'), variablenuebersicht_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/synopse/variablenuebersicht.csv'), projekt_und_studienverwaltung_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/synopse/projekt_und_studienverwaltung.csv'), metadaten_zu_datensaetzen_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/synopse/metadaten_zu_datensaetzen.csv'), datensatzuebersicht_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/synopse/datensatzuebersicht.csv'), mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/synopse')), system: SystemSettings = SystemSettings(max_run_age_in_days=30), voxco: VoxcoSettings = 
VoxcoSettings(mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/voxco')), wikidata: WikidataSettings = WikidataSettings(mapping_path=AssetsPath('/home/runner/work/mex-extractors/mex-extractors/assets/mappings/wikidata')))¶
- Bases: BaseSettings
- Settings definition class for extractors and related scripts.
 - all_checks_path: AssetsPath
 - all_filter_mapping_path: AssetsPath
 - biospecimen: BiospecimenSettings
 - blueant: BlueAntSettings
 - confluence_vvt: ConfluenceVvtSettings
 - consent_mailer: ConsentMailerSettings
 - contact_point: ContactPointSettings
 - datenkompass: DatenkompassSettings
 - datscha_web: DatschaWebSettings
 - drop_api_key: SecretStr
 - drop_api_url: HttpUrl
 - endnote: EndnoteSettings
 - ff_projects: FFProjectsSettings
 - grippeweb: GrippewebSettings
 - ifsg: IFSGSettings
 - igs: IGSSettings
 - international_projects: InternationalProjectsSettings
 - kerberos_password: SecretStr
 - kerberos_user: str
 - model_config: ClassVar[SettingsConfigDict] = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': '.env', 'env_file_encoding': 'utf-8', 'env_ignore_empty': False, 'env_nested_delimiter': '__', 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'mex_', 'extra': 'ignore', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'populate_by_name': True, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}¶
- Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
 - odk: ODKSettings
 - open_data: OpenDataSettings
 - publisher: PublisherSettings
 - s3_access_key_id: SecretStr
 - s3_bucket_key: str
 - s3_endpoint_url: HttpUrl
 - s3_secret_access_key: SecretStr
 - schedule: str
 - seq_repo: SeqRepoSettings
 - skip_extractors: list[str]
 - sumo: SumoSettings
 - synopse: SynopseSettings
 - system: SystemSettings
 - voxco: VoxcoSettings
 - wikidata: WikidataSettings
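
The model_config of Settings lists 'env_prefix': 'mex_', 'env_nested_delimiter': '__' and 'case_sensitive': False. The following is a minimal standard-library sketch of how environment variable names would map onto nested settings under those three options. It is not how pydantic-settings implements the lookup; the function name nest_env_vars and the sample variables are assumptions made for illustration only.

```python
from __future__ import annotations

from typing import Any

# Values copied from the model_config shown in this section.
ENV_PREFIX = "mex_"
NESTED_DELIMITER = "__"


def nest_env_vars(environ: dict[str, str]) -> dict[str, Any]:
    """Group prefixed environment variables into a nested dict (illustrative only)."""
    nested: dict[str, Any] = {}
    for name, value in environ.items():
        lowered = name.lower()  # case_sensitive is False
        if not lowered.startswith(ENV_PREFIX):
            continue  # unrelated variables are skipped
        path = lowered[len(ENV_PREFIX):].split(NESTED_DELIMITER)
        target = nested
        for part in path[:-1]:
            target = target.setdefault(part, {})
        target[path[-1]] = value
    return nested


# Hypothetical environment, not taken from a real deployment.
example_environ = {
    "MEX_BLUEANT__URL": "https://blueant",
    "MEX_SCHEDULE": "0 0 * * *",
    "HOME": "/home/user",
}
print(nest_env_vars(example_environ))
# {'blueant': {'url': 'https://blueant'}, 'schedule': '0 0 * * *'}
```

Read this way, a variable such as MEX_BLUEANT__URL would address the url field of the nested blueant settings, while MEX_SCHEDULE addresses a top-level field.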
 
mex.extractors.sorters module
- mex.extractors.sorters.topological_sort(items: list[ItemT], primary_key: str, *, parent_key: str | None = None, child_key: str | None = None) → None
- Sort the given list of items in-place according to their topology.
- Items can refer to each other using key fields. A parent item can reference a child item by storing the child’s primary_key in the parent’s child_key field. Similarly, a child can reference its parent using the parent_key field.
- This can be useful for submitting items to the backend in the correct order.
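
The in-place sorting described for topological_sort can be sketched with the standard library's graphlib. This is not the package's implementation. The name topological_sort_sketch, the helper _as_list, the Unit dataclass and the choice to place parents before their children are all assumptions; the documentation above does not state the resulting order.

```python
from __future__ import annotations

from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any


def _as_list(value: Any) -> list[Any]:
    """Normalize a reference field holding None, a single key or a list of keys."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def topological_sort_sketch(
    items: list[Any],
    primary_key: str,
    *,
    parent_key: str | None = None,
    child_key: str | None = None,
) -> None:
    """Reorder items in place so that parents precede children (assumed order)."""
    by_id = {getattr(item, primary_key): item for item in items}
    # Map every item to the set of items that must come before it.
    predecessors: dict[Any, set[Any]] = {item_id: set() for item_id in by_id}
    for item_id, item in by_id.items():
        if parent_key is not None:
            for parent_id in _as_list(getattr(item, parent_key, None)):
                if parent_id in by_id:  # ignore references outside the list
                    predecessors[item_id].add(parent_id)
        if child_key is not None:
            for child_id in _as_list(getattr(item, child_key, None)):
                if child_id in by_id:
                    predecessors[child_id].add(item_id)
    # static_order() raises graphlib.CycleError if the references form a cycle.
    ordered_ids = TopologicalSorter(predecessors).static_order()
    items[:] = [by_id[item_id] for item_id in ordered_ids]


@dataclass
class Unit:  # hypothetical item type for illustration
    identifier: str
    parent: str | None = None


units = [
    Unit("team", parent="department"),
    Unit("department", parent="institute"),
    Unit("institute"),
]
topological_sort_sketch(units, "identifier", parent_key="parent")
print([unit.identifier for unit in units])
# ['institute', 'department', 'team']
```

Assigning to items[:] keeps the caller's list object and only replaces its contents, which matches the None return type in the signature.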
mex.extractors.utils module
- mex.extractors.utils.ensure_list(values: list[T] | T | None) → list[T]
- Wrap single objects in lists, replace None with [] and return lists untouched.
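
The three cases named in the ensure_list description can be illustrated with a short sketch. The function below is a stand-in written from that one-line description, not the package's source, and the name ensure_list_sketch is hypothetical.

```python
from __future__ import annotations

from typing import TypeVar

T = TypeVar("T")


def ensure_list_sketch(values: list[T] | T | None) -> list[T]:
    """Return [] for None, the same list for lists, and a one-item list otherwise."""
    if values is None:
        return []
    if isinstance(values, list):
        return values  # lists are returned untouched, not copied
    return [values]


print(ensure_list_sketch(None))        # []
print(ensure_list_sketch("endnote"))   # ['endnote']
print(ensure_list_sketch([1, 2]))      # [1, 2]
```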
- mex.extractors.utils.load_yaml(path: PathLike[str]) → dict[str, Any]
- Load the contents of a YAML file from the given path and return as a dict.