mex.extractors package
Subpackages
- mex.extractors.artificial package
- mex.extractors.biospecimen package
- mex.extractors.blueant package
- Subpackages
- Submodules
- mex.extractors.blueant.connector module
BlueAntConnector
BlueAntConnector._get_json_from_api()
BlueAntConnector._set_authentication()
BlueAntConnector._set_url()
BlueAntConnector.get_client_name()
BlueAntConnector.get_department_name()
BlueAntConnector.get_persons()
BlueAntConnector.get_projects()
BlueAntConnector.get_status_name()
BlueAntConnector.get_type_description()
- mex.extractors.blueant.extract module
- mex.extractors.blueant.filter module
- mex.extractors.blueant.main module
- mex.extractors.blueant.settings module
- mex.extractors.blueant.transform module
- Module contents
- mex.extractors.confluence_vvt package
- Submodules
- mex.extractors.confluence_vvt.connector module
- mex.extractors.confluence_vvt.extract module
- mex.extractors.confluence_vvt.main module
- mex.extractors.confluence_vvt.models module
ConfluenceVvtCell
ConfluenceVvtHeading
ConfluenceVvtPage
ConfluenceVvtPage.get_end_year()
ConfluenceVvtPage.get_identifier_in_primary_source()
ConfluenceVvtPage.get_partners()
ConfluenceVvtPage.get_start_year()
ConfluenceVvtPage.get_units()
ConfluenceVvtPage.id
ConfluenceVvtPage.model_computed_fields
ConfluenceVvtPage.model_config
ConfluenceVvtPage.model_fields
ConfluenceVvtPage.tables
ConfluenceVvtPage.title
ConfluenceVvtRow
ConfluenceVvtTable
ConfluenceVvtValue
- mex.extractors.confluence_vvt.parse_html module
- mex.extractors.confluence_vvt.settings module
ConfluenceVvtSettings
ConfluenceVvtSettings.model_computed_fields
ConfluenceVvtSettings.model_config
ConfluenceVvtSettings.model_fields
ConfluenceVvtSettings.overview_page_id
ConfluenceVvtSettings.password
ConfluenceVvtSettings.skip_pages
ConfluenceVvtSettings.template_v1_mapping_path
ConfluenceVvtSettings.url
ConfluenceVvtSettings.username
- mex.extractors.confluence_vvt.transform module
- Module contents
- mex.extractors.consent_mailer package
- Submodules
- mex.extractors.consent_mailer.extract module
- mex.extractors.consent_mailer.filter module
- mex.extractors.consent_mailer.main module
- mex.extractors.consent_mailer.settings module
ConsentMailerSettings
ConsentMailerSettings.mailpit_api_password
ConsentMailerSettings.mailpit_api_url
ConsentMailerSettings.mailpit_api_user
ConsentMailerSettings.model_computed_fields
ConsentMailerSettings.model_config
ConsentMailerSettings.model_fields
ConsentMailerSettings.schedule
ConsentMailerSettings.smtp_server
ConsentMailerSettings.template_path
- mex.extractors.consent_mailer.transform module
- Module contents
- mex.extractors.contact_point package
- mex.extractors.datenkompass package
- mex.extractors.datscha_web package
- mex.extractors.endnote package
- Submodules
- mex.extractors.endnote.extract module
- mex.extractors.endnote.main module
- mex.extractors.endnote.model module
EndnoteRecord
EndnoteRecord.abstract
EndnoteRecord.authors
EndnoteRecord.call_num
EndnoteRecord.custom3
EndnoteRecord.custom4
EndnoteRecord.custom6
EndnoteRecord.database
EndnoteRecord.electronic_resource_num
EndnoteRecord.isbn
EndnoteRecord.keyword
EndnoteRecord.language
EndnoteRecord.model_computed_fields
EndnoteRecord.model_config
EndnoteRecord.model_fields
EndnoteRecord.number
EndnoteRecord.pages
EndnoteRecord.periodical
EndnoteRecord.pub_dates
EndnoteRecord.publisher
EndnoteRecord.rec_number
EndnoteRecord.ref_type
EndnoteRecord.related_urls
EndnoteRecord.secondary_authors
EndnoteRecord.secondary_title
EndnoteRecord.tertiary_authors
EndnoteRecord.title
EndnoteRecord.volume
EndnoteRecord.year
- mex.extractors.endnote.settings module
- mex.extractors.endnote.transform module
- Module contents
- mex.extractors.ff_projects package
- mex.extractors.grippeweb package
- Submodules
- mex.extractors.grippeweb.connector module
- mex.extractors.grippeweb.extract module
- mex.extractors.grippeweb.main module
- mex.extractors.grippeweb.settings module
- mex.extractors.grippeweb.transform module
get_or_create_external_partner()
transform_grippeweb_access_platform_to_extracted_access_platform()
transform_grippeweb_resource_mappings_to_dict()
transform_grippeweb_resource_mappings_to_extracted_resources()
transform_grippeweb_variable_group_to_extracted_variable_groups()
transform_grippeweb_variable_to_extracted_variables()
- Module contents
- mex.extractors.ifsg package
- Subpackages
- mex.extractors.ifsg.models package
- Submodules
- mex.extractors.ifsg.models.meta_catalogue2item module
- mex.extractors.ifsg.models.meta_catalogue2item2schema module
- mex.extractors.ifsg.models.meta_datatype module
- mex.extractors.ifsg.models.meta_disease module
- mex.extractors.ifsg.models.meta_field module
- mex.extractors.ifsg.models.meta_item module
- mex.extractors.ifsg.models.meta_schema2field module
- mex.extractors.ifsg.models.meta_schema2type module
- mex.extractors.ifsg.models.meta_type module
- Module contents
- Submodules
- mex.extractors.ifsg.connector module
- mex.extractors.ifsg.extract module
- mex.extractors.ifsg.filter module
- mex.extractors.ifsg.main module
- mex.extractors.ifsg.settings module
- mex.extractors.ifsg.transform module
- Module contents
- mex.extractors.igs package
- mex.extractors.international_projects package
- mex.extractors.odk package
- mex.extractors.open_data package
- mex.extractors.pipeline package
- mex.extractors.primary_source package
- mex.extractors.publisher package
- mex.extractors.seq_repo package
- Submodules
- mex.extractors.seq_repo.extract module
- mex.extractors.seq_repo.filter module
- mex.extractors.seq_repo.main module
- mex.extractors.seq_repo.model module
SeqRepoSource
SeqRepoSource.customer_org_unit_id
SeqRepoSource.customer_sample_name
SeqRepoSource.get_end_year()
SeqRepoSource.get_identifier_in_primary_source()
SeqRepoSource.get_partners()
SeqRepoSource.get_start_year()
SeqRepoSource.get_units()
SeqRepoSource.lims_sample_id
SeqRepoSource.model_computed_fields
SeqRepoSource.model_config
SeqRepoSource.model_fields
SeqRepoSource.project_coordinators
SeqRepoSource.project_id
SeqRepoSource.project_name
SeqRepoSource.sequencing_date
SeqRepoSource.sequencing_platform
SeqRepoSource.species
- mex.extractors.seq_repo.settings module
- mex.extractors.seq_repo.transform module
- Module contents
- mex.extractors.sinks package
- mex.extractors.sumo package
- Subpackages
- mex.extractors.sumo.models package
- Submodules
- mex.extractors.sumo.models.base module
- mex.extractors.sumo.models.cc1_data_model_nokeda module
- mex.extractors.sumo.models.cc1_data_valuesets module
- mex.extractors.sumo.models.cc2_aux_mapping module
- mex.extractors.sumo.models.cc2_aux_model module
- mex.extractors.sumo.models.cc2_aux_valuesets module
- mex.extractors.sumo.models.cc2_feat_projection module
- Module contents
- Submodules
- mex.extractors.sumo.extract module
- mex.extractors.sumo.filter module
- mex.extractors.sumo.main module
- mex.extractors.sumo.settings module
- mex.extractors.sumo.transform module
create_new_organization_with_official_name()
get_contact_merged_ids_by_emails()
get_contact_merged_ids_by_names()
transform_feat_projection_variable_to_mex_variable()
transform_feat_variable_to_mex_variable_group()
transform_model_nokeda_variable_to_mex_variable_group()
transform_nokeda_aux_variable_to_mex_variable()
transform_nokeda_aux_variable_to_mex_variable_group()
transform_nokeda_model_variable_to_mex_variable()
transform_resource_feat_model_to_mex_resource()
transform_resource_nokeda_to_mex_resource()
transform_sumo_access_platform_to_mex_access_platform()
transform_sumo_activity_to_extracted_activity()
- Module contents
- mex.extractors.synopse package
- Subpackages
- Submodules
- mex.extractors.synopse.connector module
- mex.extractors.synopse.extract module
- mex.extractors.synopse.filter module
- mex.extractors.synopse.main module
- mex.extractors.synopse.settings module
SynopseSettings
SynopseSettings.datensatzuebersicht_path
SynopseSettings.mapping_path
SynopseSettings.metadaten_zu_datensaetzen_path
SynopseSettings.model_computed_fields
SynopseSettings.model_config
SynopseSettings.model_fields
SynopseSettings.projekt_und_studienverwaltung_path
SynopseSettings.report_server_password
SynopseSettings.report_server_url
SynopseSettings.report_server_username
SynopseSettings.variablenuebersicht_path
- mex.extractors.synopse.transform module
transform_overviews_to_resource_lookup()
transform_synopse_data_to_mex_resources()
transform_synopse_project_to_activity()
transform_synopse_projects_to_mex_activities()
transform_synopse_studies_into_access_platforms()
transform_synopse_variables_belonging_to_same_variable_group_to_mex_variables()
transform_synopse_variables_to_mex_variable_groups()
transform_synopse_variables_to_mex_variables()
- Module contents
- mex.extractors.voxco package
- mex.extractors.wikidata package
Submodules
mex.extractors.drop module
- class mex.extractors.drop.DropApiConnector
Bases: HTTPConnector
Connector class to handle interaction with the Drop API.
- API_VERSION = 'v0'
- _check_availability() → None
Send a GET request to verify the API is available.
- _set_authentication() → None
Add the Drop API key to all session headers.
- _set_url() → None
Set the Drop API URL, including the version path.
- get_file(x_system: str, file_id: str) → dict[str, Any]
Get the content of a JSON file from the x_system.
- Parameters:
x_system – name of the x_system
file_id – name of the file
- Returns:
content of the JSON file
- get_raw_file(x_system: str, file_id: str) → Response
Get the raw content of a file from the x_system.
- Parameters:
x_system – name of the x_system
file_id – name of the file
- Returns:
raw content of the file
- list_files(x_system: str) → list[str]
Get available files for the x_system.
- Parameters:
x_system – name of the x_system to list the files for
- Returns:
list of available filenames for the x_system
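A minimal sketch of how `list_files` and `get_file` fit together. The real connector needs a reachable Drop API and valid settings, so `FakeDropConnector` below is a hypothetical in-memory stand-in that only mirrors the documented signatures; the x_system name `example-system`, the file names, and the helper `load_all_files` are invented for illustration.

```python
from typing import Any


class FakeDropConnector:
    """Hypothetical in-memory stand-in mirroring the documented method signatures."""

    def __init__(self, files: dict[str, dict[str, dict[str, Any]]]) -> None:
        # Maps x_system name -> file_id -> parsed JSON content.
        self._files = files

    def list_files(self, x_system: str) -> list[str]:
        """Return the available file names for the x_system."""
        return sorted(self._files.get(x_system, {}))

    def get_file(self, x_system: str, file_id: str) -> dict[str, Any]:
        """Return the JSON content of one file from the x_system."""
        return self._files[x_system][file_id]


def load_all_files(
    connector: FakeDropConnector, x_system: str
) -> dict[str, dict[str, Any]]:
    """List every file of an x_system, then fetch each one by its file id."""
    return {
        file_id: connector.get_file(x_system, file_id)
        for file_id in connector.list_files(x_system)
    }


connector = FakeDropConnector(
    {"example-system": {"a.json": {"id": 1}, "b.json": {"id": 2}}}
)
loaded = load_all_files(connector, "example-system")
print(loaded)
```

Listing first and fetching per file keeps the caller independent of how many files an x_system currently holds.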
mex.extractors.filters module
- mex.extractors.filters.filter_by_global_rules(primary_source_id: MergedPrimarySourceIdentifier, items: Iterable[RawDataT]) → list[RawDataT]
Filter out items according to the global filter rules and return the remaining items.
- Parameters:
primary_source_id – identifier of the primary source
items – items (sources or resources) to be filtered
mex.extractors.logging module
- mex.extractors.logging.log_filter(identifier_in_primary_source: str | None, primary_source_id: MergedPrimarySourceIdentifier, reason: str) → None
Log filtered sources.
- Parameters:
identifier_in_primary_source – optional identifier in the primary source
primary_source_id – identifier of the primary source
reason – string explaining the reason for filtering
mex.extractors.main module
mex.extractors.models module
- class mex.extractors.models.BaseRawData
Bases: BaseModel
Raw-data base providing standardized access to attributes for filtering.
- abstractmethod get_end_year() → TemporalEntity | None
Return end year from extractor.
- abstractmethod get_identifier_in_primary_source() → str | None
Return identifier in primary source from extractor.
- abstractmethod get_partners() → Sequence[str | None]
Return partners from extractor.
- abstractmethod get_start_year() → TemporalEntity | None
Return start year from extractor.
- abstractmethod get_units() → Sequence[str | None]
Return units from extractor.
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
mex.extractors.settings module
- class mex.extractors.settings.Settings(_env_file: ~pathlib.Path | str | ~collections.abc.Sequence[~pathlib.Path | str] | None = PosixPath('.'), _env_file_encoding: str | None = None, _env_nested_delimiter: str | None = None, _secrets_dir: str | ~pathlib.Path | None = None, *, pdb: bool = False, MEX_SINK: list[~mex.common.types.sink.Sink] = [Sink.NDJSON], MEX_ASSETS_DIR: ~pathlib.Path = PosixPath('/home/runner/work/mex-extractors/mex-extractors/assets'), MEX_WORK_DIR: ~pathlib.Path = PosixPath('/home/runner/work/mex-extractors/mex-extractors'), MEX_IDENTITY_PROVIDER: ~mex.common.types.identity.IdentityProvider = IdentityProvider.MEMORY, MEX_BACKEND_API_URL: ~pydantic_core._pydantic_core.Url = Url('http://localhost:8080/'), MEX_BACKEND_API_KEY: ~pydantic.types.SecretStr = SecretStr('**********'), MEX_BACKEND_API_PARALLELIZATION: int = 1, MEX_BACKEND_API_CHUNK_SIZE: int = 25, MEX_VERIFY_SESSION: bool | ~mex.common.types.path.AssetsPath = True, MEX_ORGANIGRAM_PATH: ~mex.common.types.path.AssetsPath = AssetsPath("raw-data/organigram/organizational_units.json"), MEX_PRIMARY_SOURCES_PATH: ~mex.common.types.path.AssetsPath = AssetsPath("raw-data/primary-sources/primary-sources.json"), MEX_LDAP_URL: ~pydantic.types.SecretStr = SecretStr('**********'), MEX_LDAP_SEARCH_BASE: str = 'DC=rki,DC=local', MEX_WIKI_API_URL: ~pydantic_core._pydantic_core.Url = Url('http://wikidata/'), MEX_WEB_USER_AGENT: str = 'rki/mex', MEX_ORCID_API_URL: ~pydantic_core._pydantic_core.Url = Url('https://orcid/'), all_filter_mapping_path: ~mex.common.types.path.AssetsPath = AssetsPath("mappings/__all__"), MEX_SKIP_EXTRACTORS: list[str] = [], MEX_DROP_API_KEY: ~pydantic.types.SecretStr = SecretStr('**********'), MEX_DROP_API_URL: ~pydantic_core._pydantic_core.Url = Url('http://localhost:8081/'), MEX_SCHEDULE: str = '0 0 * * *', kerberos_user: str = 'user@domain.tld', kerberos_password: ~pydantic.types.SecretStr = SecretStr('**********'), s3_endpoint_url: ~pydantic_core._pydantic_core.Url = 
Url('https://s3/'), s3_access_key_id: ~pydantic.types.SecretStr = SecretStr('**********'), s3_secret_access_key: ~pydantic.types.SecretStr = SecretStr('**********'), s3_bucket_key: str = 's3_bucket', biospecimen: ~mex.extractors.biospecimen.settings.BiospecimenSettings = BiospecimenSettings(raw_data_path=AssetsPath("raw-data/biospecimen"), key_col='Feldname', val_col='zu extrahierender Wert (maschinenlesbar)', mapping_path=AssetsPath("mappings/biospecimen")), blueant: ~mex.extractors.blueant.settings.BlueAntSettings = BlueAntSettings(api_key=SecretStr('**********'), url='https://blueant', skip_labels=['test'], delete_prefixes=['_', '1_', '2_', '3_', '4_', '5_', '6_', '7_', '8_', '9_'], mapping_path=AssetsPath("mappings/blueant")), confluence_vvt: ~mex.extractors.confluence_vvt.settings.ConfluenceVvtSettings = ConfluenceVvtSettings(url='https://confluence.vvt', username=SecretStr('**********'), password=SecretStr('**********'), overview_page_id='123456', template_v1_mapping_path=AssetsPath("mappings/confluence-vvt_template_v1"), skip_pages=['123456']), consent_mailer: ~mex.extractors.consent_mailer.settings.ConsentMailerSettings = ConsentMailerSettings(debug=False, sink=[<Sink.NDJSON: 'ndjson'>], assets_dir=PosixPath('/home/runner/work/mex-extractors/mex-extractors/assets'), work_dir=PosixPath('/home/runner/work/mex-extractors/mex-extractors'), identity_provider=<IdentityProvider.MEMORY: 'memory'>, backend_api_url=Url('http://localhost:8080/'), backend_api_key=SecretStr('**********'), backend_api_parallelization=1, backend_api_chunk_size=25, verify_session=True, organigram_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/organigram/organizational_units.json"), primary_sources_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/primary-sources/primary-sources.json"), ldap_url=SecretStr('**********'), ldap_search_base='DC=rki,DC=local', wiki_api_url=Url('http://wikidata/'), mex_web_user_agent='rki/mex', 
orcid_api_url=Url('https://orcid/'), mailpit_api_url='localhost:8025', mailpit_api_user=SecretStr('**********'), mailpit_api_password=SecretStr('**********'), schedule=None, smtp_server='localhost:1025', template_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mailings")), contact_point: ~mex.extractors.contact_point.settings.ContactPointSettings = ContactPointSettings(mex_email=Email("mex@rki.de")), datenkompass: ~mex.extractors.datenkompass.settings.DatenkompassSettings = DatenkompassSettings(unit_filter='e.g. unit', organization_filter='Organization', cutoff_number_authors=3, list_delimiter='; '), datscha_web: ~mex.extractors.datscha_web.settings.DatschaWebSettings = DatschaWebSettings(url='https://datscha/', vorname=SecretStr('**********'), nachname=SecretStr('**********'), pw=SecretStr('**********'), organisation='RKI'), endnote: ~mex.extractors.endnote.settings.EndnoteSettings = EndnoteSettings(mapping_path=AssetsPath("mappings/endnote"), cutoff_number_authors=42), ff_projects: ~mex.extractors.ff_projects.settings.FFProjectsSettings = FFProjectsSettings(file_path=AssetsPath("raw-data/ff-projects/ff-projects.xlsx"), skip_funding=['Sonstige'], skip_topics=['Sonstige'], skip_years_strings=['fehlt', 'keine', 'offen'], skip_clients=['Sonstige'], mapping_path=AssetsPath("mappings/ff-projects")), grippeweb: ~mex.extractors.grippeweb.settings.GrippewebSettings = GrippewebSettings(mapping_path=AssetsPath("mappings/grippeweb"), mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database'), ifsg: ~mex.extractors.ifsg.settings.IFSGSettings = IFSGSettings(mapping_path=AssetsPath("mappings/ifsg"), mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database'), igs: ~mex.extractors.igs.settings.IGSSettings = IGSSettings(url='https://igs', mapping_path=AssetsPath("mappings/igs")), international_projects: 
~mex.extractors.international_projects.settings.InternationalProjectsSettings = InternationalProjectsSettings(file_path=AssetsPath("raw-data/international-projects/international_projects.xlsx"), mapping_path=AssetsPath("mappings/international-projects")), odk: ~mex.extractors.odk.settings.ODKSettings = ODKSettings(raw_data_path=AssetsPath("raw-data/odk"), mapping_path=AssetsPath("mappings/odk")), open_data: ~mex.extractors.open_data.settings.OpenDataSettings = OpenDataSettings(url='https://zenodo', community_rki='robertkochinstitut', mapping_path=AssetsPath("mappings/open-data")), publisher: ~mex.extractors.publisher.settings.PublisherSettings = PublisherSettings(skip_entity_types=['MergedPrimarySource', 'MergedConsent'], allowed_person_primary_sources=['endnote']), seq_repo: ~mex.extractors.seq_repo.settings.SeqRepoSettings = SeqRepoSettings(mapping_path=AssetsPath("mappings/seq-repo")), sumo: ~mex.extractors.sumo.settings.SumoSettings = SumoSettings(raw_data_path=AssetsPath("raw-data/sumo"), mapping_path=AssetsPath("mappings/sumo")), synopse: ~mex.extractors.synopse.settings.SynopseSettings = SynopseSettings(report_server_url='https://report-server/', report_server_username=SecretStr('**********'), report_server_password=SecretStr('**********'), variablenuebersicht_path=AssetsPath("raw-data/synopse/variablenuebersicht.csv"), projekt_und_studienverwaltung_path=AssetsPath("raw-data/synopse/projekt_und_studienverwaltung.csv"), metadaten_zu_datensaetzen_path=AssetsPath("raw-data/synopse/metadaten_zu_datensaetzen.csv"), datensatzuebersicht_path=AssetsPath("raw-data/synopse/datensatzuebersicht.csv"), mapping_path=AssetsPath("mappings/synopse")), voxco: ~mex.extractors.voxco.settings.VoxcoSettings = VoxcoSettings(mapping_path=AssetsPath("mappings/voxco")), wikidata: ~mex.extractors.wikidata.settings.WikidataSettings = WikidataSettings(mapping_path=AssetsPath("mappings/wikidata")))¶
Bases: BaseSettings
Settings definition class for extractors and related scripts.
- all_filter_mapping_path: AssetsPath
- biospecimen: BiospecimenSettings
- blueant: BlueAntSettings
- confluence_vvt: ConfluenceVvtSettings
- consent_mailer: ConsentMailerSettings
- contact_point: ContactPointSettings
- datenkompass: DatenkompassSettings
- datscha_web: DatschaWebSettings
- drop_api_key: SecretStr
- drop_api_url: Url
- endnote: EndnoteSettings
- ff_projects: FFProjectsSettings
- grippeweb: GrippewebSettings
- ifsg: IFSGSettings
- igs: IGSSettings
- international_projects: InternationalProjectsSettings
- kerberos_password: SecretStr
- kerberos_user: str
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[SettingsConfigDict] = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': '.env', 'env_file_encoding': 'utf-8', 'env_ignore_empty': False, 'env_nested_delimiter': '__', 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'mex_', 'extra': 'ignore', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'populate_by_name': True, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_assignment': True, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'all_filter_mapping_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("mappings/__all__"), description='Path to the directory with the biospecimen mapping files containing the default values, absolute path or relative to `assets_dir`.'), 'assets_dir': FieldInfo(annotation=Path, required=False, default=PosixPath('/home/runner/work/mex-extractors/mex-extractors/assets'), alias_priority=2, validation_alias='MEX_ASSETS_DIR', description='Path to directory that contains input files treated as read-only, looks for a folder named `assets` in the current directory by default.'), 'backend_api_chunk_size': FieldInfo(annotation=int, required=False, default=25, alias_priority=2, validation_alias='MEX_BACKEND_API_CHUNK_SIZE', description='How many items to load into the backend in one chunk.'), 'backend_api_key': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), alias_priority=2, validation_alias='MEX_BACKEND_API_KEY', description='Backend API key with write access to call POST/PUT endpoints'), 'backend_api_parallelization': FieldInfo(annotation=int, required=False, default=1, alias_priority=2, validation_alias='MEX_BACKEND_API_PARALLELIZATION', description='How many simultaneous threads may spin up to load data into the backend.'), 'backend_api_url': FieldInfo(annotation=Url, required=False, default=Url('http://localhost:8080/'), alias_priority=2, validation_alias='MEX_BACKEND_API_URL', description='MEx backend API url.'), 'biospecimen': FieldInfo(annotation=BiospecimenSettings, required=False, default=BiospecimenSettings(raw_data_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/biospecimen"), key_col='Feldname', val_col='zu extrahierender Wert (maschinenlesbar)', mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/biospecimen"))), 'blueant': FieldInfo(annotation=BlueAntSettings, required=False, 
default=BlueAntSettings(api_key=SecretStr('**********'), url='https://blueant', skip_labels=['test'], delete_prefixes=['_', '1_', '2_', '3_', '4_', '5_', '6_', '7_', '8_', '9_'], mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/blueant"))), 'confluence_vvt': FieldInfo(annotation=ConfluenceVvtSettings, required=False, default=ConfluenceVvtSettings(url='https://confluence.vvt', username=SecretStr('**********'), password=SecretStr('**********'), overview_page_id='123456', template_v1_mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/confluence-vvt_template_v1"), skip_pages=['123456'])), 'consent_mailer': FieldInfo(annotation=ConsentMailerSettings, required=False, default=ConsentMailerSettings(debug=False, sink=[<Sink.NDJSON: 'ndjson'>], assets_dir=PosixPath('/home/runner/work/mex-extractors/mex-extractors/assets'), work_dir=PosixPath('/home/runner/work/mex-extractors/mex-extractors'), identity_provider=<IdentityProvider.MEMORY: 'memory'>, backend_api_url=Url('http://localhost:8080/'), backend_api_key=SecretStr('**********'), backend_api_parallelization=1, backend_api_chunk_size=25, verify_session=True, organigram_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/organigram/organizational_units.json"), primary_sources_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/primary-sources/primary-sources.json"), ldap_url=SecretStr('**********'), ldap_search_base='DC=rki,DC=local', wiki_api_url=Url('http://wikidata/'), mex_web_user_agent='rki/mex', orcid_api_url=Url('https://orcid/'), mailpit_api_url='localhost:8025', mailpit_api_user=SecretStr('**********'), mailpit_api_password=SecretStr('**********'), schedule=None, smtp_server='localhost:1025', template_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mailings"))), 'contact_point': FieldInfo(annotation=ContactPointSettings, required=False, 
default=ContactPointSettings(mex_email=Email("mex@rki.de"))), 'datenkompass': FieldInfo(annotation=DatenkompassSettings, required=False, default=DatenkompassSettings(unit_filter='e.g. unit', organization_filter='Organization', cutoff_number_authors=3, list_delimiter='; ')), 'datscha_web': FieldInfo(annotation=DatschaWebSettings, required=False, default=DatschaWebSettings(url='https://datscha/', vorname=SecretStr('**********'), nachname=SecretStr('**********'), pw=SecretStr('**********'), organisation='RKI')), 'debug': FieldInfo(annotation=bool, required=False, default=False, alias='pdb', alias_priority=2, validation_alias='MEX_DEBUG', description='Jump into post-mortem debugging after any uncaught exception.'), 'drop_api_key': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), alias_priority=2, validation_alias='MEX_DROP_API_KEY', description='Drop API key with admin access to call all GET endpoints'), 'drop_api_url': FieldInfo(annotation=Url, required=False, default=Url('http://localhost:8081/'), alias_priority=2, validation_alias='MEX_DROP_API_URL', description='MEx drop API url.'), 'endnote': FieldInfo(annotation=EndnoteSettings, required=False, default=EndnoteSettings(mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/endnote"), cutoff_number_authors=42)), 'ff_projects': FieldInfo(annotation=FFProjectsSettings, required=False, default=FFProjectsSettings(file_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/ff-projects/ff-projects.xlsx"), skip_funding=['Sonstige'], skip_topics=['Sonstige'], skip_years_strings=['fehlt', 'keine', 'offen'], skip_clients=['Sonstige'], mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/ff-projects"))), 'grippeweb': FieldInfo(annotation=GrippewebSettings, required=False, default=GrippewebSettings(mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/grippeweb"), 
mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database')), 'identity_provider': FieldInfo(annotation=IdentityProvider, required=False, default=<IdentityProvider.MEMORY: 'memory'>, alias_priority=2, validation_alias='MEX_IDENTITY_PROVIDER', description='Provider to assign identifiers to new model instances.'), 'ifsg': FieldInfo(annotation=IFSGSettings, required=False, default=IFSGSettings(mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/ifsg"), mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database')), 'igs': FieldInfo(annotation=IGSSettings, required=False, default=IGSSettings(url='https://igs', mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/igs"))), 'international_projects': FieldInfo(annotation=InternationalProjectsSettings, required=False, default=InternationalProjectsSettings(file_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/international-projects/international_projects.xlsx"), mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/international-projects"))), 'kerberos_password': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='Kerberos password to authenticate against MSSQL server.'), 'kerberos_user': FieldInfo(annotation=str, required=False, default='user@domain.tld', description='Kerberos user to authenticate against MSSQL server.'), 'ldap_search_base': FieldInfo(annotation=str, required=False, default='DC=rki,DC=local', alias_priority=2, validation_alias='MEX_LDAP_SEARCH_BASE', description='Search base for the ldap connector.'), 'ldap_url': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), alias_priority=2, validation_alias='MEX_LDAP_URL', description='LDAP server for person queries with authentication credentials. 
Must follow format `ldap://user:pw@host:port`, where `user` is the username, and `pw` is the password for authenticating against ldap, `host` is the url of the ldap server, and `port` is the port of the ldap server.'), 'mex_web_user_agent': FieldInfo(annotation=str, required=False, default='rki/mex', alias_priority=2, validation_alias='MEX_WEB_USER_AGENT', description='User agent is sent in request headers to external services.'), 'odk': FieldInfo(annotation=ODKSettings, required=False, default=ODKSettings(raw_data_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/odk"), mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/odk"))), 'open_data': FieldInfo(annotation=OpenDataSettings, required=False, default=OpenDataSettings(url='https://zenodo', community_rki='robertkochinstitut', mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/open-data"))), 'orcid_api_url': FieldInfo(annotation=Url, required=False, default=Url('https://orcid/'), alias_priority=2, validation_alias='MEX_ORCID_API_URL', description='URL of orcid api.'), 'organigram_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/organigram/organizational_units.json"), alias_priority=2, validation_alias='MEX_ORGANIGRAM_PATH', description='Path to the JSON file describing the organizational units, absolute path or relative to `assets_dir`.'), 'primary_sources_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/primary-sources/primary-sources.json"), alias_priority=2, validation_alias='MEX_PRIMARY_SOURCES_PATH', description='Path to the JSON file describing the primary sources, absolute path or relative to `assets_dir`.'), 'publisher': FieldInfo(annotation=PublisherSettings, required=False, default=PublisherSettings(skip_entity_types=['MergedPrimarySource', 'MergedConsent'], allowed_person_primary_sources=['endnote'])), 's3_access_key_id': 
FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='The access key to use when creating the client.'), 's3_bucket_key': FieldInfo(annotation=str, required=False, default='s3_bucket', description='The S3 bucket where to store objects.'), 's3_endpoint_url': FieldInfo(annotation=Url, required=False, default=Url('https://s3/'), description='The complete URL to use for the constructed client.'), 's3_secret_access_key': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='The secret key to use when creating the client.'), 'schedule': FieldInfo(annotation=str, required=False, default='0 0 * * *', alias_priority=2, validation_alias='MEX_SCHEDULE', description='A valid cron string defining when to run extractor jobs'), 'seq_repo': FieldInfo(annotation=SeqRepoSettings, required=False, default=SeqRepoSettings(mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/seq-repo"))), 'sink': FieldInfo(annotation=list[Sink], required=False, default=[<Sink.NDJSON: 'ndjson'>], alias_priority=2, validation_alias='MEX_SINK', description='Where to send data that is extracted or ingested. 
Defaults to writing ndjson files, but can be configured to push to the backend or the graph.'), 'skip_extractors': FieldInfo(annotation=list[str], required=False, default=[], alias_priority=2, validation_alias='MEX_SKIP_EXTRACTORS', description='Skip execution of these extractors in dagster'), 'sumo': FieldInfo(annotation=SumoSettings, required=False, default=SumoSettings(raw_data_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/sumo"), mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/sumo"))), 'synopse': FieldInfo(annotation=SynopseSettings, required=False, default=SynopseSettings(report_server_url='https://report-server/', report_server_username=SecretStr('**********'), report_server_password=SecretStr('**********'), variablenuebersicht_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/synopse/variablenuebersicht.csv"), projekt_und_studienverwaltung_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/synopse/projekt_und_studienverwaltung.csv"), metadaten_zu_datensaetzen_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/synopse/metadaten_zu_datensaetzen.csv"), datensatzuebersicht_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/raw-data/synopse/datensatzuebersicht.csv"), mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/synopse"))), 'verify_session': FieldInfo(annotation=Union[bool, AssetsPath], required=False, default=True, alias_priority=2, validation_alias='MEX_VERIFY_SESSION', description="Either a boolean that controls whether we verify the server's TLS certificate, or a path to a CA bundle to use. If a path is given, it can be either absolute or relative to the `assets_dir`. 
Defaults to True."), 'voxco': FieldInfo(annotation=VoxcoSettings, required=False, default=VoxcoSettings(mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/voxco"))), 'wiki_api_url': FieldInfo(annotation=Url, required=False, default=Url('http://wikidata/'), alias_priority=2, validation_alias='MEX_WIKI_API_URL', description='URL of the Wikidata API used to resolve an ID to an organization.'), 'wikidata': FieldInfo(annotation=WikidataSettings, required=False, default=WikidataSettings(mapping_path=AssetsPath("/home/runner/work/mex-extractors/mex-extractors/assets/mappings/wikidata"))), 'work_dir': FieldInfo(annotation=Path, required=False, default=PosixPath('/home/runner/work/mex-extractors/mex-extractors'), alias_priority=2, validation_alias='MEX_WORK_DIR', description='Path to directory that stores generated and temporary files. Defaults to the current working directory.')}¶
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- odk: ODKSettings¶
- open_data: OpenDataSettings¶
- publisher: PublisherSettings¶
- s3_access_key_id: SecretStr¶
- s3_bucket_key: str¶
- s3_endpoint_url: Url¶
- s3_secret_access_key: SecretStr¶
- schedule: str¶
- seq_repo: SeqRepoSettings¶
- skip_extractors: list[str]¶
- sumo: SumoSettings¶
- synopse: SynopseSettings¶
- voxco: VoxcoSettings¶
- wikidata: WikidataSettings¶
mex.extractors.sorters module¶
- mex.extractors.sorters.topological_sort(items: list[ItemT], primary_key: str, *, parent_key: str | None = None, child_key: str | None = None) → None ¶
Sort the given list of items in-place according to their topology.
Items can refer to each other using key fields. A parent item can reference a child item by storing the child’s primary_key in the parent’s child_key field. Similarly, a child can reference its parent using the parent_key field.
This can be useful for submitting items to the backend in the correct order.
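The ordering described above can be illustrated with a small stand-alone sketch. It is not the implementation in mex.extractors.sorters: the `Unit` class and the function name `topological_sort_sketch` are hypothetical, items are assumed to expose their keys as attributes, each reference field is assumed to hold a single key or None, and parents are assumed to be placed before their children. The standard-library `graphlib` module does the actual sorting.

```python
from __future__ import annotations

from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass
class Unit:
    """Hypothetical item type, used only for this illustration."""

    identifier: str
    parent: str | None = None


def topological_sort_sketch(items, primary_key, *, parent_key=None, child_key=None):
    """Reorder `items` in-place so that parents come before their children."""
    sorter = TopologicalSorter()
    for item in items:
        key = getattr(item, primary_key)
        sorter.add(key)  # register the item even if it has no references
        if parent_key and (parent := getattr(item, parent_key)) is not None:
            sorter.add(key, parent)  # the parent must precede this item
        if child_key and (child := getattr(item, child_key)) is not None:
            sorter.add(child, key)  # this item must precede its child
    # static_order() yields every key after all of its predecessors
    position = {key: index for index, key in enumerate(sorter.static_order())}
    items.sort(key=lambda item: position[getattr(item, primary_key)])


units = [Unit("c", parent="b"), Unit("b", parent="a"), Unit("a")]
topological_sort_sketch(units, "identifier", parent_key="parent")
print([unit.identifier for unit in units])
```

Like the documented function, the sketch returns None and mutates the list it is given, so the caller keeps using the same list object after the call.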
mex.extractors.utils module¶
- mex.extractors.utils.ensure_list(values: list[T] | T | None) → list[T] ¶
Wrap single objects in lists, replace None with [] and return lists untouched.
- mex.extractors.utils.load_yaml(path: PathLike[str]) → dict[str, Any] ¶
Load the contents of a YAML file from the given path and return as a dict.
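The three cases named in the ensure_list description (a single object, None, and an existing list) can be sketched as follows. This is a minimal re-implementation for illustration only, under the hypothetical name `ensure_list_sketch`; the actual function in mex.extractors.utils may differ in detail.

```python
from typing import TypeVar

T = TypeVar("T")


def ensure_list_sketch(values: "list[T] | T | None") -> "list[T]":
    """Illustrative stand-in for the behavior described for ensure_list."""
    if values is None:
        return []  # None becomes an empty list
    if isinstance(values, list):
        return values  # lists are returned untouched (same object)
    return [values]  # any other single object is wrapped in a list


print(ensure_list_sketch("foo"))
print(ensure_list_sketch(None))
print(ensure_list_sketch([1, 2]))
```

A helper of this shape lets calling code iterate over a field without first checking whether it holds one value, several values, or nothing.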