mex.extractors package¶
Subpackages¶
- mex.extractors.artificial package
- mex.extractors.biospecimen package
- mex.extractors.blueant package
- Subpackages
- Submodules
- mex.extractors.blueant.connector module
BlueAntConnector
BlueAntConnector._get_json_from_api()
BlueAntConnector._set_authentication()
BlueAntConnector._set_url()
BlueAntConnector.get_client_name()
BlueAntConnector.get_department_name()
BlueAntConnector.get_persons()
BlueAntConnector.get_projects()
BlueAntConnector.get_status_name()
BlueAntConnector.get_type_description()
- mex.extractors.blueant.extract module
- mex.extractors.blueant.filter module
- mex.extractors.blueant.main module
- mex.extractors.blueant.settings module
- mex.extractors.blueant.transform module
- Module contents
- mex.extractors.confluence_vvt package
- Submodules
- mex.extractors.confluence_vvt.connector module
- mex.extractors.confluence_vvt.extract module
- mex.extractors.confluence_vvt.main module
- mex.extractors.confluence_vvt.models module
ConfluenceVvtCell
ConfluenceVvtHeading
ConfluenceVvtPage
ConfluenceVvtPage.get_end_year()
ConfluenceVvtPage.get_identifier_in_primary_source()
ConfluenceVvtPage.get_partners()
ConfluenceVvtPage.get_start_year()
ConfluenceVvtPage.get_units()
ConfluenceVvtPage.id
ConfluenceVvtPage.model_computed_fields
ConfluenceVvtPage.model_config
ConfluenceVvtPage.model_fields
ConfluenceVvtPage.tables
ConfluenceVvtPage.title
ConfluenceVvtRow
ConfluenceVvtTable
ConfluenceVvtValue
- mex.extractors.confluence_vvt.parse_html module
- mex.extractors.confluence_vvt.settings module
ConfluenceVvtSettings
ConfluenceVvtSettings.model_computed_fields
ConfluenceVvtSettings.model_config
ConfluenceVvtSettings.model_fields
ConfluenceVvtSettings.overview_page_id
ConfluenceVvtSettings.password
ConfluenceVvtSettings.skip_pages
ConfluenceVvtSettings.template_v1_mapping_path
ConfluenceVvtSettings.url
ConfluenceVvtSettings.username
- mex.extractors.confluence_vvt.transform module
- Module contents
- mex.extractors.datscha_web package
- mex.extractors.ff_projects package
- mex.extractors.grippeweb package
- Submodules
- mex.extractors.grippeweb.connector module
- mex.extractors.grippeweb.extract module
- mex.extractors.grippeweb.main module
- mex.extractors.grippeweb.settings module
- mex.extractors.grippeweb.transform module
get_or_create_external_partner()
transform_grippeweb_access_platform_to_extracted_access_platform()
transform_grippeweb_resource_mappings_to_dict()
transform_grippeweb_resource_mappings_to_extracted_resources()
transform_grippeweb_variable_group_to_extracted_variable_groups()
transform_grippeweb_variable_to_extracted_variables()
- Module contents
- mex.extractors.ifsg package
- Subpackages
- mex.extractors.ifsg.models package
- Submodules
- mex.extractors.ifsg.models.meta_catalogue2item module
- mex.extractors.ifsg.models.meta_catalogue2item2schema module
- mex.extractors.ifsg.models.meta_datatype module
- mex.extractors.ifsg.models.meta_disease module
- mex.extractors.ifsg.models.meta_field module
- mex.extractors.ifsg.models.meta_item module
- mex.extractors.ifsg.models.meta_schema2field module
- mex.extractors.ifsg.models.meta_schema2type module
- mex.extractors.ifsg.models.meta_type module
- Module contents
- Submodules
- mex.extractors.ifsg.connector module
- mex.extractors.ifsg.extract module
- mex.extractors.ifsg.filter module
- mex.extractors.ifsg.main module
- mex.extractors.ifsg.settings module
- mex.extractors.ifsg.transform module
- Module contents
- mex.extractors.international_projects package
- mex.extractors.mapping package
- mex.extractors.odk package
- mex.extractors.open_data package
- mex.extractors.organigram package
- mex.extractors.pipeline package
- mex.extractors.primary_source package
- mex.extractors.publisher package
- mex.extractors.rdmo package
- mex.extractors.seq_repo package
- Submodules
- mex.extractors.seq_repo.extract module
- mex.extractors.seq_repo.filter module
- mex.extractors.seq_repo.main module
- mex.extractors.seq_repo.model module
SeqRepoSource
SeqRepoSource.customer_org_unit_id
SeqRepoSource.customer_sample_name
SeqRepoSource.lims_sample_id
SeqRepoSource.model_computed_fields
SeqRepoSource.model_config
SeqRepoSource.model_fields
SeqRepoSource.project_coordinators
SeqRepoSource.project_id
SeqRepoSource.project_name
SeqRepoSource.sequencing_date
SeqRepoSource.sequencing_platform
SeqRepoSource.species
- mex.extractors.seq_repo.settings module
- mex.extractors.seq_repo.transform module
- Module contents
- mex.extractors.sumo package
- Subpackages
- mex.extractors.sumo.models package
- Submodules
- mex.extractors.sumo.models.base module
- mex.extractors.sumo.models.cc1_data_model_nokeda module
- mex.extractors.sumo.models.cc1_data_valuesets module
- mex.extractors.sumo.models.cc2_aux_mapping module
- mex.extractors.sumo.models.cc2_aux_model module
- mex.extractors.sumo.models.cc2_aux_valuesets module
- mex.extractors.sumo.models.cc2_feat_projection module
- Module contents
- Submodules
- mex.extractors.sumo.extract module
- mex.extractors.sumo.filter module
- mex.extractors.sumo.main module
- mex.extractors.sumo.settings module
- mex.extractors.sumo.transform module
create_new_organization_with_official_name()
get_contact_merged_ids_by_emails()
get_contact_merged_ids_by_names()
transform_feat_projection_variable_to_mex_variable()
transform_feat_variable_to_mex_variable_group()
transform_model_nokeda_variable_to_mex_variable_group()
transform_nokeda_aux_variable_to_mex_variable()
transform_nokeda_aux_variable_to_mex_variable_group()
transform_nokeda_model_variable_to_mex_variable()
transform_resource_feat_model_to_mex_resource()
transform_resource_nokeda_to_mex_resource()
transform_sumo_access_platform_to_mex_access_platform()
transform_sumo_activity_to_extracted_activity()
- Module contents
- mex.extractors.synopse package
- Subpackages
- Submodules
- mex.extractors.synopse.connector module
- mex.extractors.synopse.extract module
- mex.extractors.synopse.filter module
- mex.extractors.synopse.main module
- mex.extractors.synopse.settings module
SynopseSettings
SynopseSettings.datensatzuebersicht_path
SynopseSettings.mapping_path
SynopseSettings.metadaten_zu_datensaetzen_path
SynopseSettings.model_computed_fields
SynopseSettings.model_config
SynopseSettings.model_fields
SynopseSettings.projekt_und_studienverwaltung_path
SynopseSettings.report_server_password
SynopseSettings.report_server_url
SynopseSettings.report_server_username
SynopseSettings.variablenuebersicht_path
- mex.extractors.synopse.transform module
split_off_extended_data_use_variables()
transform_overviews_to_resource_lookup()
transform_synopse_data_extended_data_use_to_mex_resources()
transform_synopse_data_regular_to_mex_resources()
transform_synopse_data_to_mex_resources()
transform_synopse_project_to_activity()
transform_synopse_projects_to_mex_activities()
transform_synopse_studies_into_access_platforms()
transform_synopse_variables_belonging_to_same_variable_group_to_mex_variables()
transform_synopse_variables_to_mex_variable_groups()
transform_synopse_variables_to_mex_variables()
- Module contents
- mex.extractors.voxco package
- mex.extractors.wikidata package
Submodules¶
mex.extractors.drop module¶
- class mex.extractors.drop.DropApiConnector¶
Bases: HTTPConnector
Connector class to handle interaction with the Drop API.
- API_VERSION = 'v0'¶
- _check_availability() → None ¶
Send a GET request to verify the API is available.
- _set_authentication() → None ¶
Set the Drop API key on all session headers.
- _set_url() → None ¶
Set the Drop API URL, including the version path.
- get_file(x_system: str, file_id: str) → dict[str, Any] ¶
Get the content of a file from the x_system.
- Parameters:
x_system – name of the x_system
file_id – name of the file
- Returns:
content of the file
- list_files(x_system: str) → list[str] ¶
Get available files for the x_system.
- Parameters:
x_system – name of the x_system to list the files for
- Returns:
list of available filenames for the x_system
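The two public methods are typically used together: list the files of an x_system, then fetch each one. The following is a minimal sketch of that pattern, not the real connector. `FakeDropConnector` is a hypothetical in-memory stand-in that only mirrors the documented signatures, and the system name `my-system` and the file contents are invented for illustration.

```python
from typing import Any


class FakeDropConnector:
    """Hypothetical in-memory stand-in mirroring the documented signatures."""

    def __init__(self, files: dict[str, dict[str, dict[str, Any]]]) -> None:
        # Maps x_system name -> file_id -> file content.
        self._files = files

    def list_files(self, x_system: str) -> list[str]:
        """Return the available file names for the x_system."""
        return sorted(self._files.get(x_system, {}))

    def get_file(self, x_system: str, file_id: str) -> dict[str, Any]:
        """Return the content of one file from the x_system."""
        return self._files[x_system][file_id]


def fetch_all(
    connector: FakeDropConnector, x_system: str
) -> dict[str, dict[str, Any]]:
    """List every file of an x_system, then fetch each one by its id."""
    return {
        file_id: connector.get_file(x_system, file_id)
        for file_id in connector.list_files(x_system)
    }


connector = FakeDropConnector(
    {"my-system": {"a.json": {"rows": 1}, "b.json": {"rows": 2}}}
)
contents = fetch_all(connector, "my-system")
```

With the real connector the same two calls would go over HTTP and need a reachable Drop API and a valid API key.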
mex.extractors.filters module¶
- mex.extractors.filters.filter_by_global_rules(primary_source_id: Identifier, items: Iterable[RawDataT]) → Generator[RawDataT, None, None] ¶
Filter out items according to the global filter rules and yield the remaining items as a generator.
- Parameters:
primary_source_id – identifier of the primary source
items – items (sources or resources) to be filtered
mex.extractors.identity module¶
- class mex.extractors.identity.BackendIdentityProvider¶
Bases: BaseProvider, HTTPConnector
Identity provider that communicates with the backend HTTP API.
- API_VERSION = 'v0'¶
- _check_availability() → None ¶
Send a GET request to verify the API is available.
- _set_authentication() → None ¶
Set the backend API key on all session headers.
- _set_url() → None ¶
Set the backend API URL, including the version path.
- assign(had_primary_source: MergedPrimarySourceIdentifier, identifier_in_primary_source: str) → Identity ¶
Find an Identity in a database or assign a new one.
- fetch(*, had_primary_source: Identifier | None = None, identifier_in_primary_source: str | None = None, stable_target_id: Identifier | None = None) → list[Identity] ¶
Find Identity instances matching the given filters.
To get a unique result, provide either stableTargetId alone, or hadPrimarySource and identifierInPrimarySource together.
mex.extractors.logging module¶
- mex.extractors.logging.log_filter(identifier_in_primary_source: str | None, primary_source_id: Identifier, reason: str) → None ¶
Log filtered sources.
- Parameters:
identifier_in_primary_source – optional identifier in the primary source
primary_source_id – identifier of the primary source
reason – string explaining the reason for filtering
mex.extractors.main module¶
mex.extractors.models module¶
- class mex.extractors.models.BaseRawData¶
Bases: BaseModel
Raw-data base providing standardized access to attributes for filtering.
- abstract get_end_year() → TemporalEntity | None ¶
Return end year from extractor.
- abstract get_identifier_in_primary_source() → str | None ¶
Return identifier in primary source from extractor.
- abstract get_partners() → Sequence[str | None] ¶
Return partners from extractor.
- abstract get_start_year() → TemporalEntity | None ¶
Return start year from extractor.
- abstract get_units() → Sequence[str | None] ¶
Return units from extractor.
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_fields: ClassVar[Dict[str, FieldInfo]] = {}¶
Metadata about the fields defined on the model, mapping of field names to FieldInfo (pydantic.fields.FieldInfo) objects.
This replaces Model.__fields__ from Pydantic V1.
mex.extractors.settings module¶
- class mex.extractors.settings.Settings(_env_file: Path | str | Sequence[Path | str] | None = PosixPath('.'), _env_file_encoding: str | None = None, _env_nested_delimiter: str | None = None, _secrets_dir: str | Path | None = None, *, pdb: bool = False, MEX_SINK: list[Sink] = [Sink.NDJSON], MEX_ASSETS_DIR: Path = PosixPath('/home/runner/work/mex-extractors/mex-extractors/assets'), MEX_WORK_DIR: Path = PosixPath('/home/runner/work/mex-extractors/mex-extractors'), MEX_IDENTITY_PROVIDER: IdentityProvider | ExtractorIdentityProvider = IdentityProvider.MEMORY, MEX_BACKEND_API_URL: Url = Url('http://localhost:8080/'), MEX_BACKEND_API_KEY: SecretStr = SecretStr('**********'), MEX_VERIFY_SESSION: bool | AssetsPath = True, MEX_ORGANIGRAM_PATH: AssetsPath = AssetsPath('raw-data/organigram/organizational_units.json'), MEX_PRIMARY_SOURCES_PATH: AssetsPath = AssetsPath('raw-data/primary-sources/primary-sources.json'), MEX_LDAP_URL: SecretStr = SecretStr('**********'), MEX_WIKI_API_URL: Url = Url('https://wikidata/'), MEX_WIKI_QUERY_SERVICE_URL: Url = Url('https://wikidata/'), MEX_WEB_USER_AGENT: str = 'rki/mex', MEX_SKIP_EXTRACTORS: list[str] = [], MEX_SKIP_MERGED_ITEMS: list[str] = ['MergedPrimarySource', 'MergedConsent'], MEX_SKIP_PARTNERS: list[str] = ['test'], MEX_SKIP_UNITS: list[str] = ['IT', 'PRAES', 'ZV'], MEX_SKIP_YEARS_BEFORE: int = 1970, MEX_DROP_API_KEY: SecretStr = SecretStr('**********'), MEX_DROP_API_URL: Url = Url('http://localhost:8081/'), MEX_SCHEDULE: str = '0 0 * * *', kerberos_user: str = 'user@domain.tld', kerberos_password: SecretStr = SecretStr('**********'), artificial: ArtificialSettings = ArtificialSettings(count=100, chattiness=10, seed=0, locale=['de_DE', 'en_US'], mesh_file=AssetsPath('raw-data/artificial/asciimesh.bin')), biospecimen: BiospecimenSettings = BiospecimenSettings(raw_data_path=AssetsPath('raw-data/biospecimen'), key_col='Feldname', val_col='zu extrahierender Wert (maschinenlesbar)', 
mapping_path=AssetsPath('mappings/__final__/biospecimen')), blueant: BlueAntSettings = BlueAntSettings(api_key=SecretStr('**********'), url='https://blueant', skip_labels=['test'], delete_prefixes=['_', '1_', '2_', '3_', '4_', '5_', '6_', '7_', '8_', '9_'], mapping_path=AssetsPath('mappings/__final__/blueant')), confluence_vvt: ConfluenceVvtSettings = ConfluenceVvtSettings(url='https://confluence.vvt', username=SecretStr('**********'), password=SecretStr('**********'), overview_page_id='123456', template_v1_mapping_path=AssetsPath('mappings/__final__/confluence-vvt_template_v1'), skip_pages=['123456']), datscha_web: DatschaWebSettings = DatschaWebSettings(url='https://datscha/', vorname=SecretStr('**********'), nachname=SecretStr('**********'), pw=SecretStr('**********'), organisation='RKI'), ff_projects: FFProjectsSettings = FFProjectsSettings(file_path=AssetsPath('raw-data/ff-projects/ff-projects.xlsx'), skip_funding=['Sonstige'], skip_topics=['Sonstige'], skip_years_strings=['fehlt', 'keine', 'offen'], skip_clients=['Sonstige'], mapping_path=AssetsPath('mappings/__final__/ff-projects')), grippeweb: GrippewebSettings = GrippewebSettings(mapping_path=AssetsPath('mappings/__final__/grippeweb'), mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database'), ifsg: IFSGSettings = IFSGSettings(mapping_path=AssetsPath('mappings/__final__/ifsg'), mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database'), international_projects: InternationalProjectsSettings = InternationalProjectsSettings(file_path=AssetsPath('raw-data/international-projects/international_projects.xlsx'), mapping_path=AssetsPath('mappings/__final__/international-projects')), odk: ODKSettings = ODKSettings(raw_data_path=AssetsPath('raw-data/odk'), mapping_path=AssetsPath('mappings/__final__/odk')), open_data: OpenDataSettings = OpenDataSettings(url='https://zenodo', community_rki='robertkochinstitut'), rdmo: RDMOSettings = 
RDMOSettings(url='https://rdmo/', username=SecretStr('**********'), password=SecretStr('**********')), seq_repo: SeqRepoSettings = SeqRepoSettings(mapping_path=AssetsPath('mappings/__final__/seq-repo')), sumo: SumoSettings = SumoSettings(raw_data_path=AssetsPath('raw-data/sumo'), mapping_path=AssetsPath('mappings/__final__/sumo')), voxco: VoxcoSettings = VoxcoSettings(mapping_path=AssetsPath('mappings/__final__/voxco')), synopse: SynopseSettings = SynopseSettings(report_server_url='https://report-server/', report_server_username=SecretStr('**********'), report_server_password=SecretStr('**********'), variablenuebersicht_path=AssetsPath('raw-data/synopse/variablenuebersicht.csv'), projekt_und_studienverwaltung_path=AssetsPath('raw-data/synopse/projekt_und_studienverwaltung.csv'), metadaten_zu_datensaetzen_path=AssetsPath('raw-data/synopse/metadaten_zu_datensaetzen.csv'), datensatzuebersicht_path=AssetsPath('raw-data/synopse/datensatzuebersicht.csv'), mapping_path=AssetsPath('mappings/__final__/synopse')))¶
Bases: BaseSettings
Settings definition class for extractors and related scripts.
- artificial: ArtificialSettings¶
- biospecimen: BiospecimenSettings¶
- blueant: BlueAntSettings¶
- confluence_vvt: ConfluenceVvtSettings¶
- datscha_web: DatschaWebSettings¶
- drop_api_key: SecretStr¶
- drop_api_url: Url¶
- ff_projects: FFProjectsSettings¶
- grippeweb: GrippewebSettings¶
- identity_provider: IdentityProvider | ExtractorIdentityProvider¶
- ifsg: IFSGSettings¶
- international_projects: InternationalProjectsSettings¶
- kerberos_password: SecretStr¶
- kerberos_user: str¶
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}¶
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[SettingsConfigDict] = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': '.env', 'env_file_encoding': 'utf-8', 'env_ignore_empty': False, 'env_nested_delimiter': '__', 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'mex_', 'extra': 'ignore', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'populate_by_name': True, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_assignment': True, 'validate_default': True, 'yaml_file': None, 'yaml_file_encoding': None}¶
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'artificial': FieldInfo(annotation=ArtificialSettings, required=False, default=ArtificialSettings(count=100, chattiness=10, seed=0, locale=['de_DE', 'en_US'], mesh_file=AssetsPath("raw-data/artificial/asciimesh.bin"))), 'assets_dir': FieldInfo(annotation=Path, required=False, default=PosixPath('/home/runner/work/mex-extractors/mex-extractors/assets'), alias_priority=2, validation_alias='MEX_ASSETS_DIR', description='Path to directory that contains input files treated as read-only, looks for a folder named `assets` in the current directory by default.'), 'backend_api_key': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), alias_priority=2, validation_alias='MEX_BACKEND_API_KEY', description='Backend API key with write access to call POST/PUT endpoints'), 'backend_api_url': FieldInfo(annotation=Url, required=False, default=Url('http://localhost:8080/'), alias_priority=2, validation_alias='MEX_BACKEND_API_URL', description='MEx backend API url.'), 'biospecimen': FieldInfo(annotation=BiospecimenSettings, required=False, default=BiospecimenSettings(raw_data_path=AssetsPath("raw-data/biospecimen"), key_col='Feldname', val_col='zu extrahierender Wert (maschinenlesbar)', mapping_path=AssetsPath("mappings/__final__/biospecimen"))), 'blueant': FieldInfo(annotation=BlueAntSettings, required=False, default=BlueAntSettings(api_key=SecretStr('**********'), url='https://blueant', skip_labels=['test'], delete_prefixes=['_', '1_', '2_', '3_', '4_', '5_', '6_', '7_', '8_', '9_'], mapping_path=AssetsPath("mappings/__final__/blueant"))), 'confluence_vvt': FieldInfo(annotation=ConfluenceVvtSettings, required=False, default=ConfluenceVvtSettings(url='https://confluence.vvt', username=SecretStr('**********'), password=SecretStr('**********'), overview_page_id='123456', template_v1_mapping_path=AssetsPath("mappings/__final__/confluence-vvt_template_v1"), skip_pages=['123456'])), 'datscha_web': 
FieldInfo(annotation=DatschaWebSettings, required=False, default=DatschaWebSettings(url='https://datscha/', vorname=SecretStr('**********'), nachname=SecretStr('**********'), pw=SecretStr('**********'), organisation='RKI')), 'debug': FieldInfo(annotation=bool, required=False, default=False, alias='pdb', alias_priority=2, validation_alias='MEX_DEBUG', description='Jump into post-mortem debugging after any uncaught exception.'), 'drop_api_key': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), alias_priority=2, validation_alias='MEX_DROP_API_KEY', description='Drop API key with admin access to call all GET endpoints'), 'drop_api_url': FieldInfo(annotation=Url, required=False, default=Url('http://localhost:8081/'), alias_priority=2, validation_alias='MEX_DROP_API_URL', description='MEx drop API url.'), 'ff_projects': FieldInfo(annotation=FFProjectsSettings, required=False, default=FFProjectsSettings(file_path=AssetsPath("raw-data/ff-projects/ff-projects.xlsx"), skip_funding=['Sonstige'], skip_topics=['Sonstige'], skip_years_strings=['fehlt', 'keine', 'offen'], skip_clients=['Sonstige'], mapping_path=AssetsPath("mappings/__final__/ff-projects"))), 'grippeweb': FieldInfo(annotation=GrippewebSettings, required=False, default=GrippewebSettings(mapping_path=AssetsPath("mappings/__final__/grippeweb"), mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database')), 'identity_provider': FieldInfo(annotation=Union[IdentityProvider, ExtractorIdentityProvider], required=False, default=<IdentityProvider.MEMORY: 'memory'>, alias_priority=2, validation_alias='MEX_IDENTITY_PROVIDER', description='Provider to assign stableTargetIds to new model instances.'), 'ifsg': FieldInfo(annotation=IFSGSettings, required=False, default=IFSGSettings(mapping_path=AssetsPath("mappings/__final__/ifsg"), mssql_connection_dsn='DRIVER={ODBC Driver 18 for SQL Server};SERVER=domain.tld;DATABASE=database')), 'international_projects': 
FieldInfo(annotation=InternationalProjectsSettings, required=False, default=InternationalProjectsSettings(file_path=AssetsPath("raw-data/international-projects/international_projects.xlsx"), mapping_path=AssetsPath("mappings/__final__/international-projects"))), 'kerberos_password': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='Kerberos password to authenticate against MSSQL server.'), 'kerberos_user': FieldInfo(annotation=str, required=False, default='user@domain.tld', description='Kerberos user to authenticate against MSSQL server.'), 'ldap_url': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), alias_priority=2, validation_alias='MEX_LDAP_URL', description='LDAP server for person queries with authentication credentials. Must follow format `ldap://user:pw@host:port`, where `user` is the username, and `pw` is the password for authenticating against ldap, `host` is the url of the ldap server, and `port` is the port of the ldap server.'), 'mex_web_user_agent': FieldInfo(annotation=str, required=False, default='rki/mex', alias_priority=2, validation_alias='MEX_WEB_USER_AGENT', description='a user agent is sent in the header of some requests to external services '), 'odk': FieldInfo(annotation=ODKSettings, required=False, default=ODKSettings(raw_data_path=AssetsPath("raw-data/odk"), mapping_path=AssetsPath("mappings/__final__/odk"))), 'open_data': FieldInfo(annotation=OpenDataSettings, required=False, default=OpenDataSettings(url='https://zenodo', community_rki='robertkochinstitut')), 'organigram_path': FieldInfo(annotation=AssetsPath, required=False, default=AssetsPath("raw-data/organigram/organizational_units.json"), alias_priority=2, validation_alias='MEX_ORGANIGRAM_PATH', description='Path to the JSON file describing the organizational units, absolute path or relative to `assets_dir`.'), 'primary_sources_path': FieldInfo(annotation=AssetsPath, required=False, 
default=AssetsPath("raw-data/primary-sources/primary-sources.json"), alias_priority=2, validation_alias='MEX_PRIMARY_SOURCES_PATH', description='Path to the JSON file describing the primary sources, absolute path or relative to `assets_dir`.'), 'rdmo': FieldInfo(annotation=RDMOSettings, required=False, default=RDMOSettings(url='https://rdmo/', username=SecretStr('**********'), password=SecretStr('**********'))), 'schedule': FieldInfo(annotation=str, required=False, default='0 0 * * *', alias_priority=2, validation_alias='MEX_SCHEDULE', description='A valid cron string defining when to run extractor jobs'), 'seq_repo': FieldInfo(annotation=SeqRepoSettings, required=False, default=SeqRepoSettings(mapping_path=AssetsPath("mappings/__final__/seq-repo"))), 'sink': FieldInfo(annotation=list[Sink], required=False, default=[<Sink.NDJSON: 'ndjson'>], alias_priority=2, validation_alias='MEX_SINK', description='Where to send data that is extracted or ingested. Defaults to writing ndjson files, but can be configured to push to the backend or the graph.'), 'skip_extractors': FieldInfo(annotation=list[str], required=False, default=[], alias_priority=2, validation_alias='MEX_SKIP_EXTRACTORS', description='Skip execution of these extractors in dagster'), 'skip_merged_items': FieldInfo(annotation=list[str], required=False, default=['MergedPrimarySource', 'MergedConsent'], alias_priority=2, validation_alias='MEX_SKIP_MERGED_ITEMS', description='Skip merged items with these types'), 'skip_partners': FieldInfo(annotation=list[str], required=False, default=['test'], alias_priority=2, validation_alias='MEX_SKIP_PARTNERS', description='Skip projects with these external partners'), 'skip_units': FieldInfo(annotation=list[str], required=False, default=['IT', 'PRAES', 'ZV'], alias_priority=2, validation_alias='MEX_SKIP_UNITS', description='Skip projects with these responsible units'), 'skip_years_before': FieldInfo(annotation=int, required=False, default=1970, alias_priority=2, 
validation_alias='MEX_SKIP_YEARS_BEFORE', description='Skip projects conducted before this year'), 'sumo': FieldInfo(annotation=SumoSettings, required=False, default=SumoSettings(raw_data_path=AssetsPath("raw-data/sumo"), mapping_path=AssetsPath("mappings/__final__/sumo"))), 'synopse': FieldInfo(annotation=SynopseSettings, required=False, default=SynopseSettings(report_server_url='https://report-server/', report_server_username=SecretStr('**********'), report_server_password=SecretStr('**********'), variablenuebersicht_path=AssetsPath("raw-data/synopse/variablenuebersicht.csv"), projekt_und_studienverwaltung_path=AssetsPath("raw-data/synopse/projekt_und_studienverwaltung.csv"), metadaten_zu_datensaetzen_path=AssetsPath("raw-data/synopse/metadaten_zu_datensaetzen.csv"), datensatzuebersicht_path=AssetsPath("raw-data/synopse/datensatzuebersicht.csv"), mapping_path=AssetsPath("mappings/__final__/synopse"))), 'verify_session': FieldInfo(annotation=Union[bool, AssetsPath], required=False, default=True, alias_priority=2, validation_alias='MEX_VERIFY_SESSION', description="Either a boolean that controls whether we verify the server's TLS certificate, or a path to a CA bundle to use. If a path is given, it can be either absolute or relative to the `assets_dir`. 
Defaults to True."), 'voxco': FieldInfo(annotation=VoxcoSettings, required=False, default=VoxcoSettings(mapping_path=AssetsPath("mappings/__final__/voxco"))), 'wiki_api_url': FieldInfo(annotation=Url, required=False, default=Url('https://wikidata/'), alias_priority=2, validation_alias='MEX_WIKI_API_URL', description='URL of Wikidata API, this URL is used to send wikidata organization ID to get all the info about the organization, which includes basic info, aliases, labels, descriptions, claims, and sitelinks'), 'wiki_query_service_url': FieldInfo(annotation=Url, required=False, default=Url('https://wikidata/'), alias_priority=2, validation_alias='MEX_WIKI_QUERY_SERVICE_URL', description='URL of Wikidata query service, this URL is to send organization name in plain text to wikidata and receive search results with wikidata organization ID'), 'work_dir': FieldInfo(annotation=Path, required=False, default=PosixPath('/home/runner/work/mex-extractors/mex-extractors'), alias_priority=2, validation_alias='MEX_WORK_DIR', description='Path to directory that stores generated and temporary files. Defaults to the current working directory.')}¶
Metadata about the fields defined on the model, mapping of field names to FieldInfo (pydantic.fields.FieldInfo) objects.
This replaces Model.__fields__ from Pydantic V1.
- odk: ODKSettings¶
- open_data: OpenDataSettings¶
- rdmo: RDMOSettings¶
- schedule: str¶
- seq_repo: SeqRepoSettings¶
- skip_extractors: list[str]¶
- skip_merged_items: list[str]¶
- skip_partners: list[str]¶
- skip_units: list[str]¶
- skip_years_before: int¶
- sumo: SumoSettings¶
- synopse: SynopseSettings¶
- voxco: VoxcoSettings¶
mex.extractors.sinks module¶
- mex.extractors.sinks.load(models: Iterable[ExtractedAccessPlatform | ExtractedActivity | ExtractedBibliographicResource | ExtractedConsent | ExtractedContactPoint | ExtractedDistribution | ExtractedOrganization | ExtractedOrganizationalUnit | ExtractedPerson | ExtractedPrimarySource | ExtractedResource | ExtractedVariable | ExtractedVariableGroup]) → None ¶
Load models to the backend API or write to NDJSON files.
- Parameters:
models – Iterable of extracted models
- Settings:
sink: Where to load the provided models
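For readers unfamiliar with the NDJSON target, the format is one JSON document per line. The sketch below shows only that format, using plain dictionaries in place of extracted models. `write_ndjson` is a hypothetical helper and not how `load` is implemented, and the choice between backend and file output, driven by the `sink` setting, is not modelled.

```python
import io
import json
from collections.abc import Iterable
from typing import Any, TextIO


def write_ndjson(records: Iterable[dict[str, Any]], handle: TextIO) -> int:
    """Write one JSON document per line and return the number written."""
    count = 0
    for record in records:
        # sort_keys keeps the output stable across runs.
        handle.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


buffer = io.StringIO()
written = write_ndjson([{"identifier": "a"}, {"identifier": "b"}], buffer)
lines = buffer.getvalue().splitlines()
```

Each line can be parsed independently with `json.loads`, so large files can be processed as a stream without loading them whole.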