mex.extractors.confluence_vvt package
Subpackages
- mex.extractors.confluence_vvt.models package
- Submodules
- mex.extractors.confluence_vvt.models.source module
ConfluenceVvtSource
ConfluenceVvtSource.abstract
ConfluenceVvtSource.activity_type
ConfluenceVvtSource.alternative_title
ConfluenceVvtSource.contact
ConfluenceVvtSource.documentation
ConfluenceVvtSource.end
ConfluenceVvtSource.funder_or_commissioner
ConfluenceVvtSource.funding_program
ConfluenceVvtSource.gemeinsam_verantwortliche
ConfluenceVvtSource.get_end_year()
ConfluenceVvtSource.get_identifier_in_primary_source()
ConfluenceVvtSource.get_partners()
ConfluenceVvtSource.get_start_year()
ConfluenceVvtSource.get_units()
ConfluenceVvtSource.identifier
ConfluenceVvtSource.identifier_in_primary_source
ConfluenceVvtSource.involved_person
ConfluenceVvtSource.involved_unit
ConfluenceVvtSource.is_part_of_activity
ConfluenceVvtSource.model_computed_fields
ConfluenceVvtSource.model_config
ConfluenceVvtSource.model_fields
ConfluenceVvtSource.publication
ConfluenceVvtSource.responsible_unit
ConfluenceVvtSource.short_name
ConfluenceVvtSource.start
ConfluenceVvtSource.succeeds
ConfluenceVvtSource.theme
ConfluenceVvtSource.title
ConfluenceVvtSource.website
- Module contents
Submodules
mex.extractors.confluence_vvt.connector module
mex.extractors.confluence_vvt.extract module
- mex.extractors.confluence_vvt.extract.extract_confluence_vvt_authors(confluence_vvt_sources: Iterable[ConfluenceVvtSource]) → Generator[LDAPPersonWithQuery, None, None]
Extract LDAP persons with their query string for confluence-vvt authors.
- Parameters:
confluence_vvt_sources – confluence-vvt sources
- Returns:
Generator for LDAP persons with query
- mex.extractors.confluence_vvt.extract.fetch_all_data_page_ids() → Generator[str, None, None]
Fetch all the ids for data pages.
- Settings:
confluence_vvt.url: Confluence-vvt base url
confluence_vvt.overview_page_id: page id of the overview page
- Raises:
MExError – When the pagination limit is exceeded
- Returns:
Generator for page IDs
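The behaviour described above (yield page IDs lazily, raise once a pagination limit is exceeded) can be sketched roughly as follows. This is a minimal illustration, not the extractor's actual code: `iter_page_ids`, `fetch_batch`, `batch_size`, `max_batches` and `PaginationLimitError` are hypothetical names, the error class stands in for MExError, and the in-memory fetcher replaces the real Confluence request.

```python
from collections.abc import Callable, Generator


class PaginationLimitError(Exception):
    """Hypothetical stand-in for MExError."""


def iter_page_ids(
    fetch_batch: Callable[[int, int], list[str]],
    batch_size: int = 2,
    max_batches: int = 10,
) -> Generator[str, None, None]:
    """Yield page ids batch by batch; fail once max_batches is exhausted."""
    for batch_number in range(max_batches):
        batch = fetch_batch(batch_number * batch_size, batch_size)
        yield from batch
        if len(batch) < batch_size:
            return  # a short batch means the last page was reached
    raise PaginationLimitError(f"more than {max_batches} batches of page ids")


def make_fake_fetcher(all_ids: list[str]) -> Callable[[int, int], list[str]]:
    """Build an in-memory fetcher (assumption: the real one calls Confluence)."""

    def fetch(start: int, limit: int) -> list[str]:
        return all_ids[start : start + limit]

    return fetch


page_ids = list(iter_page_ids(make_fake_fetcher(["11", "22", "33"])))
```

Because the function is a generator, the limit error only surfaces while iterating, so callers that consume lazily should expect it mid-stream.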
- mex.extractors.confluence_vvt.extract.fetch_all_pages_data(page_ids: Iterable[str]) → Generator[ConfluenceVvtSource, None, None]
Fetch data from data pages.
- Parameters:
page_ids – Iterable of ids of the pages to extract data from
- Settings:
url: Confluence base url
- Returns:
Generator for ConfluenceVvtSource items
mex.extractors.confluence_vvt.main module
mex.extractors.confluence_vvt.parse_html module
- mex.extractors.confluence_vvt.parse_html.get_clean_current_row_all_cols_data(current_row_all_cols_data: list[str]) → list[str]
Get clean data for all cols in current row, removing all unwanted characters.
- Parameters:
current_row_all_cols_data – List of all columns of current row
- Returns:
list of cleaned strings for all columns of current row
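Which characters count as unwanted is not stated here. As a rough sketch of this kind of per-cell cleanup, assuming the targets are non-breaking spaces, line breaks and repeated whitespace (an assumption, as is the helper name `clean_cells`):

```python
import re


def clean_cells(cells: list[str]) -> list[str]:
    """Normalize whitespace in every cell of one table row."""
    cleaned = []
    for cell in cells:
        # \s matches Unicode whitespace, including the non-breaking space \xa0
        text = re.sub(r"\s+", " ", cell)
        cleaned.append(text.strip())
    return cleaned


row = clean_cells(["  Foo\xa0Bar \n", "baz"])
```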
- mex.extractors.confluence_vvt.parse_html.get_interne_vorgangsnummer_from_all_rows_data(intnmr_dict: Any | None | list[str]) → list[str] | Any
Get Interne Vorgangsnummer from the table extracted data.
- Parameters:
intnmr_dict – Extracted dict or list of Interne Vorgangsnummer
- Returns:
list of extracted Interne Vorgangsnummer
- mex.extractors.confluence_vvt.parse_html.get_interne_vorgangsnummer_from_title(interne_vorgangsnummer_title: str) → list[str]
Extract Interne Vorgangsnummer from the title row.
- Parameters:
interne_vorgangsnummer_title – Interne Vorgangsnummer title
- Returns:
list of extracted Interne Vorgangsnummer from the title
- mex.extractors.confluence_vvt.parse_html.get_row_data_for_all_rows(table_rows: ResultSet[Any], min_ignorable_cols: int = 1) → dict[str, str | list[str]]
Get all the data from the provided rows.
- Parameters:
table_rows – Table rows ResultSet from bs4
min_ignorable_cols – If row has multiple columns, number of columns below this number will be ignored. Defaults to 1.
- Returns:
structured dict of all the extracted data
- mex.extractors.confluence_vvt.parse_html.get_verantwortlichen(field_name: str, all_rows_data: dict[str, str | list[str]]) → tuple[list[str], list[str]]
Get verantwortlichen from the extracted all rows data.
- Parameters:
field_name – Name of the field in all_rows_data that is to be extracted
all_rows_data – All extracted rows data
- Returns:
tuple of names and OEs of verantwortlicher(in)
- mex.extractors.confluence_vvt.parse_html.parse_data_html_page(html: str) → tuple[str | list[str] | None, list[str], list[str], list[str], list[str], list[str], list[str], list[str] | Any] | None
Parse required data from html string.
- Parameters:
html – Raw html in string format
- Returns:
abstract, verantwortliche_studienleiterin, OE names and interne_vorgangsnummer
mex.extractors.confluence_vvt.settings module
- class mex.extractors.confluence_vvt.settings.ConfluenceVvtSettings(*, url: str = 'https://confluence.vvt', username: SecretStr = SecretStr('**********'), password: SecretStr = SecretStr('**********'), overview_page_id: str = '123456')
Bases: BaseModel
Confluence-vvt settings submodule definition for the Confluence-vvt extractor.
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'overview_page_id': FieldInfo(annotation=str, required=False, default='123456', description='Confluence id of the overview page.'), 'password': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='Confluence-vvt password'), 'url': FieldInfo(annotation=str, required=False, default='https://confluence.vvt', description='URL of Confluence-vvt.'), 'username': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='Confluence-vvt user name')}
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- overview_page_id: str
- password: SecretStr
- url: str
- username: SecretStr
mex.extractors.confluence_vvt.transform module
- mex.extractors.confluence_vvt.transform.transform_confluence_vvt_sources_to_extracted_activities(confluence_vvt_sources: Iterable[ConfluenceVvtSource], extracted_primary_source: ExtractedPrimarySource, merged_ids_by_query_string: dict[Hashable, list[Identifier]], unit_merged_ids_by_synonym: dict[str, Identifier]) → Generator[ExtractedActivity, None, None]
Transform Confluence-vvt sources to extracted activities.
- Parameters:
confluence_vvt_sources – Confluence-vvt sources
extracted_primary_source – Extracted primary source for Confluence-vvt
merged_ids_by_query_string – Mapping from author query to merged IDs
unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID
- Returns:
Generator for ExtractedActivity
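The two lookup mappings suggest a resolve-then-yield pattern, which might look roughly like the sketch below. Everything in it is illustrative: `SourceStub` and `ActivityStub` are hypothetical stand-ins for ConfluenceVvtSource and ExtractedActivity, plain strings replace Identifier, the field types are guesses, and silently skipping unresolved names is an assumption rather than documented behaviour.

```python
from collections.abc import Generator, Hashable, Iterable
from dataclasses import dataclass, field


@dataclass
class SourceStub:
    """Hypothetical stand-in for ConfluenceVvtSource (field types assumed)."""

    title: str
    contact: list[str]
    responsible_unit: list[str]


@dataclass
class ActivityStub:
    """Hypothetical stand-in for ExtractedActivity."""

    title: str
    contact_ids: list[str] = field(default_factory=list)
    unit_ids: list[str] = field(default_factory=list)


def transform_stubs(
    sources: Iterable[SourceStub],
    merged_ids_by_query_string: dict[Hashable, list[str]],
    unit_merged_ids_by_synonym: dict[str, str],
) -> Generator[ActivityStub, None, None]:
    """Resolve names to merged ids and yield one activity per source."""
    for source in sources:
        # one author query may resolve to several merged ids
        contact_ids = [
            merged_id
            for query in source.contact
            for merged_id in merged_ids_by_query_string.get(query, [])
        ]
        # assumption: unit names without a known synonym are skipped
        unit_ids = [
            unit_merged_ids_by_synonym[unit]
            for unit in source.responsible_unit
            if unit in unit_merged_ids_by_synonym
        ]
        yield ActivityStub(source.title, contact_ids, unit_ids)


activities = list(
    transform_stubs(
        [SourceStub("Study A", ["Doe, Jane"], ["FG 99", "unknown unit"])],
        {"Doe, Jane": ["person-1"]},
        {"FG 99": "unit-1"},
    )
)
```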