mex.extractors.confluence_vvt package

Subpackages

Submodules

mex.extractors.confluence_vvt.connector module

class mex.extractors.confluence_vvt.connector.ConfluenceVvtConnector

Bases: HTTPConnector

Connector class to create a session for all requests to confluence-vvt.

_set_authentication() → None

Authenticate to the host.

_set_url() → None

Set the URL of the host.

mex.extractors.confluence_vvt.extract module

mex.extractors.confluence_vvt.extract.extract_confluence_vvt_authors(confluence_vvt_sources: Iterable[ConfluenceVvtSource]) → Generator[LDAPPersonWithQuery, None, None]

Extract LDAP persons with their query string for confluence-vvt authors.

Parameters:

confluence_vvt_sources – confluence-vvt sources

Returns:

Generator for LDAP persons with query

mex.extractors.confluence_vvt.extract.fetch_all_data_page_ids() → Generator[str, None, None]

Fetch all the ids for data pages.

Settings:

  • confluence_vvt.url: Confluence-vvt base URL

  • confluence_vvt.overview_page_id: page ID of the overview page

Raises:

MExError – When the pagination limit is exceeded

Returns:

Generator for page IDs
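Here is a minimal sketch of paginated ID fetching with a hard limit. It assumes the real function requests the overview page's child pages in batches, for example through Confluence's start/limit query parameters. The names fetch_all_data_page_ids_sketch, fetch_child_page, limit and max_pages are hypothetical. MExError is redefined locally as a stand-in for the package's exception:

```python
from collections.abc import Callable, Generator


class MExError(Exception):
    """Local stand-in for the package's MExError (assumed to subclass Exception)."""


def fetch_all_data_page_ids_sketch(
    fetch_child_page: Callable[[int, int], list[str]],  # hypothetical: (start, limit) -> ids
    limit: int = 100,
    max_pages: int = 10,
) -> Generator[str, None, None]:
    """Yield page IDs batch by batch and fail loudly on runaway pagination."""
    for page_number in range(max_pages):
        start = page_number * limit
        ids = fetch_child_page(start, limit)
        yield from ids
        if len(ids) < limit:
            # A short (or empty) batch means there are no further results.
            return
    # Every permitted batch was full, so more results may exist: stop here.
    raise MExError(f"pagination limit of {max_pages} batches exceeded")
```

This sketch also raises when the last permitted batch happens to be exactly full with nothing after it. A real implementation might probe one more batch before failing.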

mex.extractors.confluence_vvt.extract.fetch_all_pages_data(page_ids: Iterable[str]) → Generator[ConfluenceVvtSource, None, None]

Fetch data from data pages.

Parameters:

page_ids – Iterable of ids of the pages to extract data from

Settings:

confluence_vvt.url: Confluence-vvt base URL

Returns:

Generator for ConfluenceVvtSource items
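Here is a minimal sketch of how such a generator could combine fetching and parsing. It assumes one HTML fetch per page ID. It also assumes that pages whose parser returns None are skipped, since parse_data_html_page is typed to allow None. ConfluenceVvtSourceSketch, fetch_html and parse_html are hypothetical stand-ins, passed in as arguments so the example runs without network access:

```python
from collections.abc import Callable, Generator, Iterable
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfluenceVvtSourceSketch:
    """Hypothetical stand-in for ConfluenceVvtSource; the real model has other fields."""

    page_id: str
    abstract: str


def fetch_all_pages_data_sketch(
    page_ids: Iterable[str],
    fetch_html: Callable[[str], str],  # hypothetical: page ID -> raw HTML
    parse_html: Callable[[str], Optional[str]],  # hypothetical: HTML -> abstract or None
) -> Generator[ConfluenceVvtSourceSketch, None, None]:
    """Fetch and parse each page lazily, yielding one source item per parsable page."""
    for page_id in page_ids:
        html = fetch_html(page_id)
        parsed = parse_html(html)
        if parsed is None:
            # Assumption: unparsable pages are skipped rather than raising.
            continue
        yield ConfluenceVvtSourceSketch(page_id=page_id, abstract=parsed)
```

Because the function is a generator, pages are only fetched as the caller iterates. A long run over many page IDs therefore never holds all HTML in memory at once.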

mex.extractors.confluence_vvt.main module

mex.extractors.confluence_vvt.parse_html module

mex.extractors.confluence_vvt.parse_html.get_clean_current_row_all_cols_data(current_row_all_cols_data: list[str]) → list[str]

Get cleaned data for all columns in the current row, removing unwanted characters.

Parameters:

current_row_all_cols_data – List of all columns of the current row

Returns:

list of cleaned strings for all columns of the current row
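The docstring does not say which characters count as unwanted. Here is a minimal sketch, assuming non-breaking spaces, zero-width spaces and byte-order marks are unwanted and that runs of whitespace collapse to a single space. clean_row_sketch is a hypothetical name, not the package's function:

```python
import re

# Assumed set of "unwanted" characters: NBSP becomes a space, zero-width space and BOM vanish.
_UNWANTED = str.maketrans({"\xa0": " ", "\u200b": "", "\ufeff": ""})
_WHITESPACE_RUN = re.compile(r"\s+")


def clean_row_sketch(current_row_all_cols_data: list[str]) -> list[str]:
    """Return one cleaned string per column, preserving the column count."""
    cleaned = []
    for cell in current_row_all_cols_data:
        text = cell.translate(_UNWANTED)
        # Collapse newlines, tabs and repeated spaces, then trim the ends.
        text = _WHITESPACE_RUN.sub(" ", text).strip()
        cleaned.append(text)
    return cleaned
```

Empty cells are kept rather than dropped, so column positions still line up with the table header.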

mex.extractors.confluence_vvt.parse_html.get_interne_vorgangsnummer_from_all_rows_data(intnmr_dict: Any | None | list[str]) → list[str] | Any

Get the Interne Vorgangsnummer (internal process number) from the extracted table data.

Parameters:

intnmr_dict – Extracted dict or list of Interne Vorgangsnummer

Returns:

list of extracted Interne Vorgangsnummer

mex.extractors.confluence_vvt.parse_html.get_interne_vorgangsnummer_from_title(interne_vorgangsnummer_title: str) → list[str]

Extract Interne Vorgangsnummer from the title row.

Parameters:

interne_vorgangsnummer_title – Interne Vorgangsnummer title

Returns:

list of extracted Interne Vorgangsnummer from the title

mex.extractors.confluence_vvt.parse_html.get_row_data_for_all_rows(table_rows: ResultSet[Any], min_ignorable_cols: int = 1) → dict[str, str | list[str]]

Get all the data from the provided rows.

Parameters:
  • table_rows – Table rows ResultSet from bs4

  • min_ignorable_cols – If a row has multiple columns, the columns below this number are ignored. Defaults to 1.

Returns:

structured dict of all the extracted data

mex.extractors.confluence_vvt.parse_html.get_verantwortlichen(field_name: str, all_rows_data: dict[str, str | list[str]]) → tuple[list[str], list[str]]

Get the verantwortlichen (responsible persons) from the extracted data of all rows.

Parameters:
  • field_name – Name of the field in all_rows_data that is to be extracted

  • all_rows_data – All extracted rows data

Returns:

tuple of the names and OEs of the verantwortlicher(in)

mex.extractors.confluence_vvt.parse_html.parse_data_html_page(html: str) → tuple[str | list[str] | None, list[str], list[str], list[str], list[str], list[str], list[str], list[str] | Any] | None

Parse required data from html string.

Parameters:

html – Raw html in string format

Returns:

abstract, verantwortliche_studienleiterin, OE names and interne_vorgangsnummer

mex.extractors.confluence_vvt.settings module

class mex.extractors.confluence_vvt.settings.ConfluenceVvtSettings(*, url: str = 'https://confluence.vvt', username: SecretStr = SecretStr('**********'), password: SecretStr = SecretStr('**********'), overview_page_id: str = '123456')

Bases: BaseModel

Confluence-vvt settings submodule definition for the Confluence-vvt extractor.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, which should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_fields: ClassVar[Dict[str, FieldInfo]] = {'overview_page_id': FieldInfo(annotation=str, required=False, default='123456', description='Confluence id of the overview page.'), 'password': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='Confluence-vvt password'), 'url': FieldInfo(annotation=str, required=False, default='https://confluence.vvt', description='URL of Confluence-vvt.'), 'username': FieldInfo(annotation=SecretStr, required=False, default=SecretStr('**********'), description='Confluence-vvt user name')}

Metadata about the fields defined on the model: a mapping of field names to FieldInfo (pydantic.fields.FieldInfo) objects.

This replaces Model.__fields__ from Pydantic V1.

overview_page_id: str
password: SecretStr
url: str
username: SecretStr

mex.extractors.confluence_vvt.transform module

mex.extractors.confluence_vvt.transform.transform_confluence_vvt_sources_to_extracted_activities(confluence_vvt_sources: Iterable[ConfluenceVvtSource], extracted_primary_source: ExtractedPrimarySource, merged_ids_by_query_string: dict[Hashable, list[Identifier]], unit_merged_ids_by_synonym: dict[str, Identifier]) → Generator[ExtractedActivity, None, None]

Transform Confluence-vvt sources to extracted activities.

Parameters:
  • confluence_vvt_sources – Confluence-vvt sources

  • extracted_primary_source – Extracted primary source for Confluence-vvt

  • merged_ids_by_query_string – Mapping from author query to merged IDs

  • unit_merged_ids_by_synonym – Map from unit acronyms and labels to their merged ID

Returns:

Generator for ExtractedActivity
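Here is a minimal sketch of the ID lookups this transformation relies on. It assumes each activity links persons through their author query strings and units through their acronym or label. Plain strings stand in for Identifier, and resolve_ids_sketch is a hypothetical helper that is not part of the package:

```python
from collections.abc import Hashable


def resolve_ids_sketch(
    author_queries: list[str],
    unit_names: list[str],
    merged_ids_by_query_string: dict[Hashable, list[str]],
    unit_merged_ids_by_synonym: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Map author queries and unit synonyms to merged IDs, skipping unknown keys."""
    person_ids: list[str] = []
    for query in author_queries:
        # One query string may match several LDAP persons; keep each ID once, in order.
        for merged_id in merged_ids_by_query_string.get(query, []):
            if merged_id not in person_ids:
                person_ids.append(merged_id)
    unit_ids = [
        unit_merged_ids_by_synonym[name]
        for name in unit_names
        if name in unit_merged_ids_by_synonym
    ]
    return person_ids, unit_ids
```

Unknown query strings and unit names are skipped here. The docstring does not say whether the real transform skips, logs or rejects them.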

Module contents