mex.common.wikidata package¶
Submodules¶
mex.common.wikidata.connector module¶
- class mex.common.wikidata.connector.WikidataAPIConnector¶
Bases: HTTPConnector
Connector class to handle requests to the Wikidata API.
- _check_availability() → None¶
Send a GET request to verify the host is available.
- _set_session() → None¶
Create and set request session.
- _set_url() → None¶
Set url of the host.
- get_wikidata_item_details_by_id(item_id: str) → dict[str, str]¶
Get details of a wikidata item by item id.
- Parameters:
item_id – wikidata item id
- Returns:
Details of the found item.
mex.common.wikidata.extract module¶
- mex.common.wikidata.extract.get_wikidata_organization(item_id_or_url: str) → WikidataOrganization¶
Get the details of a Wikidata item by its ID or URL.
- Parameters:
item_id_or_url – Wikidata item ID or full URL
- Raises:
ValueError – when item_id_or_url does not match pattern
- Returns:
WikidataOrganization object
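The accepted inputs can be illustrated with a minimal sketch. The regular expression and the helper name `parse_item_id` are assumptions made for illustration, not the pattern used by mex.common; the sketch only shows the idea of accepting either a bare item ID or a full URL and raising ValueError for anything else.

```python
import re

# Hypothetical pattern: a bare ID such as "Q42", or a wikidata.org URL ending in one.
_ITEM_ID_PATTERN = re.compile(
    r"^(?:https?://www\.wikidata\.org/(?:wiki|entity)/)?(Q\d+)$"
)


def parse_item_id(item_id_or_url: str) -> str:
    """Return the bare item ID, or raise ValueError if the input does not match."""
    match = _ITEM_ID_PATTERN.match(item_id_or_url.strip())
    if match is None:
        raise ValueError(f"malformed wikidata item id or url: {item_id_or_url!r}")
    return match.group(1)
```

Normalizing to the bare ID first means the connector only ever has to deal with one input form.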
mex.common.wikidata.models module¶
- class mex.common.wikidata.models.Alias(*, language: str, value: str)¶
Bases: BaseModel
Model class for single alias.
- language: str¶
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- value: str¶
- class mex.common.wikidata.models.Aliases(*, de: list[Alias] = [], en: list[Alias] = [])¶
Bases: BaseModel
Model class for aliases.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class mex.common.wikidata.models.Claim(*, mainsnak: Mainsnak)¶
Bases: BaseModel
Model class for a Claim.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class mex.common.wikidata.models.Claims(*, P856: list[Claim] = [], P213: list[Claim] = [], P6782: list[Claim] = [], P1448: list[Claim] = [], P1813: list[Claim] = [], P1705: list[Claim] = [], P4871: list[Claim] = [], P227: list[Claim] = [], P214: list[Claim] = [])¶
Bases: BaseModel
Model class for Claims.
- gepris_id: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P4871', alias_priority=2)]¶
- gnd_id: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P227', alias_priority=2)]¶
- isni_id: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P213', alias_priority=2)]¶
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- native_label: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P1705', alias_priority=2)]¶
- official_name: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P1448', alias_priority=2)]¶
- ror_id: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P6782', alias_priority=2)]¶
- short_name: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P1813', alias_priority=2)]¶
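Each field of Claims is populated from a Wikidata property ID through its alias. The following stdlib-only sketch mimics that renaming with a plain dictionary; it is not the pydantic model itself, and the helper name `rename_claims` is hypothetical. Only the property IDs whose field names are listed above are included; P856 and P214 appear in the signature, but their field names are not shown here, so they are left out.

```python
# Property IDs and field names as listed in the Claims attributes above.
PROPERTY_TO_FIELD = {
    "P4871": "gepris_id",
    "P227": "gnd_id",
    "P213": "isni_id",
    "P1705": "native_label",
    "P1448": "official_name",
    "P6782": "ror_id",
    "P1813": "short_name",
}


def rename_claims(raw_claims: dict) -> dict:
    """Map known property IDs to field names, defaulting each field to []."""
    renamed = {field: [] for field in PROPERTY_TO_FIELD.values()}
    for property_id, claims in raw_claims.items():
        field = PROPERTY_TO_FIELD.get(property_id)
        if field is not None:  # unknown properties are dropped, like extra='ignore'
            renamed[field] = list(claims)
    return renamed
```

For example, `rename_claims({"P227": [claim]})` places the claim under `gnd_id` and leaves every other field as an empty list.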
- class mex.common.wikidata.models.DataValue(*, value: Value)¶
Bases: BaseModel
Model class for Data Values (for claims).
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod transform_strings_to_dict(values: dict[str, str | dict[str, str]]) → dict[str, dict[str, str | None]] | dict[str, str | dict[str, str]]¶
Transform string and null values into a dict for parsing.
- Parameters:
values – values that need to be parsed
- Returns:
resulting dict
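The shape of this transformation can be sketched without pydantic. The function name `wrap_string_value` is hypothetical, and the exact behaviour of the real validator is an assumption inferred from the type signature and from the `text` and `language` fields of Value: a plain string or null under the `value` key is wrapped into a dict, while a dict is passed through unchanged.

```python
from typing import Any


def wrap_string_value(values: dict) -> dict:
    """Wrap a plain string (or None) under 'value' into a text/language dict."""
    if "value" not in values:
        return values
    value: Any = values["value"]
    if value is None or isinstance(value, str):
        # Assumed target shape, matching the fields of the Value model.
        return {**values, "value": {"text": value, "language": None}}
    return values  # already a dict, leave it unchanged
```

Wrapping before validation lets the same Value model parse both plain-string claims and language-tagged ones.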
- class mex.common.wikidata.models.Label(*, language: str | None = None, value: str)¶
Bases: BaseModel
Model class for single Label.
- language: str | None¶
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- value: str¶
- class mex.common.wikidata.models.Labels(*, de: Label | None = None, en: Label | None = None, mul: Label | None = None)¶
Bases: BaseModel
Model class for Labels.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class mex.common.wikidata.models.Mainsnak(*, datavalue: DataValue)¶
Bases: BaseModel
Model class for Mainsnak (for claims).
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class mex.common.wikidata.models.Value(*, text: str | None = None, language: str | None = None)¶
Bases: BaseModel
Model class for Values (for claims).
- language: str | None¶
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- text: str | None¶
- class mex.common.wikidata.models.WikidataOrganization(*, id: str, labels: Labels, claims: Claims, aliases: Aliases)¶
Bases: BaseModel
Model class for Wikidata sources.
- identifier: Annotated[str, FieldInfo(annotation=NoneType, required=True, alias='id', alias_priority=2)]¶
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
mex.common.wikidata.transform module¶
- mex.common.wikidata.transform._get_alternative_names(native_labels: Sequence[Claim], all_aliases: Aliases) → list[Text]¶
Get alternative names of an organization in DE and EN.
- Parameters:
native_labels – Sequence of all native labels
all_aliases – All aliases of the organization
- Returns:
combined list of native labels and aliases in DE and EN
- mex.common.wikidata.transform._get_clean_short_names(short_names: Sequence[Claim]) → list[Text]¶
Get clean short names only in EN and DE and ignore the rest.
- Parameters:
short_names – List of all short names
- Returns:
list of clean short names in EN and DE
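The language filter described here might look like the sketch below. It works on plain `(text, language)` tuples rather than Claim objects, and both the helper name `keep_de_and_en` and the whitespace handling are assumptions, not the library's implementation.

```python
from typing import Optional


def keep_de_and_en(
    short_names: list[tuple[str, Optional[str]]],
) -> list[tuple[str, str]]:
    """Keep only non-empty short names tagged as German or English."""
    kept = []
    for text, language in short_names:
        if language in ("de", "en") and text.strip():
            kept.append((text.strip(), language))
    return kept
```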
- mex.common.wikidata.transform.get_official_name_label(labels: Labels) → Text | None¶
Pick one of the available labels and return it as a Text object.
- Parameters:
labels – Wikidata labels object
- Returns:
Text object of the label that was picked, or None
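One way such a pick could work is sketched below with stand-in dataclasses. The preference order (de, then en, then mul) is an assumption based only on the field order of Labels, and `pick_label` is a hypothetical name; the real function returns a Text object rather than a Label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Label:
    value: str
    language: Optional[str] = None


@dataclass
class Labels:
    de: Optional[Label] = None
    en: Optional[Label] = None
    mul: Optional[Label] = None


def pick_label(labels: Labels) -> Optional[Label]:
    """Return the first available label in the assumed order de, en, mul."""
    for candidate in (labels.de, labels.en, labels.mul):
        if candidate is not None:
            return candidate
    return None
```

Returning None when no label exists gives the caller a clear signal to skip the organization.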
- mex.common.wikidata.transform.transform_wikidata_organization_to_extracted_organization(wikidata_organization: WikidataOrganization, wikidata_primary_source_id: MergedPrimarySourceIdentifier) → ExtractedOrganization | None¶
Transform one wikidata organization into an ExtractedOrganization.
If no labels are found on the wikidata organization, None is returned instead.
- Parameters:
wikidata_organization – wikidata organization to be transformed
wikidata_primary_source_id – Extracted primary source id for wikidata
- Returns:
ExtractedOrganization or None
- mex.common.wikidata.transform.transform_wikidata_organizations_to_extracted_organizations(wikidata_organizations: Iterable[WikidataOrganization], wikidata_primary_source_id: MergedPrimarySourceIdentifier) → Generator[ExtractedOrganization, None, None]¶
Transform wikidata organizations into ExtractedOrganizations.
Wikidata organizations without labels are skipped.
- Parameters:
wikidata_organizations – Iterable of wikidata organizations to be transformed
wikidata_primary_source_id – Extracted primary source id for wikidata
- Returns:
Generator of ExtractedOrganizations
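The skip-on-None behaviour of the plural transform can be sketched as a generator wrapping a single-item transform. Everything here is a stand-in: organizations are plain dicts, and the names `transform_one`, `transform_many` and the output keys are hypothetical, chosen only to show the control flow.

```python
from collections.abc import Iterable, Iterator
from typing import Optional


def transform_one(organization: dict) -> Optional[dict]:
    """Hypothetical single-item transform: None when no label is present."""
    labels = organization.get("labels") or {}
    if not labels:
        return None
    return {"identifier": organization["id"], "names": list(labels.values())}


def transform_many(organizations: Iterable[dict]) -> Iterator[dict]:
    """Lazily yield transformed organizations, skipping those without labels."""
    for organization in organizations:
        extracted = transform_one(organization)
        if extracted is not None:
            yield extracted
```

Because the result is a generator, nothing is transformed until the caller iterates, for instance with `list(transform_many(items))`.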