mex.common.wikidata package

Submodules

mex.common.wikidata.connector module

class mex.common.wikidata.connector.WikidataAPIConnector

Bases: HTTPConnector

Connector class to handle requesting the Wikidata API.

_check_availability() None

Send a GET request to verify the host is available.

_set_session() None

Create and set request session.

_set_url() None

Set url of the host.

get_wikidata_item_details_by_id(item_id: str) dict[str, str]

Get details of a wikidata item by item id.

Parameters:

item_id – wikidata item id

Returns:

Details of the found item.
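As an illustration of what a request for item details might look like, the sketch below builds a URL for the public MediaWiki wbgetentities action. The endpoint constant and the helper name build_item_details_url are assumptions made for this example; the connector's configured URL and request parameters may differ.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata MediaWiki endpoint; the connector's
# configured host may be different.
WIKIDATA_API_URL = "https://www.wikidata.org/w/api.php"


def build_item_details_url(item_id: str) -> str:
    """Build a wbgetentities request URL for one item ID (hypothetical helper)."""
    query = urlencode({"action": "wbgetentities", "ids": item_id, "format": "json"})
    return f"{WIKIDATA_API_URL}?{query}"


url = build_item_details_url("Q42")
```

The helper only assembles the URL; sending the request and handling errors is left to the connector.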

mex.common.wikidata.extract module

mex.common.wikidata.extract.get_wikidata_organization(item_id_or_url: str) WikidataOrganization

Get the details of a wikidata item by its ID or URL.

Parameters:

item_id_or_url – Wikidata item ID or full URL

Raises:

ValueError – when item_id_or_url does not match the expected pattern

Returns:

WikidataOrganization object
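A minimal sketch of how an item ID might be extracted from either a bare ID or a full URL, raising ValueError on a mismatch. The regular expression and the name extract_item_id are hypothetical; the pattern actually used by get_wikidata_organization is not shown in this reference.

```python
import re

# Hypothetical pattern: an optional wikidata.org URL prefix followed by
# "Q" and one or more digits.
ITEM_PATTERN = re.compile(
    r"^(?:https?://(?:www\.)?wikidata\.org/(?:wiki|entity)/)?(Q\d+)$"
)


def extract_item_id(item_id_or_url: str) -> str:
    """Return the bare item ID, or raise ValueError if the input is malformed."""
    match = ITEM_PATTERN.match(item_id_or_url.strip())
    if match is None:
        raise ValueError(f"malformed wikidata item ID or URL: {item_id_or_url!r}")
    return match.group(1)
```

Normalizing to the bare ID first means the rest of the lookup only has to deal with one input shape.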

mex.common.wikidata.models module

class mex.common.wikidata.models.Alias(*, language: str, value: str)

Bases: BaseModel

Model class for single alias.

language: str
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

value: str

class mex.common.wikidata.models.Aliases(*, de: list[Alias] = [], en: list[Alias] = [])

Bases: BaseModel

Model class for aliases.

de: list[Alias]
en: list[Alias]
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class mex.common.wikidata.models.Claim(*, mainsnak: Mainsnak)

Bases: BaseModel

Model class for a Claim.

mainsnak: Mainsnak
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class mex.common.wikidata.models.Claims(*, P856: list[Claim] = [], P213: list[Claim] = [], P6782: list[Claim] = [], P1448: list[Claim] = [], P1813: list[Claim] = [], P1705: list[Claim] = [], P4871: list[Claim] = [], P227: list[Claim] = [], P214: list[Claim] = [])

Bases: BaseModel

Model class for Claims.

gepris_id: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P4871', alias_priority=2)]
gnd_id: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P227', alias_priority=2)]
isni_id: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P213', alias_priority=2)]
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

native_label: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P1705', alias_priority=2)]
official_name: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P1448', alias_priority=2)]
ror_id: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P6782', alias_priority=2)]
short_name: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P1813', alias_priority=2)]
viaf_id: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P214', alias_priority=2)]
website: Annotated[list[Claim], FieldInfo(annotation=NoneType, required=True, alias='P856', alias_priority=2)]

class mex.common.wikidata.models.DataValue(*, value: Value)

Bases: BaseModel

Model class for Data Values (for claims).

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

classmethod transform_strings_to_dict(values: dict[str, str | dict[str, str]]) dict[str, dict[str, str | None]] | dict[str, str | dict[str, str]]

Transform a string or null value into a dict for parsing.

Parameters:

values – values that need to be parsed

Returns:

resulting dict
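The transformation described above can be sketched with plain dictionaries. This assumes that a string or null under the "value" key is wrapped into a dict with a "text" key, matching the fields of the Value model; the function name strings_to_value_dict is a stand-in and the real validator may differ in detail.

```python
from typing import Any


def strings_to_value_dict(values: dict[str, Any]) -> dict[str, Any]:
    """Wrap a plain string or null value so it parses like a Value model."""
    value = values.get("value")
    if value is None or isinstance(value, str):
        # Assumption: plain strings (e.g. URLs or identifiers) become the text.
        return {"value": {"text": value}}
    # Already a dict such as {"text": ..., "language": ...}: pass through.
    return values
```

This lets claims with simple string values and claims with monolingual text values share one model.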

value: Value

class mex.common.wikidata.models.Label(*, language: str | None = None, value: str)

Bases: BaseModel

Model class for single Label.

language: str | None
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

value: str

class mex.common.wikidata.models.Labels(*, de: Label | None = None, en: Label | None = None, mul: Label | None = None)

Bases: BaseModel

Model class for Labels.

de: Label | None
en: Label | None
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

multiple: Annotated[Label | None, FieldInfo(annotation=NoneType, required=True, alias='mul', alias_priority=2)]

class mex.common.wikidata.models.Mainsnak(*, datavalue: DataValue)

Bases: BaseModel

Model class for Mainsnak (for claims).

datavalue: DataValue
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class mex.common.wikidata.models.Value(*, text: str | None = None, language: str | None = None)

Bases: BaseModel

Model class for Values (for claims).

language: str | None
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

text: str | None

class mex.common.wikidata.models.WikidataOrganization(*, id: str, labels: Labels, claims: Claims, aliases: Aliases)

Bases: BaseModel

Model class for Wikidata sources.

aliases: Aliases
claims: Claims
identifier: Annotated[str, FieldInfo(annotation=NoneType, required=True, alias='id', alias_priority=2)]
labels: Labels
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'str_max_length': 100000, 'str_min_length': 1, 'str_strip_whitespace': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
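The models in this module nest as WikidataOrganization, Claims, Claim, Mainsnak, DataValue, Value. The sketch below walks that nesting on a raw dictionary, using the property-ID aliases listed on the Claims model. The helper claim_texts and the sample payload are made up for illustration and are not part of the package.

```python
from typing import Any

# Property IDs and field names as listed on the Claims model.
PROPERTY_TO_FIELD = {
    "P856": "website",
    "P213": "isni_id",
    "P6782": "ror_id",
    "P1448": "official_name",
    "P1813": "short_name",
    "P1705": "native_label",
    "P4871": "gepris_id",
    "P227": "gnd_id",
    "P214": "viaf_id",
}


def claim_texts(raw_claims: dict[str, list[dict[str, Any]]]) -> dict[str, list[str]]:
    """Collect the text of each known claim, keyed by field name."""
    result: dict[str, list[str]] = {}
    for property_id, claims in raw_claims.items():
        field = PROPERTY_TO_FIELD.get(property_id)
        if field is None:
            continue  # unknown properties are skipped, like 'extra': 'ignore'
        texts = []
        for claim in claims:
            value = claim["mainsnak"]["datavalue"]["value"]
            text = value if isinstance(value, str) else (value or {}).get("text")
            if text:
                texts.append(text)
        result[field] = texts
    return result


# Made-up sample payload, shaped like the "claims" part of an item.
raw = {
    "P856": [{"mainsnak": {"datavalue": {"value": "https://example.org/"}}}],
    "P1813": [{"mainsnak": {"datavalue": {"value": {"text": "EXO", "language": "en"}}}}],
    "P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q1"}}}}],
}
parsed = claim_texts(raw)
```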

mex.common.wikidata.transform module

mex.common.wikidata.transform._get_alternative_names(native_labels: Sequence[Claim], all_aliases: Aliases) list[Text]

Get alternative names of an organization in DE and EN.

Parameters:
  • native_labels – Sequence of all native labels

  • all_aliases – All aliases of the organization

Returns:

combined list of native labels and aliases in DE and EN
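One possible reading of this combination step, sketched with stand-in types: native labels are kept first, followed by the DE and then the EN aliases. NameText and combine_alternative_names are hypothetical names, and the ordering and any filtering in the real function may differ.

```python
from typing import NamedTuple, Optional


class NameText(NamedTuple):
    """Stand-in for the Text type: a value with an optional language."""

    value: str
    language: Optional[str]


def combine_alternative_names(
    native_labels: list[tuple[str, Optional[str]]],
    aliases: dict[str, list[str]],
) -> list[NameText]:
    """Combine native labels with the DE and EN aliases, in that order."""
    names = [NameText(value, language) for value, language in native_labels]
    for language in ("de", "en"):
        names.extend(NameText(alias, language) for alias in aliases.get(language, []))
    return names
```

Aliases in other languages are simply never read, which matches the two fields on the Aliases model.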

mex.common.wikidata.transform._get_clean_short_names(short_names: Sequence[Claim]) list[Text]

Get clean short names only in EN and DE and ignore the rest.

Parameters:

short_names – List of all short names

Returns:

list of clean short names in EN and DE
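The language filter can be sketched on plain dictionaries shaped like the Value model (text and language). The function name keep_en_and_de is an assumption for this example, and any further cleaning the real function performs is not covered here.

```python
from typing import Optional

ALLOWED_LANGUAGES = {"en", "de"}


def keep_en_and_de(
    short_names: list[dict[str, Optional[str]]],
) -> list[dict[str, Optional[str]]]:
    """Keep only short names that have text and an EN or DE language tag."""
    return [
        name
        for name in short_names
        if name.get("text") and name.get("language") in ALLOWED_LANGUAGES
    ]
```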

mex.common.wikidata.transform.get_official_name_label(labels: Labels) Text | None

Pick a single label from the available labels (for example DE or EN) and return it as a Text object.

Parameters:

labels – Wikidata labels object

Returns:

Text object of the label that was picked, or None
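A sketch of a label fallback, assuming the DE label is preferred, then EN, then the multilingual ("mul") label. This priority order is an assumption based on the fields of the Labels model, not confirmed behaviour, and pick_label is a hypothetical name.

```python
from typing import Optional

# Assumed priority order; the real function may choose differently.
LABEL_PRIORITY = ("de", "en", "mul")


def pick_label(labels: dict[str, Optional[str]]) -> Optional[tuple[str, Optional[str]]]:
    """Return (value, language) for the first available label, or None."""
    for key in LABEL_PRIORITY:
        value = labels.get(key)
        if value:
            # The multilingual label carries no specific language.
            return (value, None if key == "mul" else key)
    return None
```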

mex.common.wikidata.transform.transform_wikidata_organization_to_extracted_organization(wikidata_organization: WikidataOrganization, wikidata_primary_source_id: MergedPrimarySourceIdentifier) ExtractedOrganization | None

Transform one wikidata organization into an ExtractedOrganization.

If no labels are found on the wikidata organization, None is returned instead.

Parameters:
  • wikidata_organization – wikidata organization to be transformed

  • wikidata_primary_source_id – Extracted primary source id for wikidata

Returns:

ExtractedOrganization or None

mex.common.wikidata.transform.transform_wikidata_organizations_to_extracted_organizations(wikidata_organizations: Iterable[WikidataOrganization], wikidata_primary_source_id: MergedPrimarySourceIdentifier) Generator[ExtractedOrganization, None, None]

Transform wikidata organizations into ExtractedOrganizations.

Wikidata organizations without labels are skipped.

Parameters:
  • wikidata_organizations – Iterable of wikidata organizations to be transformed

  • wikidata_primary_source_id – Extracted primary source id for wikidata

Returns:

Generator of ExtractedOrganizations
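The skip-on-None behaviour can be sketched as a generator wrapped around a single-item transform. The dictionaries and the names transform_one and transform_many are stand-ins for the real models and functions, chosen only to keep the example self-contained.

```python
from collections.abc import Iterable, Iterator
from typing import Any, Optional


def transform_one(organization: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a transformed record, or None when the organization has no label."""
    label = organization.get("label")
    if not label:
        return None
    # Hypothetical output keys, not the ExtractedOrganization field names.
    return {"name": label, "source_id": organization["id"]}


def transform_many(organizations: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Lazily yield transformed records, skipping those without labels."""
    for organization in organizations:
        extracted = transform_one(organization)
        if extracted is not None:
            yield extracted
```

Because the result is a generator, nothing is transformed until the caller iterates over it.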

Module contents