mex.common.wikidata package

Subpackages

Submodules

mex.common.wikidata.connector module

class mex.common.wikidata.connector.WikidataAPIConnector

Bases: HTTPConnector

Connector class to handle requesting the Wikidata API.

_check_availability() → None

Send a GET request to verify the host is available.

_set_url() → None

Set url of the host.

get_wikidata_item_details_by_id(item_id: str) → dict[str, str]

Get details of a wikidata item by item id.

Parameters:

item_id (str) – wikidata item id

Returns:

details of the found item.

Return type:

dict[str, Any]
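As an illustration of what a lookup by item id involves, the following minimal sketch builds a request URL for the MediaWiki `wbgetentities` action and picks the item out of a decoded response. The helper names `build_item_details_url` and `extract_item` are hypothetical and not part of mex.common; the connector's actual request parameters are not shown in this document and may differ.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata API endpoint named in the configuration notes.
WIKI_API_URL = "https://www.wikidata.org/w/api.php"


def build_item_details_url(item_id: str, api_url: str = WIKI_API_URL) -> str:
    """Return a wbgetentities request URL for one item id (hypothetical helper)."""
    params = {"action": "wbgetentities", "ids": item_id, "format": "json"}
    return f"{api_url}?{urlencode(params)}"


def extract_item(response: dict, item_id: str) -> dict:
    """Pick the entity for item_id out of a decoded wbgetentities response."""
    # wbgetentities nests results under "entities", keyed by item id.
    return response["entities"][item_id]
```

Keeping URL construction and response parsing apart makes both steps easy to check without sending a request.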

class mex.common.wikidata.connector.WikidataQueryServiceConnector

Bases: HTTPConnector

Connector class to handle requesting the Wikidata Query Service.

TIMEOUT = 80

_check_availability() → None

Send a GET request to verify the host is available.

_set_url() → None

Set url of the host.

get_data_by_query(query: str) → list[dict[str, dict[str, str]]]

Run provided query on wikidata using wikidata query service.

Parameters:

query (str) – Wikidata query

Returns:

list of all items found

Return type:

list
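The return annotation `list[dict[str, dict[str, str]]]` matches the shape of the `bindings` array in the standard SPARQL JSON results format. The sketch below shows one way to build a query URL and read the bindings from a decoded response. The names `build_query_url` and `bindings_from_response` are hypothetical, and the `/sparql` path is an assumption about how the query service URL is used; the connector may construct its requests differently.

```python
from urllib.parse import urlencode

# Assumption: the SPARQL endpoint sits under the query service URL.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Return a GET URL that asks the query service for JSON results."""
    return f"{endpoint}?{urlencode({'query': query, 'format': 'json'})}"


def bindings_from_response(response: dict) -> list[dict[str, dict[str, str]]]:
    """Return the list of variable bindings from a decoded SPARQL JSON response."""
    # Each binding maps a variable name to a dict with "type" and "value" keys.
    return response["results"]["bindings"]
```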

mex.common.wikidata.extract module

mex.common.wikidata.helpers module

mex.common.wikidata.transform module

Module contents

Helper extractor to search and extract organizations from Wikidata.

The Wikidata extractor requires a call to Wikidata to search for the organization label. This can take longer than usual because Wikidata has to search through an extensive database, which is why the default timeout for search requests is 80 seconds.

In addition to the extended timeout, Wikidata sometimes starts blocking requests when too many are sent. To avoid this, a backoff system is in place, which might make the extraction process even slower.
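To illustrate the general idea of a backoff system, here is a minimal sketch of a retry loop with exponentially growing delays. It is not the mechanism used by mex.common, whose implementation is not shown in this document; the function name `call_with_backoff` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponentially growing delays on failure."""
    for attempt in range(max_tries):
        try:
            return func()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of retries: surface the last error
            # Wait base_delay, then twice that, then four times that, ...
            sleep(base_delay * 2**attempt)
    raise ValueError("max_tries must be at least 1")
```

The `sleep` parameter is injectable so the delays can be observed in tests without actually waiting. Each retry adds to the total runtime, which is why a backoff can slow down extraction noticeably.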

Common use cases

  • extract information about an organization from Wikidata using the organization's name

Configuration

To configure the Wikidata extractor, set the wiki_api_url and wiki_query_service_url parameters in mex.common.settings to the Wikidata API URL (also referred to as the MediaWiki API), https://www.wikidata.org/w/api.php, and the Wikidata Query Service URL, https://query.wikidata.org/, respectively.
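The two values can be pictured as a small settings object. The class below is a hypothetical stand-in, not the settings class from mex.common.settings; only the two field names and the URLs are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikidataSettings:
    """Hypothetical stand-in for the two Wikidata-related settings fields."""

    wiki_api_url: str = "https://www.wikidata.org/w/api.php"
    wiki_query_service_url: str = "https://query.wikidata.org/"


# Instantiate with the documented URLs as defaults.
settings = WikidataSettings()
```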

Extracting organization

Use search_organization_by_label in wikidata.extract and pass in the name of one organization. The function first calls the Wikidata Query Service to search for matching organizations and then calls the Wikidata API for each match to fetch all information about it.
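The two-step flow described above (search first, then fetch details per match) can be sketched as follows. This is an illustration under stated assumptions, not the code of search_organization_by_label: the function name `search_then_fetch`, the SPARQL text, and the `item` variable name are hypothetical, and the two callables stand in for the query service and API connectors.

```python
from typing import Callable


def search_then_fetch(
    label: str,
    run_query: Callable[[str], list[dict[str, dict[str, str]]]],
    get_details: Callable[[str], dict],
) -> list[dict]:
    """Search for items by label, then fetch details for each match."""
    # Hypothetical query text; a real query would also need to escape the label.
    query = f'SELECT ?item WHERE {{ ?item rdfs:label "{label}"@en }}'
    results = []
    for binding in run_query(query):
        # The query service returns full entity URIs; the item id is the
        # last path segment, e.g. ".../entity/Q42" gives "Q42".
        item_id = binding["item"]["value"].rsplit("/", 1)[-1]
        results.append(get_details(item_id))
    return results
```

Because one detail request is made per match, a label with many matches leads to many API calls, which is where the backoff behavior becomes relevant.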

Transforming organization

Use transform_wikidata_organizations_to_extracted_organizations in wikidata.transform to get MEx ExtractedOrganization objects from the Wikidata results.
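As a rough picture of what such a transformation does, the sketch below maps one raw Wikidata item to a much reduced organization record. `SimpleOrganization` and `to_simple_organization` are hypothetical and far simpler than MEx ExtractedOrganization, whose fields are not listed in this document; only the `id` and `labels` keys of the Wikidata entity JSON are used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleOrganization:
    """Hypothetical, much reduced stand-in for MEx ExtractedOrganization."""

    wikidata_id: str
    official_name: Optional[str]


def to_simple_organization(item: dict, language: str = "en") -> SimpleOrganization:
    """Map one raw Wikidata item to a SimpleOrganization."""
    # Labels are keyed by language code; each entry carries a "value" string.
    label = item.get("labels", {}).get(language, {}).get("value")
    return SimpleOrganization(wikidata_id=item["id"], official_name=label)
```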