mex.common.wikidata package¶
Subpackages¶
Submodules¶
mex.common.wikidata.connector module¶
- class mex.common.wikidata.connector.WikidataAPIConnector¶
Bases: HTTPConnector
Connector class to handle requesting the Wikidata API.
- _check_availability() → None ¶
Send a GET request to verify the host is available.
- _set_url() → None ¶
Set url of the host.
- get_wikidata_item_details_by_id(item_id: str) → dict[str, str] ¶
Get details of a wikidata item by item id.
- Parameters:
item_id (str) – wikidata item id
- Returns:
details of the found item.
- Return type:
dict[str, Any]
- class mex.common.wikidata.connector.WikidataQueryServiceConnector¶
Bases: HTTPConnector
Connector class to handle requesting the Wikidata Query Service.
- TIMEOUT = 80¶
- _check_availability() → None ¶
Send a GET request to verify the host is available.
- _set_url() → None ¶
Set url of the host.
- get_data_by_query(query: str) → list[dict[str, dict[str, str]]] ¶
Run provided query on wikidata using wikidata query service.
- Parameters:
query (str) – Wikidata query
- Returns:
list of all items found
- Return type:
list
mex.common.wikidata.extract module¶
mex.common.wikidata.helpers module¶
mex.common.wikidata.transform module¶
Module contents¶
Helper extractor to search and extract organizations from Wikidata.
The Wikidata extractor calls Wikidata to search for an organization by its label. This can take longer than usual because Wikidata has to search through an extensive database, which is why the default timeout for search requests is 80 seconds.
In addition to the extended timeout, Wikidata sometimes starts to block requests if too many are sent. To avoid this, a backoff system is in place, which may slow the extraction process even further.
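The backoff behaviour described above can be pictured with a minimal sketch. It is not the implementation used by mex.common; the `TooManyRequests` exception, the `call_with_backoff` helper, and the delay values are all hypothetical and only illustrate how retrying with growing waits trades speed for fewer blocked requests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Hypothetical error for a rate-limited response (e.g. HTTP 429)."""


def call_with_backoff(
    request: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, doubling the wait after each rate-limited attempt."""
    for attempt in range(max_tries):
        try:
            return request()
        except TooManyRequests:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2**attempt)
    raise ValueError("max_tries must be at least 1")
```

With the default values, a request that keeps being rejected waits 1, 2, 4 and 8 seconds between its five attempts, which is why an extraction run can become noticeably slower when Wikidata starts throttling.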
Common use cases¶
Extract information about an organization from Wikidata using the organization's name.
Configuration¶
To configure the Wikidata extractor, set the wiki_api_url and wiki_query_service_url parameters in mex.common.settings to the Wikidata API URL (also referred to as the MediaWiki API), https://www.wikidata.org/w/api.php, and the Wikidata Query Service URL, https://query.wikidata.org/, respectively.
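As a rough illustration of the two values involved, the sketch below models them with a plain dataclass. `WikidataSettings`, `load_settings`, and the environment variable names `WIKI_API_URL` and `WIKI_QUERY_SERVICE_URL` are assumptions made for this example; the actual settings class in mex.common.settings and its loading mechanism may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WikidataSettings:
    """Hypothetical stand-in for the two Wikidata fields in mex.common.settings."""

    wiki_api_url: str = "https://www.wikidata.org/w/api.php"
    wiki_query_service_url: str = "https://query.wikidata.org/"


def load_settings(env: Optional[Mapping[str, str]] = None) -> WikidataSettings:
    """Read both URLs from a mapping, falling back to the public endpoints.

    The variable names used here are assumptions, not documented names.
    """
    source = os.environ if env is None else env
    defaults = WikidataSettings()
    return WikidataSettings(
        wiki_api_url=source.get("WIKI_API_URL", defaults.wiki_api_url),
        wiki_query_service_url=source.get(
            "WIKI_QUERY_SERVICE_URL", defaults.wiki_query_service_url
        ),
    )
```

Passing an explicit mapping instead of relying on `os.environ` makes it easy to point the extractor at a mirror or a test double without touching the process environment.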
Extracting organization¶
Use search_organization_by_label in wikidata.extract, passing in the name of a single organization. The function first calls the Wikidata Query Service to search for matching organizations and then calls the Wikidata API for each match to fetch all available information about it.
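The two-step flow described above (search first, then fetch details per match) might be sketched as follows. This is not the code behind search_organization_by_label: the helper names, the SPARQL query, and the injected `fetch_json` callable are assumptions for illustration. The sketch relies only on the public `/sparql` endpoint of the Query Service and the `wbgetentities` action of the MediaWiki API.

```python
from typing import Any, Callable
from urllib.parse import urlencode

# Public endpoints; /sparql is the Query Service's SPARQL endpoint.
QUERY_SERVICE_URL = "https://query.wikidata.org/sparql"
API_URL = "https://www.wikidata.org/w/api.php"

# Any callable that takes a URL and returns the decoded JSON response.
FetchJson = Callable[[str], dict[str, Any]]


def build_search_url(label: str, language: str = "en") -> str:
    """Build a Query Service URL that looks up organizations by exact label."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "SELECT DISTINCT ?item WHERE { "
        f'?item rdfs:label "{escaped}"@{language} . '
        # Q43229 is "organization": match instances of it or of its subclasses.
        "?item wdt:P31/wdt:P279* wd:Q43229 . "
        "}"
    )
    return f"{QUERY_SERVICE_URL}?{urlencode({'query': query, 'format': 'json'})}"


def build_details_url(item_id: str) -> str:
    """Build a MediaWiki API URL that fetches one entity via wbgetentities."""
    params = {"action": "wbgetentities", "ids": item_id, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def search_organization(label: str, fetch_json: FetchJson) -> list[dict[str, Any]]:
    """Step 1: find matching item ids. Step 2: fetch details for each id."""
    bindings = fetch_json(build_search_url(label))["results"]["bindings"]
    details = []
    for binding in bindings:
        # Item URIs have the form http://www.wikidata.org/entity/<id>.
        item_id = binding["item"]["value"].rsplit("/", 1)[-1]
        response = fetch_json(build_details_url(item_id))
        details.append(response["entities"][item_id])
    return details
```

Injecting `fetch_json` keeps the sketch free of network code. In practice it could wrap an HTTP client call with a generous timeout, and a search that returns several matches results in one additional API request per match.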
Transforming organization¶
Use transform_wikidata_organizations_to_extracted_organizations in wikidata.transform to convert Wikidata results into MEx ExtractedOrganization instances.