Live Service Demo
Query Wikipedia live pages.
Download and install JSONpedia: visit the Developers Site and the Documentation.
Quick API reference
Examples
- Process page "Albert Einstein" from English Wikipedia and return result in JSON:
http://.../annotate/resource/json/en:Albert_Einstein
- Process page "Albert Einstein" from Italian Wikipedia and return result in HTML:
http://.../annotate/resource/html/it:Albert_Einstein
- Process page "CURL" and return result in JSON:
curl http://.../annotate/resource/json/en:CURL
- Process page "Test" specifying WikiText content and return result in JSON:
curl --data "wikitext=This is just a [[Test]]..." http://.../annotate/resource/json/en:Test
GET /annotate
API
HTTP GET requests can be sent to the URL:
http://...//annotate/<format>/<uri>
with the following parameters:
Parameter | Description | Required |
---|---|---|
uri | Full URL (or short identifier) of a Wikipedia input resource, e.g. http://en.wikipedia.org/wiki/Albert_Einstein or en:Albert_Einstein | True |
format | Desired output format. See supported formats. | True |
filter | Filter criteria to be applied to the result. See filter syntax. | False |
procs | Processors to be activated on this request. See procs syntax. | False |
The response is the input resource converted to the desired output format. Processing errors are reported with the status codes listed in the error codes table.
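As a minimal sketch of how a client might assemble such a GET request, the helper below builds the URL from its parts. It follows the path layout used in the examples above (`/annotate/resource/<format>/<uri>`) and handles short identifiers such as `en:Albert_Einstein`. The function name `build_annotate_url`, the `base_url` argument (a placeholder for the service address, which this page elides) and the choice to send `filter` and `procs` as query-string parameters are assumptions, not part of the documented API.

```python
from urllib.parse import quote, urlencode


def build_annotate_url(base_url, fmt, uri, filter_expr=None, procs=None):
    """Build a GET URL of the form <base>/annotate/resource/<format>/<uri>."""
    # base_url is hypothetical: substitute the address of your JSONpedia service.
    path = (
        f"{base_url.rstrip('/')}/annotate/resource/"
        f"{quote(fmt)}/{quote(uri, safe=':')}"
    )
    # Assumption: the optional parameters travel in the query string.
    params = {}
    if filter_expr is not None:
        params["filter"] = filter_expr
    if procs is not None:
        params["procs"] = procs
    return f"{path}?{urlencode(params)}" if params else path
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`.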
POST /annotate
API
Large WikiText documents can also be sent by HTTP POSTing form data to:
http://...//annotate/<format>/<uri>
The request Content-Type must be set to application/x-www-form-urlencoded.
The following parameters are supported:
Parameter | Description | Required |
---|---|---|
uri | Full URL (or short identifier) of a Wikipedia input resource, e.g. http://en.wikipedia.org/wiki/Albert_Einstein or en:Albert_Einstein | True |
format | Desired output format. See supported formats. | True |
filter | Filter criteria to be applied to the result. See filter syntax. | False |
procs | Processors to be activated on this request. See procs syntax. | False |
wikitext | The WikiText to be converted. If not specified, the content addressed by uri is retrieved. | False |
The response is the input resource converted to the desired output format. Processing errors are reported with the status codes listed in the error codes table.
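The POST variant could be prepared along these lines. This is only a sketch: `build_wikitext_request` and its `base_url` argument are hypothetical names, and the path layout is taken from the curl example earlier on this page. The request object is built but not sent, so the encoding can be inspected first.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_wikitext_request(base_url, fmt, uri, wikitext):
    """Prepare a POST request carrying WikiText as URL-encoded form data."""
    # base_url is hypothetical: substitute the address of your JSONpedia service.
    url = f"{base_url.rstrip('/')}/annotate/resource/{fmt}/{uri}"
    # The form body holds the single documented field "wikitext".
    body = urlencode({"wikitext": wikitext}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it.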
Output formats
The supported output formats are:
Output format | Media type | Description |
---|---|---|
JSON | application/json | Produces a JSON object whose sections are described in the processors section. |
HTML (Beta) | application/xhtml | Produces XHTML markup showing the extracted data structure. |
Error codes
Processing errors are reported as HTTP status codes with short text/plain messages.
The following status codes can be returned:
Code | Reason |
---|---|
200 OK | Success. |
400 Bad Request | Invalid parameters. |
403 Forbidden | Wikipedia not reachable. |
404 Not Found | Wiki resource not defined. |
500 Internal Error | Generic error occurred. |
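A client might translate these codes into readable failures roughly as follows. The mapping is copied from the table above; the names `STATUS_REASONS` and `check_status` and the choice of `RuntimeError` are illustrative assumptions.

```python
# Reasons copied from the status code table above.
STATUS_REASONS = {
    200: "Success.",
    400: "Invalid parameters.",
    403: "Wikipedia not reachable.",
    404: "Wiki resource not defined.",
    500: "Generic error occurred.",
}


def check_status(code, message=""):
    """Return None on success, otherwise raise with the documented reason."""
    if code == 200:
        return None
    reason = STATUS_REASONS.get(code, "Unexpected status code.")
    # message is the short text/plain body returned by the service, if any.
    raise RuntimeError(f"Request failed ({code}): {reason} {message}".strip())
```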
Processors Documentation
Extractors
Extract specific parts of content from the WikiText document events stream.
SectionExtractor
Extracts the list of sections defined within the document.
"sections": [
{
"title": "Biography",
"level": 0
},
...
]
LinkExtractor
Extracts the list of links (to external resources) defined within the document.
"links": [
{
"link": "http://books.google.com/books?id=jJl2JAqvoSAC&pg=PA41",
"description": "Chapter 2, p. 41"
},
...
]
ReferenceExtractor
Extracts the list of links (to Wikipedia internal resources) defined within the document.
"references": [
{
"label": "http://en.wikipedia.org/wiki/Albert_EinsteinUlm",
"description": ""
},
...
]
TemplateOccurrencesExtractor
Extracts statistics on template occurrences in the document.
"templates": {
"@type": "templates",
"occurrences": {
"Birth date": 1,
"Block quote": 1,
"Portal": 1,
"Use dmy dates": 1,
"Link GA": 2,
"cite web": 12,
...
}
}
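As an illustration of consuming this output, the sketch below ranks templates by occurrence count. It assumes the `templates` object sits at the top level of the returned JSON document, which the fragment above does not confirm; `top_templates` is a hypothetical helper and the sample data is a shortened copy of the fragment.

```python
import json

# Shortened copy of the fragment above, wrapped in a top-level object.
SAMPLE = json.loads(
    '{"templates": {"@type": "templates", "occurrences": '
    '{"Birth date": 1, "Link GA": 2, "cite web": 12}}}'
)


def top_templates(document, limit=3):
    """Return (template name, count) pairs, most frequent first."""
    occurrences = document["templates"]["occurrences"]
    ranked = sorted(occurrences.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```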
CategoriesExtractor
Extracts the list of all categories declared in the document.
"categories": {
"@type": "category",
"content": [
"1879 births",
"1955 deaths",
"20th-century American people",
"20th-century German people",
"20th-century Swiss people",
"American humanitarians",
"American inventors",
"American pacifists",
"American people of German-Jewish descent",
"Cosmologists",
...
]
}
OnlineExtractors
Online Extractors are Extractors that rely on external services, such as DBpedia and Freebase.
TemplateMappingExtractor
Extracts the RDF ontology mapping defined in DBpedia.org for every template detected in the document.
"template-mapping": {
"@type": "mapping-collection",
"mapping-collection": [
{
"@type": "template-mapping",
"name": "TemplateMapping",
"mapping": {
"notable_students": "notableStudent",
"death_date": "deathDate",
"birth_date": "birthDate",
"birth_name": "birthName",
"citizenship": "citizenship",
...
},
"issues": null
}
]
}
FreebaseExtractor
Extracts the definition of the entity represented by the document from Freebase.
"freebase": { "alias": [ "Einstein", "albert_einstein", "Einstein, Albert" ], "article": { "id": "/m/0jd6" }, "guid": "#9202a8c04000641f800000000000417c", "id": "/en/albert_einstein", "image": { "id": "/wikipedia/images/commons_id/925243" }, "name": "Albert Einstein", "relevance:score": 178.9317169189453, "type": [ { "id": "/influence/influence_node", "name": "Influence Node" }, { "id": "/award/award_winner", "name": "Award Winner" }, { "id": "/business/board_member", "name": "Organization leader" }, ... ], ... }
Splitters
A Splitter cuts sub-trees of the WikiText DOM containing specific sections.
InfoboxSplitter
Splits the Infobox section out of the parsed DOM.
"infobox-splitter": [
  {
    "@type": "template",
    "name": "Infobox scientist",
    "content": {
      "name": [ "Albert Einstein" ],
      "image": [ "Einstein 1921 portrait2.jpg" ],
      "caption": [ "Albert Einstein in 1921" ],
      "birth_date": [
        {
          "@type": "template",
          "name": "Birth date",
          "content": { "3": null, "14": null, "1879": null, "df": [ "yes" ] }
        }
      ],
      "birth_place": [
        { "@type": "reference", "label": "Ulm", "description": "" },
        ...
      ],
      ...
    }
  }
]
TableSplitter
Splits all the table sections out of the parsed DOM.
"table-splitter": [
  {
    "@type": "table",
    "content": [
      "class=wikitable",
      ...
      { "@type": "head_cell", "content": [ "Area of focus" ] },
      { "@type": "head_cell", "content": [ "Received" ] },
      ...
      { "@type": "head_cell", "content": [ "Significance" ] },
      { "@type": "body_cell", "content": [ "''On a Heuristic Viewpoint Concerning the Production and Transformation of Light''" ] },
      { "@type": "body_cell", "content": [ { "@type": "reference", "label": "Photoelectric effect", "description": "" } ] },
      ...
    ]
  }
]
Structure
Prints out the entire DOM tree representation.
Validate
Prints out all warnings and errors detected during the document parsing.
"issues": [ { "type": "Warning", "description": "Invalid char '<' within comment tag.", "row": 307, "col": 65 } ]
Filter Syntax
A JSONpedia filter is conceptually similar to a CSS selector.
It lets you specify hierarchical selection criteria that address specific JSON nodes.
The most basic form is the key filter, which matches, in any object, the value whose key satisfies the given regexp:
sections
A more complex form is a list of comma-separated key/value patterns like:
name:Death date and age,@type:template
where each key is a string matching a JSON object key and each value is a regexp matching the value associated with that key. Any JSON object matching ALL these patterns is returned. If patterns contain special characters like , they can be escaped within quotes:
url:".*[\s,\d]?\.html",@type:link
Hierarchical patterns can be combined with the > operator, like (see live example on en:Albert_Einstein):
notable_students>@type:template,name:Plainlist>@type:reference
The full filter syntax is reported below.
<filter> ::= <key-selector> | <object-selector> ;
<key-selector> ::= <Valid-Java-Regexp> ;
<object-selector> ::= <key-value-selector> | <key-value-selector> ',' <object-selector> ;
<key-value-selector> ::= <key-matcher> ':' <value-matcher> ;
<key-matcher> ::= <Valid-JSON-Key-Name> ;
<value-matcher> ::= <Valid-Java-Regexp> | '"' <Valid-Java-Regexp> '"' ;
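To make the object-selector semantics concrete, here is a rough Python approximation. It is not JSONpedia's implementation. It covers only comma-separated key/value patterns with optional quoting, leaves out the key filter and the > operator, uses Python regexps in place of Java ones, and assumes each regexp must match the whole string value. `parse_object_selector` and `find_matches` are hypothetical names.

```python
import re


def parse_object_selector(selector):
    """Split 'k1:v1,k2:"v,2"' into a list of (key, compiled regexp) pairs."""
    parts, current, in_quotes = [], "", False
    for ch in selector:
        if ch == '"':
            in_quotes = not in_quotes  # quotes delimit a value, they are dropped
        elif ch == "," and not in_quotes:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    patterns = []
    for part in parts:
        key, _, value = part.partition(":")  # split on the first colon only
        patterns.append((key, re.compile(value)))
    return patterns


def find_matches(node, patterns):
    """Collect every JSON object whose string values satisfy ALL patterns."""
    found = []
    if isinstance(node, dict):
        if all(
            isinstance(node.get(key), str) and regexp.fullmatch(node[key])
            for key, regexp in patterns
        ):
            found.append(node)
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        children = []
    for child in children:
        found.extend(find_matches(child, patterns))
    return found
```

For instance, `find_matches(document, parse_object_selector("name:Birth date,@type:template"))` would return every object in `document` whose `name` is `Birth date` and whose `@type` is `template`.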
Processors Syntax
A comma-separated list of the names of the Processors to apply during document processing. Some Processors are active by default, so they do not need to be activated explicitly. Default Processors can be forcibly disabled by prepending a - symbol to the name. For example, the following string:
-Extractors,Linkers
disables the Extractors (otherwise active by default) and enables the Linkers.
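The effect of such a string can be sketched as a set computation. `resolve_procs` is a hypothetical helper, and the default set passed to it is an assumption, since this page does not list every Processor that is active by default.

```python
def resolve_procs(procs, defaults):
    """Apply a procs string to a set of default Processor names."""
    active = set(defaults)
    for name in procs.split(","):
        name = name.strip()
        if not name:
            continue
        if name.startswith("-"):
            active.discard(name[1:])  # a leading "-" disables a Processor
        else:
            active.add(name)  # a bare name enables a Processor
    return active
```

With `{"Extractors"}` as the assumed default set, `resolve_procs("-Extractors,Linkers", {"Extractors"})` leaves only `Linkers` active.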