Live Service Demo

Query Wikipedia live pages.

Convert any MediaWiki document to JSON!

To download and install JSONpedia, visit the Developers Site and the Documentation.


Convert a Resource

GET /annotate/resource

Enter a MediaWiki resource ID or URI, an output format, the set of processors to apply and, optionally, a filter.




Convert a WikiText

POST /annotate/resource

Enter a MediaWiki resource ID or URI, the WikiText content, an output format, the set of processors to apply and, optionally, a filter.




Quick API reference

Examples

GET /annotate API

HTTP GET requests can be sent to the URL:

http://...//annotate/<format>/<uri>
with the following parameters:
Parameter   Required   Description
uri         Yes        Full URL (or short identifier) of a Wikipedia input resource, e.g. http://en.wikipedia.org/wiki/Albert_Einstein or en:Albert_Einstein
format      Yes        Desired output format. See Output formats.
filter      No         Filter criteria to apply to the result. See Filter Syntax.
procs       No         Processors to activate for this request. See Processors Syntax.

The response is the input resource converted to the desired output format. Processing errors are reported with the HTTP status codes listed in Error codes.
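As an illustration, here is a minimal Python sketch that composes a GET /annotate URL with the standard library. The service host is elided on this page, so `BASE_URL` is a hypothetical placeholder, and `build_annotate_url` is a hypothetical helper name. The sketch assumes that `format` and `uri` are path segments (as in the URL template above) while `filter` and `procs` are sent as query-string parameters.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder: the service host is elided in this documentation.
BASE_URL = "http://localhost:8080"


def build_annotate_url(base_url, fmt, uri, filter_expr=None, procs=None):
    """Compose <base>/annotate/<format>/<uri>[?filter=...&procs=...].

    Assumptions: filter and procs travel as query-string parameters, and
    ':' and '/' are left unescaped in uri so that both short identifiers
    (en:Albert_Einstein) and full Wikipedia URLs fit in the path.
    """
    path = "/annotate/{}/{}".format(quote(fmt, safe=""), quote(uri, safe=":/"))
    params = {name: value
              for name, value in (("filter", filter_expr), ("procs", procs))
              if value is not None}
    query = "?" + urlencode(params) if params else ""
    return base_url.rstrip("/") + path + query


if __name__ == "__main__":
    print(build_annotate_url(BASE_URL, "json", "en:Albert_Einstein",
                             filter_expr="@type:link", procs="Structure"))
```

The resulting URL can then be fetched with `urllib.request.urlopen(url)`; `urlencode` takes care of escaping characters such as `@`, `:` and `,` that appear in filters and processor lists.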

POST /annotate API

Large WikiText documents can also be sent as HTTP POST form data to:

http://...//annotate/<format>/<uri>

The request Content-Type must be set to application/x-www-form-urlencoded.
The following parameters are supported:

Parameter   Required   Description
uri         Yes        Full URL (or short identifier) of a Wikipedia input resource, e.g. http://en.wikipedia.org/wiki/Albert_Einstein or en:Albert_Einstein
format      Yes        Desired output format. See Output formats.
filter      No         Filter criteria to apply to the result. See Filter Syntax.
procs       No         Processors to activate for this request. See Processors Syntax.
wikitext    No         The WikiText to convert. If omitted, the WikiText of the resource addressed by uri is retrieved.

The response is the input resource converted to the desired output format. Processing errors are reported with the HTTP status codes listed in Error codes.
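A possible way to build such a request in Python is sketched below; it constructs, but does not send, a `urllib.request.Request`. The host is a hypothetical placeholder and `build_annotate_post` is a hypothetical helper. The sketch assumes that `format` and `uri` stay in the URL path, as for GET, while `wikitext`, `filter` and `procs` go into the url-encoded form body.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical placeholder: the service host is elided in this documentation.
BASE_URL = "http://localhost:8080"


def build_annotate_post(base_url, fmt, uri, wikitext=None, filter_expr=None, procs=None):
    """Build (but do not send) a POST /annotate request carrying form data.

    Assumptions: format and uri are path segments, as in the GET API;
    wikitext, filter and procs are sent in the form-encoded body.
    """
    url = "{}/annotate/{}/{}".format(base_url.rstrip("/"),
                                     quote(fmt, safe=""), quote(uri, safe=":/"))
    fields = {"wikitext": wikitext, "filter": filter_expr, "procs": procs}
    body = urlencode({k: v for k, v in fields.items() if v is not None}).encode("utf-8")
    # The Content-Type required by the API for form data.
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


if __name__ == "__main__":
    req = build_annotate_post(BASE_URL, "json", "en:Albert_Einstein",
                              wikitext="'''Albert Einstein''' was a physicist.",
                              procs="Structure")
    print(req.get_method(), req.full_url)
```

Passing the request object to `urllib.request.urlopen(req)` would send it to a running service.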

Output formats

The supported output formats are:

Output format   Media type          Description
JSON            application/json    Produces a JSON object whose sections are described in Processors Documentation.
HTMLBeta        application/xhtml   Produces XHTML markup showing the extracted data structure.

Error codes

Processing errors are reported as HTTP status codes with short text/plain messages.
The following status codes can be returned:

Code   Status                  Meaning
200    OK                      Success.
400    Bad Request             Invalid parameters.
403    Forbidden               Wikipedia not reachable.
404    Not Found               Wiki resource not defined.
500    Internal Server Error   A generic error occurred.
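On the client side, these codes can be turned into exceptions. The Python sketch below is one hedged way to do it: `AnnotateError`, `check_status` and `fetch_checked` are hypothetical names, and the meanings are copied from the table above. It relies on the standard behavior of `urllib.request.urlopen`, which raises `HTTPError` for 4xx and 5xx responses.

```python
import urllib.request
from urllib.error import HTTPError

# Meanings copied from the Error codes table above.
STATUS_MEANINGS = {
    200: "Success",
    400: "Invalid parameters",
    403: "Wikipedia not reachable",
    404: "Wiki resource not defined",
    500: "Generic error occurred",
}


class AnnotateError(Exception):
    """Hypothetical client-side exception for a non-200 JSONpedia response."""

    def __init__(self, status, message):
        meaning = STATUS_MEANINGS.get(status, "Unexpected status")
        super().__init__("{} ({}): {}".format(status, meaning, message))
        self.status = status
        self.message = message


def check_status(status, body):
    """Return body for 200; otherwise raise with the short text/plain message."""
    if status == 200:
        return body
    raise AnnotateError(status, body.strip())


def fetch_checked(url):
    """Send a GET request; urlopen raises HTTPError for 4xx/5xx statuses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return check_status(resp.status, resp.read().decode("utf-8"))
    except HTTPError as err:
        body = err.read().decode("utf-8", "replace")
        raise AnnotateError(err.code, body.strip()) from err
```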

Processors Documentation

Extractors

Extractors pull specific parts of the content out of the WikiText document event stream.

SectionExtractor

Extracts the list of sections defined within the document.

"sections": [
    {
        "title": "Biography",
        "level": 0
    },
    ...
]
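The flat `sections` list already carries enough information to rebuild a table of contents. The Python sketch below shows one way to do this; `outline` is a hypothetical helper, and it assumes level 0 is the top level, with deeper sections having higher levels.

```python
def outline(sections, indent="  "):
    """Render SectionExtractor output as an indented outline.

    Assumes level 0 is the top level and deeper levels count upward.
    """
    return "\n".join(indent * section["level"] + section["title"]
                     for section in sections)


if __name__ == "__main__":
    # Hypothetical sample shaped like the "sections" list above.
    sample = [{"title": "Biography", "level": 0},
              {"title": "Early life", "level": 1}]
    print(outline(sample))
```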

LinkExtractor

Extracts the list of links (to external resources) defined within the document.

"links": [
    {
        "link": "http://books.google.com/books?id=jJl2JAqvoSAC&pg=PA41",
        "description": "Chapter 2, p. 41"
    },
    ...
]
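Each `links` entry pairs a URL with its description. As an example of consuming that structure, the Python sketch below groups the external links by host. `links_by_host` is a hypothetical helper built on `urllib.parse.urlparse`.

```python
from collections import defaultdict
from urllib.parse import urlparse


def links_by_host(links):
    """Group LinkExtractor entries by the host of their "link" URL."""
    groups = defaultdict(list)
    for entry in links:
        groups[urlparse(entry["link"]).netloc].append(entry.get("description", ""))
    return dict(groups)


if __name__ == "__main__":
    # Entry copied from the example above.
    links = [{"link": "http://books.google.com/books?id=jJl2JAqvoSAC&pg=PA41",
              "description": "Chapter 2, p. 41"}]
    print(links_by_host(links))
```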

ReferenceExtractor

Extracts the list of links (to Wikipedia internal resources) defined within the document.

"references": [
    {
        "label": "http://en.wikipedia.org/wiki/Albert_EinsteinUlm",
        "description": ""
    },
    ...
]

TemplateOccurrencesExtractor

Extracts statistics on template occurrences within the document.

"templates": {
    "@type": "templates",
    "occurrences": {
        "Birth date": 1,
        "Block quote": 1,
        "Portal": 1,
        "Use dmy dates": 1,
        "Link GA": 2,
        "cite web": 12,
        ...
    }
}
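Because `occurrences` is a plain name-to-count object, it maps directly onto Python's `collections.Counter`. The sketch below ranks the most used templates; `most_used_templates` is a hypothetical helper.

```python
from collections import Counter


def most_used_templates(templates, n=3):
    """Return the n most frequent (name, count) pairs from the "templates" object."""
    return Counter(templates["occurrences"]).most_common(n)


if __name__ == "__main__":
    # Counts copied from the example above (the elided entries are omitted).
    templates = {"@type": "templates",
                 "occurrences": {"Birth date": 1, "Block quote": 1, "Portal": 1,
                                 "Use dmy dates": 1, "Link GA": 2, "cite web": 12}}
    print(most_used_templates(templates, 2))  # [('cite web', 12), ('Link GA', 2)]
```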

CategoriesExtractor

Extracts the list of all categories declared in the document.

"categories": {
    "@type": "category",
    "content": [
        "1879 births",
        "1955 deaths",
        "20th-century American people",
        "20th-century German people",
        "20th-century Swiss people",
        "American humanitarians",
        "American inventors",
        "American pacifists",
        "American people of German-Jewish descent",
        "Cosmologists",
        ...
    ]
}

OnlineExtractors

Online Extractors are Extractors that rely on external services such as DBpedia and Freebase.

TemplateMappingExtractor

Extracts the RDF ontology mapping defined in DBpedia.org for every template detected in the document.

"template-mapping": {
    "@type": "mapping-collection",
    "mapping-collection": [
        {
            "@type": "template-mapping",
            "name": "TemplateMapping",
            "mapping": {
                "notable_students": "notableStudent",
                "death_date": "deathDate",
                "birth_date": "birthDate",
                "birth_name": "birthName",
                "citizenship": "citizenship",
                ...
            },
            "issues": null
        }
    ]
}
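A mapping like the one above can be used to rename template parameters to DBpedia ontology properties. The Python sketch below applies it to the `content` of an infobox, such as the one produced by InfoboxSplitter. `map_infobox_keys` is a hypothetical helper. It assumes that the mapping entry belongs to the same template as the infobox, and as a design choice of this sketch it drops parameters that have no mapping.

```python
def map_infobox_keys(infobox_content, template_mapping):
    """Rename infobox parameters to the DBpedia properties they map to.

    infobox_content: the "content" object of an InfoboxSplitter entry.
    template_mapping: one entry of "mapping-collection" (see above).
    Assumes the mapping belongs to the same template as the infobox;
    parameters with no mapping are dropped by this sketch.
    """
    mapping = template_mapping.get("mapping") or {}
    return {mapping[key]: value
            for key, value in infobox_content.items() if key in mapping}


if __name__ == "__main__":
    mapping_entry = {"@type": "template-mapping", "name": "TemplateMapping",
                     "mapping": {"birth_name": "birthName", "citizenship": "citizenship"},
                     "issues": None}
    # Hypothetical infobox content used only to exercise the function.
    print(map_infobox_keys({"birth_name": ["X"], "image": ["y.jpg"]}, mapping_entry))
```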

FreebaseExtractor

Retrieves from Freebase the definition of the entity the document describes.

"freebase": {
    "alias": [
        "Einstein",
        "albert_einstein",
        "Einstein, Albert"
    ],
    "article": {
        "id": "/m/0jd6"
    },
    "guid": "#9202a8c04000641f800000000000417c",
    "id": "/en/albert_einstein",
    "image": {
        "id": "/wikipedia/images/commons_id/925243"
    },
    "name": "Albert Einstein",
    "relevance:score": 178.9317169189453,
    "type": [
        {
            "id": "/influence/influence_node",
            "name": "Influence Node"
        },
        {
            "id": "/award/award_winner",
            "name": "Award Winner"
        },
        {
            "id": "/business/board_member",
            "name": "Organization leader"
        },
        ...
    ],
    ...
}

Splitters

A Splitter cuts out the sub-trees of the WikiText DOM that contain specific sections.

InfoboxSplitter

Splits the Infobox section out of the parsed DOM.

"infobox-splitter": [
    {
        "@type": "template",
        "name": "Infobox scientist",
        "content": {
            "name": [
                "Albert Einstein"
            ],
            "image": [
                "Einstein 1921 portrait2.jpg"
            ],
            "caption": [
                "Albert Einstein in 1921"
            ],
            "birth_date": [
                {
                    "@type": "template",
                    "name": "Birth date",
                    "content": {
                        "3": null,
                        "14": null,
                        "1879": null,
                        "df": [
                            "yes"
                        ]
                    }
                }
            ],
            "birth_place": [
                {
                    "@type": "reference",
                    "label": "Ulm",
                    "description": ""
                },
                ...
            ],
            ...
        }
    }
]
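Infobox values are lists that mix plain strings, references and nested templates. The Python sketch below flattens them to plain text. `to_text` and `infobox_as_text` are hypothetical helpers. They handle only the node shapes visible in the example above: references are rendered as their label and templates as `{{name}}`, which are choices made for this sketch.

```python
def to_text(value):
    """Reduce an InfoboxSplitter value to plain text.

    Handles the node shapes visible in the example above: strings, lists,
    {"@type": "reference"} links (rendered as their label) and
    {"@type": "template"} nodes (rendered as {{name}}). Anything else
    falls back to str(), a simplification of this sketch.
    """
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    if isinstance(value, list):
        return " ".join(text for text in map(to_text, value) if text)
    if isinstance(value, dict) and value.get("@type") == "reference":
        return value.get("label", "")
    if isinstance(value, dict) and value.get("@type") == "template":
        return "{{" + value.get("name", "") + "}}"
    return str(value)


def infobox_as_text(infobox):
    """Map each infobox parameter to its plain-text value."""
    return {key: to_text(value) for key, value in infobox["content"].items()}
```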

TableSplitter

Splits all the table sections out of the parsed DOM.

"table-splitter": [
    {
        "@type": "table",
        "content": [
            "class=wikitable",
            ...
            {
                "@type": "head_cell",
                "content": [
                    "Area of focus"
                ]
            },
            {
                "@type": "head_cell",
                "content": [
                    "Received"
                ]
            },
            ...
            {
                "@type": "head_cell",
                "content": [
                    "Significance"
                ]
            },
            {
                "@type": "body_cell",
                "content": [
                    "''On a Heuristic Viewpoint Concerning the Production and Transformation of Light''"
                ]
            },
            {
                "@type": "body_cell",
                "content": [
                    {
                        "@type": "reference",
                        "label": "Photoelectric effect",
                        "description": ""
                    }
                ]
            },
            ...
        ]
    }
]
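The table content is a flat list of cells, so rows must be rebuilt by the consumer. The Python sketch below shows one way to do that; `cell_text` and `table_rows` are hypothetical helpers. It assumes that all head cells come first and that body cells follow in row-major order, so body cells are chunked by the number of head cells. Tables with spanning cells would break this assumption.

```python
def cell_text(cell):
    """Join the strings and reference labels inside a head_cell/body_cell."""
    parts = []
    for item in cell.get("content", []):
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, dict) and item.get("@type") == "reference":
            parts.append(item.get("label", ""))
    return " ".join(parts)


def table_rows(table):
    """Rebuild (header, rows) from a TableSplitter "table" node.

    Assumption: head cells come first and body cells follow in row-major
    order, so body cells are chunked by the number of head cells. Non-cell
    items such as the "class=wikitable" attribute string are skipped.
    """
    cells = [item for item in table["content"] if isinstance(item, dict)]
    header = [cell_text(c) for c in cells if c.get("@type") == "head_cell"]
    body = [cell_text(c) for c in cells if c.get("@type") == "body_cell"]
    width = len(header) or 1
    rows = [body[i:i + width] for i in range(0, len(body), width)]
    return header, rows
```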

Structure

Prints out the entire DOM tree representation.

Validate

Prints out all warnings and errors detected while parsing the document.

"issues": [
    {
        "type": "Warning",
        "description": "Invalid char '<' within comment tag.",
        "row": 307,
        "col": 65
    }
]
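Because each issue carries a row and a column, it can be rendered in the familiar `row:col: type: message` form used by compilers. The Python sketch below does this; `format_issues` is a hypothetical helper. The type name "Error" is an assumption, because only "Warning" appears in the example above.

```python
def format_issues(issues, kinds=("Warning", "Error")):
    """Render Validate issues as 'row:col: type: description' lines.

    Only issues whose type is listed in kinds are kept; "Error" is an
    assumed type name (only "Warning" appears in the documented example).
    """
    return [
        "{}:{}: {}: {}".format(i["row"], i["col"], i["type"], i["description"])
        for i in issues
        if i["type"] in kinds
    ]


if __name__ == "__main__":
    # Issue copied from the example above.
    issues = [{"type": "Warning", "description": "Invalid char '<' within comment tag.",
               "row": 307, "col": 65}]
    print("\n".join(format_issues(issues)))
```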

Filter Syntax

A JSONpedia filter is conceptually similar to a CSS selector: it specifies hierarchical selection criteria that address specific JSON nodes.
The most basic form is the key filter, which matches, in any object, the value of every key that satisfies the given regexp:

sections
A more complex form is a list of comma-separated key/value patterns, Key-1:Value-1,Key-2:Value-2,...,Key-N:Value-N, for example:
name:Death date and age,@type:template
where Key-i is a string matching a JSON object key and Value-i is a regexp matching the value associated with that key. Every JSON object that matches ALL these patterns is returned. If a pattern contains special characters such as a comma, enclose it in double quotes:
url:".*[\s,\d]?\.html",@type:link
Patterns can be combined hierarchically with the > operator (see the live example on en:Albert_Einstein):
notable_students>@type:template,name:Plainlist>@type:reference
The full filter syntax is given below.
<filter>             ::= <key-selector> | <object-selector> ;
<key-selector>       ::= <Valid-Java-Regexp> ;
<object-selector>    ::= <key-value-selector> | <key-value-selector> ',' <object-selector> ;
<key-value-selector> ::= <key-matcher> ':' <value-matcher> ;
<key-matcher>        ::= <Valid-JSON-Key-Name> ;
<value-matcher>      ::= <Valid-Java-Regexp> | '"' <Valid-Java-Regexp> '"' ;
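To make these selection semantics concrete, here is a minimal in-memory matcher in Python for the grammar above plus the > operator. It is not JSONpedia's implementation, and `apply_filter` and its helpers are hypothetical names. The sketch makes several assumptions:
- a regexp must match the whole key or value, as Java's `Matcher.matches()` does;
- value patterns apply only to string values;
- each > level searches inside the nodes selected by the previous level.

It also uses Python's `re` module, whose syntax differs from Java regexps in some corner cases.

```python
import re


def _split_outside_quotes(text, sep):
    """Split text on sep, ignoring occurrences inside double quotes."""
    parts, buf, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        if ch == sep and not quoted:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def _parse_level(text):
    """Parse one '>'-separated level into a key selector or an object selector."""
    pairs = _split_outside_quotes(text, ",")
    if len(pairs) == 1 and ":" not in pairs[0]:
        return "key", re.compile(pairs[0])
    selectors = []
    for pair in pairs:
        key, _, value = pair.partition(":")
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the escaping quotes
        selectors.append((key, re.compile(value)))
    return "object", selectors


def _nodes(node):
    """Yield node itself and every value nested inside it, depth first."""
    yield node
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        children = ()
    for child in children:
        yield from _nodes(child)


def apply_filter(document, filter_text):
    """Return the JSON nodes selected by filter_text (sketch; see assumptions)."""
    scopes, results = [document], []
    for level_text in _split_outside_quotes(filter_text, ">"):
        kind, spec = _parse_level(level_text)
        results, next_scopes, seen = [], [], set()
        for scope in scopes:
            for node in _nodes(scope):
                if not isinstance(node, dict):
                    continue
                if kind == "key":
                    # Key selector: take the value of every key the regexp matches.
                    for key, value in node.items():
                        if spec.fullmatch(key) and (id(node), key) not in seen:
                            seen.add((id(node), key))
                            results.append(value)
                            next_scopes.append(value)
                elif id(node) not in seen and all(
                        isinstance(node.get(key), str) and rx.fullmatch(node[key])
                        for key, rx in spec):
                    # Object selector: every key/value pattern must match.
                    seen.add(id(node))
                    results.append(node)
                    next_scopes.extend(node.values())
        scopes = next_scopes
    return results
```

For example, `apply_filter(doc, "notable_students>@type:template,name:Plainlist>@type:reference")` returns the reference objects nested in a Plainlist template under `notable_students`, where `doc` is a parsed JSON response. Quoting keeps the comma inside `url:".*[\s,\d]?\.html"` from being read as a pattern separator.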

Processors Syntax

The procs parameter is a comma-separated list of the names of the Processors to apply during document processing. Some Processors are active by default and don't need to be activated explicitly; a default Processor can be disabled by prepending a - symbol to its name. For example, the following string:
-Extractors,Linkers
disables the Extractors (otherwise active by default) and enables the Linkers.
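A small Python helper can assemble this string from two lists. `procs_param` is a hypothetical name. It places disabled names first, mirroring the example above; this sketch assumes the order of names does not matter to the service.

```python
def procs_param(enable=(), disable=()):
    """Build a procs value: disabled processors get a '-' prefix."""
    names = ["-" + name for name in disable] + list(enable)
    return ",".join(names)


if __name__ == "__main__":
    print(procs_param(enable=["Linkers"], disable=["Extractors"]))  # -Extractors,Linkers
```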