Query Language

The Wikipedia objects contain just one indexed column:

tokens
list of tokens with corresponding tags, full-text searchable

Queries can combine conditions on indexed columns according to the syntax described below.

Here are a few examples:

   tokens matches 'pyramid scheme'

   tokens matches 'hedge funds' fraud

   tokens matches diabetes control

   tokens matches proximity 20 [cancer aspirin] (prevent | prevention | evidence)

   tokens matches proximity 20 ((prostate) & (cancer) & (treatment therapy surgery))
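
The two proximity examples above follow a regular shape, so they can be assembled programmatically. The following is a minimal Python sketch, not part of the query system itself: the helper names (proximity_groups, proximity_terms, matches) are hypothetical, and the code only builds query strings without interpreting them.

```python
def proximity_groups(dist, groups):
    """Build the parenthesized form: proximity dist ((w1 w2) & (w3) ...).

    `groups` is a list of word lists; each inner list becomes one
    parenthesized group, and groups are joined with '&'.
    """
    if not groups or any(not group for group in groups):
        raise ValueError("each group needs at least one word")
    joined = " & ".join("(" + " ".join(group) + ")" for group in groups)
    return f"proximity {int(dist)} ({joined})"


def proximity_terms(dist, terms):
    """Build the bracketed form: proximity dist [t1 t2 ...]."""
    if not terms:
        raise ValueError("need at least one term")
    return f"proximity {int(dist)} [{' '.join(terms)}]"


def matches(column, pattern):
    """Build a 'column matches pattern' condition."""
    return f"{column} matches {pattern}"


query = matches(
    "tokens",
    proximity_groups(
        20, [["prostate"], ["cancer"], ["treatment", "therapy", "surgery"]]
    ),
)
print(query)
```

Called this way, the sketch reproduces the last example query above character for character.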

The following rules in BNF-like notation specify the full query syntax:

condition:  match
        | NOT match
        | match relOp condition

match:  ident matches pattern
        | ( condition )
        | comparison

pattern: attr patternOp pattern
        | attr pattern
        | attr

attr:	color = primary
        | primary
        | tagged

tagged:
	tag : *
	| tag : primary

tag:	tagType / tagValue

tagType: POS | MORPH | WSJ | IEER | DEP
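
As an illustration of the tagged rule, the Python sketch below assembles a tagged attribute and checks the tag type against the five names listed above. The helper name tagged is hypothetical, the tag values used in the usage lines (NN, nsubj) are placeholders rather than values confirmed by this document, and the sketch assumes that whitespace around "/" and ":" is optional.

```python
TAG_TYPES = {"POS", "MORPH", "WSJ", "IEER", "DEP"}


def tagged(tag_type, tag_value, primary="*"):
    """Build 'tagType/tagValue:primary'; '*' is the wildcard alternative.

    Assumption: the query lexer accepts the compact form without spaces.
    """
    if tag_type not in TAG_TYPES:
        raise ValueError(f"unknown tag type: {tag_type!r}")
    return f"{tag_type}/{tag_value}:{primary}"


# Placeholder tag values, for illustration only.
any_noun = tagged("POS", "NN")
subject_dog = tagged("DEP", "nsubj", "dog")
```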

color:	ident

ident:	WORD

primary: ( pattern )
        | ~ primary
        | WORD
        | WORDSTAR
        | PHRASE
        | proximity

proximity: proximity dist ( proxlist )
        | proximity dist [ proxphrases ]

dist: INT

proxlist: ( wordlist ) & proxlist
        | ( wordlist )

wordlist: WORD wordlist
        | WORD

proxphrases:
        proxphrases STRING
        | proxphrases WORD
        | STRING
        | WORD
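
The proximity, proxlist, wordlist and proxphrases rules are small enough to check by hand. The Python sketch below is one possible recognizer for them, not the system's actual parser. The grammar does not define its terminals, so the lexical rules here are assumptions: STRING is single-quoted text, INT is a run of digits, and WORD is a run of word characters.

```python
import re

# Assumed lexical rules; the grammar above leaves WORD, INT and STRING undefined.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<STRING>'[^']*')|(?P<INT>\d+\b)|(?P<WORD>\w+)|(?P<PUNCT>[()\[\]&]))"
)


def tokenize(text):
    """Split text into (kind, value) pairs; punctuation is its own kind."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        tokens.append((value if kind == "PUNCT" else kind, value))
        pos = m.end()
    return tokens


def kind(toks, i):
    """Kind of token i, or None past the end."""
    return toks[i][0] if i < len(toks) else None


def parse_proxlist(toks, i):
    """proxlist: ( wordlist ) & proxlist | ( wordlist ). Returns end index or None."""
    while True:
        if kind(toks, i) != "(":
            return None
        i += 1
        start = i
        while kind(toks, i) == "WORD":
            i += 1
        if i == start or kind(toks, i) != ")":
            return None
        i += 1
        if kind(toks, i) != "&":
            return i
        i += 1


def parse_proxphrases(toks, i):
    """proxphrases: one or more WORD or STRING tokens. Returns end index or None."""
    start = i
    while kind(toks, i) in ("WORD", "STRING"):
        i += 1
    return i if i > start else None


def is_proximity(text):
    """True if text is derivable from the proximity rule under the assumptions above."""
    try:
        toks = tokenize(text)
    except ValueError:
        return False
    if len(toks) < 2 or toks[0] != ("WORD", "proximity") or toks[1][0] != "INT":
        return False
    opener = kind(toks, 2)
    if opener == "(":
        end, closer = parse_proxlist(toks, 3), ")"
    elif opener == "[":
        end, closer = parse_proxphrases(toks, 3), "]"
    else:
        return False
    return end is not None and kind(toks, end) == closer and end + 1 == len(toks)
```

Under these assumptions, is_proximity accepts both proximity patterns from the examples section and rejects "proximity 20 (cancer aspirin)", because the parenthesized form requires each word list to sit in its own inner parentheses.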

relOp:  AND
        | OR

patternOp: &&
        | ||

comparison:  operand = operand
        | operand != operand
        | operand < operand
        | operand <= operand
        | operand > operand
        | operand >= operand
        | operand between operand and operand
        | operand not between operand and operand

operand: field
        | INT
        | FLOAT
        | STRING

field: ident
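
As written, the condition rule allows NOT only in front of a single match, and it does not state how AND and OR bind relative to each other. One cautious way to compose conditions is to parenthesize every operand, since ( condition ) is itself a match. The Python sketch below does that; the helper names combine and negate are hypothetical, and the code only builds strings.

```python
def combine(op, *conditions):
    """Join conditions with AND or OR, wrapping each operand in parentheses."""
    if op not in ("AND", "OR"):
        raise ValueError(f"unknown relational operator: {op!r}")
    if not conditions:
        raise ValueError("need at least one condition")
    return f" {op} ".join(f"({c})" for c in conditions)


def negate(condition):
    """Negate a condition; the parentheses turn it into a single match."""
    return f"NOT ({condition})"


query = combine(
    "AND",
    "tokens matches 'hedge funds' fraud",
    negate("tokens matches 'pyramid scheme'"),
)
print(query)
# (tokens matches 'hedge funds' fraud) AND (NOT (tokens matches 'pyramid scheme'))
```

The extra parentheses are redundant in simple cases, but they keep the result inside the grammar without relying on any unstated precedence between AND, OR and NOT.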