Tanl Pipeline API

NER pipeline

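Python source code for the pipeline used by the NER Service. The variable text holds the input; the result is a string with one tab-separated form and entity tag per line.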


# Tanl pipeline
p1 = Splitta('spanish.splitta/').pipe([text])
p2 = Tokenizer().pipe(p1)
p3 = HmmTagger('spanish.hmm').pipe(p2)
p4 = MorphSplitter('es').pipe(p3)
p5 = NerLR('spanish.LR').pipe(p4)

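# Collect entities: one "FORM<TAB>NETAG" line per token tagged as an entity
ret = ""
for s in p5:
   for t in s:
      cur = t['NETAG']
      form = t['FORM']
      # 'O' marks tokens outside any named entity
      if cur != 'O':
         ret += "%s\t%s\n" % (form, cur)
return ret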
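
The entity-collection loop above can be tried without the Tanl models by feeding it hand-written data. The following is a minimal sketch, assuming each sentence is a list of tokens that can be indexed by 'FORM' and 'NETAG' as in the snippet. The function name collect_entities, the sample sentences, and the B-LOC/B-PER tag values are hypothetical and serve only as illustration.

```python
def collect_entities(sentences):
    """Return one "FORM<TAB>NETAG" line per token whose tag is not 'O'."""
    lines = []
    for sentence in sentences:
        for token in sentence:
            tag = token['NETAG']
            # 'O' marks tokens outside any named entity, so skip them
            if tag != 'O':
                lines.append("%s\t%s\n" % (token['FORM'], tag))
    return "".join(lines)


# Hypothetical stand-in for the output of the NER stage (p5);
# the tag values are assumptions, not taken from a real model.
sample = [
    [
        {'FORM': 'Madrid', 'NETAG': 'B-LOC'},
        {'FORM': 'es', 'NETAG': 'O'},
        {'FORM': 'grande', 'NETAG': 'O'},
    ],
    [
        {'FORM': 'Ana', 'NETAG': 'B-PER'},
        {'FORM': 'llega', 'NETAG': 'O'},
    ],
]

entities = collect_entities(sample)
print(entities, end="")
```

Collecting the lines in a list and joining them once avoids repeated string concatenation, which matters only for long inputs; the result is the same string the loop above builds.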

Parser pipeline

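Python source code for the pipeline used by the Parse Service. The variable text holds the input; the result is a string containing each parsed sentence as formatted by the corpus.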

# Create corpus
c = Corpus.create("es", "CoNLL")
    
# Tanl pipeline
p1 = SentenceSplitter('spanish.splitta/').pipe([text])
p2 = Tokenizer().pipe(p1)
p3 = HmmTagger('spanish.hmm').pipe(p2)
p4 = MorphSplitter('es').pipe(p3)
p5 = Parser.create('spanish.MLP').pipe(p4)
    
# Return parsed text
ret = ""
for s in p5:
  ret += c.toString(s) + "\n"
return ret
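
Both pipelines follow the same pattern: each stage exposes pipe(source) and its result is passed to the next stage. The sketch below imitates that pattern with plain Python generators so the chaining can be seen in isolation. It is a toy model, not the Tanl implementation: the classes SplitStage, TokenStage and CapitalTagStage are hypothetical, and whether the real Tanl stages evaluate lazily is an assumption.

```python
class SplitStage:
    """Toy sentence splitter: one sentence per period-terminated chunk."""

    def pipe(self, source):
        for text in source:
            for chunk in text.split('.'):
                chunk = chunk.strip()
                if chunk:
                    yield chunk


class TokenStage:
    """Toy tokenizer: whitespace split, one dict per token."""

    def pipe(self, source):
        for sentence in source:
            yield [{'FORM': word} for word in sentence.split()]


class CapitalTagStage:
    """Toy tagger: flags capitalised tokens, standing in for a real model."""

    def pipe(self, source):
        for sentence in source:
            for token in sentence:
                is_cap = token['FORM'][:1].isupper()
                token['TAG'] = 'CAP' if is_cap else 'LOW'
            yield sentence


text = "Ana vive en Madrid. Llega hoy."

# Same shape as the pipelines above: each stage consumes the previous one.
p1 = SplitStage().pipe([text])
p2 = TokenStage().pipe(p1)
p3 = CapitalTagStage().pipe(p2)

# Nothing runs until the final stage is iterated.
result = list(p3)
for sentence in result:
    print(" ".join("%s/%s" % (t['FORM'], t['TAG']) for t in sentence))
```

In this model a sentence flows through every stage before the next one is read, so the full text never has to be held in memory at once.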