Overview
The Tanl Italian Pipeline is a Web service for:
- extracting named entities from Italian texts;
- parsing Italian texts and producing parse trees according to the Tanl Dependency Notation.
To do this, the pipeline performs the following steps:
- split text into sentences;
- tokenize sentences;
- extract lemma, Part-of-Speech and morphology for each token;
- extract named entities;
- build the dependency trees.
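The annotations produced by these steps can be pictured as one record per token, where each token also points to its head to encode the dependency tree. The following is a minimal Python sketch under stated assumptions: the `Token` class, its field names, the `children` and `to_columns` helpers, the column layout and the tags in the example sentence are all hypothetical and illustrative. They are not the actual Tanl data structures, tag sets or output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """One annotated token (hypothetical record, not the Tanl data structure)."""
    id: int       # 1-based position in the sentence
    form: str     # surface form, from the tokenizer
    lemma: str    # lemma
    pos: str      # Part-of-Speech tag (illustrative tag set)
    morph: str    # morphological features, "_" if none
    head: int     # id of the governing token, 0 for the sentence root
    deprel: str   # dependency label (illustrative)
    entity: str   # named-entity tag, "O" if the token is outside any entity


def children(sentence: List[Token], head_id: int) -> List[Token]:
    """Return the dependents of token head_id (use 0 for the virtual root)."""
    return [t for t in sentence if t.head == head_id]


def to_columns(sentence: List[Token]) -> str:
    """Render a sentence as tab-separated columns, one token per line (hypothetical layout)."""
    return "\n".join(
        "\t".join([str(t.id), t.form, t.lemma, t.pos, t.morph,
                   str(t.head), t.deprel, t.entity])
        for t in sentence
    )


# "Mario dorme." ("Mario sleeps."), annotated with illustrative tags.
sentence = [
    Token(1, "Mario", "Mario", "SP", "_", 2, "subj", "B-PER"),
    Token(2, "dorme", "dormire", "V", "num=s|per=3", 0, "ROOT", "O"),
    Token(3, ".", ".", "FS", "_", 2, "punc", "O"),
]
```

Here `children(sentence, 0)` returns the root token "dorme", and `children(sentence, 2)` returns its dependents "Mario" and ".". Storing only a head index per token keeps the tree in a flat, line-per-token form that is easy to pass between tools.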
The tools that perform these steps are connected to each other to form a Tanl Data Pipeline.
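A chain of connected tools can be sketched with Python generators, where each stage consumes the stream produced by the previous one. This is a minimal sketch, not the Tanl implementation: `split_sentences`, `tokenize` and `pipeline` are hypothetical names, and the regex-based splitter and tokenizer are naive stand-ins for the real tools.

```python
import re
from typing import Callable, Iterable, Iterator, List


def split_sentences(texts: Iterable[str]) -> Iterator[str]:
    """Naive splitter (hypothetical): break after '.', '!' or '?' followed by whitespace."""
    for text in texts:
        for sent in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sent:
                yield sent


def tokenize(sentences: Iterable[str]) -> Iterator[List[str]]:
    """Naive tokenizer (hypothetical): runs of word characters, or single punctuation marks."""
    for sent in sentences:
        yield re.findall(r"\w+|[^\w\s]", sent)


def pipeline(source: Iterable, *stages: Callable[[Iterable], Iterator]) -> Iterator:
    """Connect the stages so that each one consumes the iterator produced by the previous one."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return iter(stream)
```

For example, `list(pipeline(["Mario dorme. Luca legge!"], split_sentences, tokenize))` yields `[['Mario', 'dorme', '.'], ['Luca', 'legge', '!']]`. Because each stage is a generator, items flow through one at a time instead of being collected in full between stages, and further stages (tagging, entity extraction, parsing) could be appended to the same call.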
The pipeline uses the Tanl NER (Named Entity Recognizer) and the DeSR dependency parser.
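A named entity recognizer of this kind typically assigns one tag per token, and the entities are then read off the tag sequence. As a hedged sketch, assuming an IOB2 encoding (`B-` starts an entity, `I-` continues it, `O` is outside) with illustrative `PER`/`LOC` types, the hypothetical helper `iob_to_spans` below groups tagged tokens into entities. It does not describe the actual output format of the Tanl NER.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (entity_type, text) pairs (hypothetical helper)."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []
    for word, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts; an I- tag with a different type is also treated as a start.
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_words.append(word)
        else:
            # "O": close any open entity.
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans
```

For the tokens `Mario Rossi lavora a Pisa .` tagged `B-PER I-PER O O B-LOC O`, the helper returns `[("PER", "Mario Rossi"), ("LOC", "Pisa")]`.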
The pipeline architecture was developed as part of the SemaWiki project.
You may look at the Python source code for this pipeline.
References
- G. Attardi. Experiments with a Multilanguage Non-Projective Dependency Parser. Proc. of the Tenth Conference on Computational Natural Language Learning, New York, NY, 2006.
- G. Attardi, S. Dei Rossi, F. Dell'Orletta, E.M. Vecchi. The Tanl Named Entity Recognizer at Evalita 2009. Proc. of Workshop Evalita 2009, ISBN 978-88-903581-1-1, 2009.
- G. Attardi, F. Dell'Orletta, M. Simi, J. Turian. Accurate Dependency Parsing with a Stacked Multilayer Perceptron. Proc. of Workshop Evalita 2009, ISBN 978-88-903581-1-1, 2009.
- G. Attardi, S. Dei Rossi, M. Simi. The Tanl Pipeline. Proc. of LREC Workshop on WSPP, Malta, 2010.
- G. Attardi, S. Dei Rossi, M. Simi. The Tanl Lemmatizer Enriched with a Sequence of Cascading Filters. In B. Magnini et al. (Eds.), Proc. of Evalita 2011, LNCS 7689, pp. 257-265, 2013. ISBN 978-3-642-35827-2.
- G. Attardi, G. Berardi, S. Dei Rossi, M. Simi. The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011. In B. Magnini et al. (Eds.), Proc. of Evalita 2011, LNCS 7689, pp. 116-125, 2013. ISBN 978-3-642-35827-2.
- G. Attardi, L. Baronti, S. Dei Rossi, M. Simi. SuperSense Tagging with a Maximum Entropy Markov Model. In B. Magnini et al. (Eds.), Proc. of Evalita 2011, LNCS 7689, pp. 186-194, 2013.