A ranking engine for text search. Given a query and a set of articles, it returns the N most relevant articles.
The required input is a file ("file.txt") containing the articles in JSON format:
{"abstract":"The article text here!!!", "keywords":["keyword1", "keyword2.. etc"], "title":"The title here"}
An article may have more fields, which are ignored. If one of the fields above is missing, the algorithm leaves it out of the computations.
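As a rough illustration of the input format described above, the sketch below reads such a file and keeps only the three fields the engine uses. It assumes one JSON object per line, which the description does not state explicitly, and the names `parse_article` and `load_articles` are hypothetical helpers, not functions taken from preprocess.py.

```python
import json

# The three fields used for ranking; any other field is ignored.
FIELDS = ("abstract", "keywords", "title")


def parse_article(line):
    """Parse one JSON object and keep only the fields used for ranking."""
    record = json.loads(line)
    # Extra fields are dropped; missing fields are simply absent from the result.
    return {name: record[name] for name in FIELDS if name in record}


def load_articles(path):
    """Read articles from a file, assuming one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [parse_article(line) for line in handle if line.strip()]
```

For example, `parse_article('{"title": "T", "year": 2020}')` returns `{"title": "T"}`: the unknown field is dropped and the missing abstract and keywords are skipped.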
- First, run preprocess.py, which preprocesses the articles and writes an output file.
- Then run rank.py, which takes the previously generated file and a query from standard input and uses tf-idf to return the N most relevant articles. The ranking also gives different weights to the article's abstract, keywords and title.
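The ranking step can be sketched roughly as follows. This is a minimal illustration of field-weighted tf-idf, not the code in rank.py: the weights in `FIELD_WEIGHTS`, the tokenizer, and the smoothed idf formula `log(N / df) + 1` are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical field weights, chosen only for illustration.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_tokens(article, field):
    """Return the tokens of one field; a missing field yields no tokens."""
    value = article.get(field, "")
    if isinstance(value, list):  # keywords are stored as a list of strings
        value = " ".join(value)
    return tokenize(value)


def rank(articles, query, n):
    """Return indices of up to n articles with a non-zero score, best first."""
    # Document frequency: in how many articles each term appears (any field).
    doc_freq = Counter()
    for article in articles:
        terms = set()
        for field in FIELD_WEIGHTS:
            terms.update(field_tokens(article, field))
        doc_freq.update(terms)

    query_terms = tokenize(query)
    scores = []
    for index, article in enumerate(articles):
        score = 0.0
        for field, weight in FIELD_WEIGHTS.items():
            counts = Counter(field_tokens(article, field))
            for term in query_terms:
                if counts[term]:
                    # Assumed idf variant: log(N / df) + 1.
                    idf = math.log(len(articles) / doc_freq[term]) + 1.0
                    score += weight * counts[term] * idf
        scores.append((score, index))

    # Highest score first; ties keep the original article order.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scores[:n] if score > 0]
```

With these assumed weights, a query term found in a title counts three times as much as the same term found in an abstract, so an article that matches the query in its title tends to rank above one that matches only in its abstract.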
 
This project is licensed under the GPLv3 License - see the LICENSE file for details.