This package serves as the basis for the paper "ORCAS-I: Queries Annotated with Intent using Weak Supervision".
DOI of the paper: https://doi.org/10.1145/3477495.3531737
Create conda environment:
$ conda create --name intents_labelling python==3.8.12
Activate the environment:
$ source activate intents_labelling
Use pip to install requirements:
(intents_labelling) $ pip install -r requirements.txt
Install intents_labelling package for development:
(intents_labelling) $ pip install -e .
Install spaCy language model:
(intents_labelling) $ python -m spacy download en_core_web_lg
List of movie titles can be found here.
Put all data files in data/input/ directory.
Create a training set by sampling the ORCAS dataset and filtering out test set examples:
(intents_labelling) $ python intents_labelling/create_train_file.py
Create Snorkel annotations:
(intents_labelling) $ python intents_labelling/main.py
Train BERT model:
(intents_labelling) $ python intents_labelling/models/train_bert_classifier.py
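
For readers new to weak supervision, the annotation step can be pictured roughly as follows. This is a minimal, dependency-free sketch and not the code in this package: the three rules, their keyword lists, and the label names are hypothetical and chosen only for illustration, and votes are combined by simple majority, whereas Snorkel can also combine them with a learned label model.

```python
from collections import Counter

# A labelling function returns ABSTAIN when it has no opinion about a query.
ABSTAIN = None


# Hypothetical rules for illustration only; they are NOT the labelling
# functions used in the paper or in this package.
def lf_url_like(query):
    """Tokens that look like a site address suggest navigational intent."""
    tokens = query.lower().split()
    if any(t.startswith("www.") or t.endswith((".com", ".org")) for t in tokens):
        return "navigational"
    return ABSTAIN


def lf_transaction_word(query):
    """Words such as 'download' or 'buy' suggest transactional intent."""
    if any(w in ("download", "buy") for w in query.lower().split()):
        return "transactional"
    return ABSTAIN


def lf_question_word(query):
    """A leading question word suggests a factual information need."""
    words = query.lower().split()
    if words and words[0] in ("what", "who", "when", "where"):
        return "factual"
    return ABSTAIN


LABELLING_FUNCTIONS = [lf_url_like, lf_transaction_word, lf_question_word]


def weak_label(query):
    """Combine the labelling functions' votes by simple majority.

    Returns ABSTAIN when every function abstains. Ties go to the label
    that was voted for first, following the order of LABELLING_FUNCTIONS.
    """
    votes = [lf(query) for lf in LABELLING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    for q in ["www.facebook.com login", "download firefox", "what is bert", "blue sky"]:
        print(q, "->", weak_label(q))
```

Queries on which every rule abstains stay unlabelled; the labelled remainder is the kind of noisy training data a classifier such as the BERT model above can then be trained on.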