
Guidelines for making annotations for NLP classification

Based on our experience working with numerous annotation sets and building NLP models from them, following these guidelines makes the subsequent workflow more efficient and organised.

Format

  • Use one class per line, or separate columns for the keywords of class 1, class 2, etc.

  • If a review is annotated with more than one class, copy the review onto a new line for the second and each subsequent class. Alternatively, create columns class 1, keywords for class 1, class 2, keywords for class 2, etc. Otherwise, it is hard to match the keywords to the right class for that review.

  • Write everything in lower case.
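
The two layouts above are interchangeable. As a minimal sketch, assuming the annotations are loaded as dictionaries with the hypothetical column names review, class N and keywords for class N, the wide layout can be expanded into one class per line and lower-cased as follows. The sample review is made up for illustration.

```python
def to_one_class_per_line(rows):
    """Expand wide annotation rows into one (review, class, keywords) row per class."""
    long_rows = []
    for row in rows:
        n = 1
        # Walk "class 1", "class 2", ... until a column is missing or empty.
        while row.get(f"class {n}"):
            long_rows.append({
                "review": row["review"].lower(),
                "class": row[f"class {n}"].strip().lower(),
                "keywords": row.get(f"keywords for class {n}", "").strip().lower(),
            })
            n += 1
    return long_rows


# Made-up example: one review annotated with two classes.
wide = [
    {
        "review": "Ce tarif est correct et la panure croustillante",
        "class 1": "Prix", "keywords for class 1": "tarif",
        "class 2": "Gout", "keywords for class 2": "croustillante",
    },
]

long_rows = to_one_class_per_line(wide)
for r in long_rows:
    print(r)
```

Each output row carries exactly one class together with its own keywords, so the keywords can no longer be attributed to the wrong class.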

Keyword selection

Relevant only for rule-based models.

  • Avoid overly general keywords, which can lead to false detections. For example, bon, j’adore, j’aime and pas mauvais can refer to taste, quality, environment and many other aspects.

  • Avoid overly specific expressions as keywords, which lead to few detections. For example, the rule j'ai laissé quelques seconds (for class preparation) is likely to detect very few reviews.

  • In other words, keep only the essential part of each keyword wherever possible. For example, the essential part of the keyword ce tarif (for class prix) is just tarif. For the keyword panure croustillante (for class gout), it is just croustillante, as panure could relate to gout, quality, texture, etc.
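
To illustrate why the essential keyword matters, here is a minimal sketch of a rule-based check, assuming a rule is a simple whole-word match on the lower-cased review. The count_matches helper and the sample reviews are hypothetical and only serve to compare the coverage of ce tarif and tarif.

```python
import re


def count_matches(keyword, reviews):
    """Count reviews that contain the keyword as a whole word or phrase."""
    # Lookarounds prevent matching inside a longer word (e.g. "tarifs").
    pattern = re.compile(r"(?<!\w)" + re.escape(keyword.lower()) + r"(?!\w)")
    return sum(1 for review in reviews if pattern.search(review.lower()))


# Made-up reviews for the class prix, plus one unrelated review.
reviews = [
    "ce tarif est raisonnable",
    "le tarif a encore augmenté",
    "tarif trop élevé pour la quantité",
    "j'aime la panure croustillante",
]

specific = count_matches("ce tarif", reviews)   # overly specific rule
essential = count_matches("tarif", reviews)     # essential keyword only

print(specific, essential)
```

On this sample, the rule ce tarif detects only one of the three price-related reviews, while tarif detects all three without matching the unrelated one.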