Multilingual BERT base model (multilingual, uncased) trained to predict CAP issue codes from text documents such as speeches, press releases, social media messages, news articles, bills, and laws.
The model was trained on 120,000 assorted political documents, mostly from the Comparative Agendas Project (CAP).
Countries:
- Italy
- Sweden
- France
- Switzerland
- Poland
- Netherlands
- Germany
- Denmark
- Spain
- UK
- Austria
- Ireland
LABELS USED IN TRAINING
Model labels -> CAP labels:
{0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0, 8: 9.0, 9: 10.0, 10: 12.0, 11: 13.0, 12: 14.0, 13: 15.0, 14: 16.0, 15: 17.0, 16: 18.0, 17: 19.0, 18: 20.0, 19: 23.0}
Model labels -> CAP issues:
{0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture', 4: 'labour', 5: 'education', 6: 'environment', 7: 'energy', 8: 'immigration', 9: 'transportation', 10: 'law_crime', 11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce', 14: 'defense', 15: 'technology', 16: 'foreign_trade', 17: 'international_affairs', 18: 'government_operations', 19: 'culture'}
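The two mappings above can be combined into a small lookup helper. The following is a minimal sketch, not part of the original model card: the helper name `decode_label` is hypothetical, the CAP codes are written as integers rather than floats, and it assumes the pipeline returns labels in the default Transformers form `LABEL_<index>`. If the model's configuration defines different label strings, the parsing step would need to be adapted.

```python
import re

# Model label index -> CAP code (same values as the mapping above, as integers)
MODEL_TO_CAP_CODE = {
    0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
    10: 12, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19,
    18: 20, 19: 23,
}

# Model label index -> CAP issue name (same values as the mapping above)
MODEL_TO_CAP_ISSUE = {
    0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture',
    4: 'labour', 5: 'education', 6: 'environment', 7: 'energy',
    8: 'immigration', 9: 'transportation', 10: 'law_crime',
    11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce',
    14: 'defense', 15: 'technology', 16: 'foreign_trade',
    17: 'international_affairs', 18: 'government_operations', 19: 'culture',
}


def decode_label(label):
    """Translate a model label (int index or a string such as 'LABEL_6')
    into a (CAP code, CAP issue name) pair. Hypothetical helper."""
    if isinstance(label, int):
        idx = label
    else:
        # Assumption: the label string ends with the class index
        match = re.search(r"(\d+)$", str(label))
        if match is None:
            raise ValueError(f"Cannot find a class index in label {label!r}")
        idx = int(match.group(1))
    return MODEL_TO_CAP_CODE[idx], MODEL_TO_CAP_ISSUE[idx]


# Made-up pipeline-style output for illustration only, not a real prediction
example_output = [{"label": "LABEL_6", "score": 0.97}]
for pred in example_output:
    code, issue = decode_label(pred["label"])
    print(code, issue)  # prints: 7 environment
```

Note that the CAP codes skip 11, 21 and 22, so the model index cannot be turned into a CAP code by simply adding one.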
Validation
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.72 | 0.83 | 0.77 | 211 |
| 1 | 0.82 | 0.77 | 0.79 | 242 |
| 2 | 0.82 | 0.86 | 0.84 | 251 |
| 3 | 0.92 | 0.89 | 0.90 | 228 |
| 4 | 0.81 | 0.85 | 0.83 | 220 |
| 5 | 0.90 | 0.93 | 0.91 | 244 |
| 6 | 0.87 | 0.87 | 0.87 | 230 |
| 7 | 0.92 | 0.88 | 0.90 | 251 |
| 8 | 0.94 | 0.90 | 0.92 | 237 |
| 9 | 0.87 | 0.88 | 0.87 | 263 |
| 10 | 0.70 | 0.88 | 0.78 | 189 |
| 11 | 0.90 | 0.81 | 0.85 | 248 |
| 12 | 0.87 | 0.90 | 0.88 | 222 |
| 13 | 0.76 | 0.72 | 0.74 | 255 |
| 14 | 0.84 | 0.84 | 0.84 | 241 |
| 15 | 0.92 | 0.79 | 0.85 | 276 |
| 16 | 0.95 | 0.90 | 0.92 | 258 |
| 17 | 0.71 | 0.82 | 0.76 | 200 |
| 18 | 0.77 | 0.73 | 0.75 | 215 |
| 19 | 0.92 | 0.91 | 0.92 | 239 |
| Accuracy | | | 0.85 | 4720 |
| Macro Avg | 0.85 | 0.85 | 0.85 | 4720 |
| Weighted Avg | 0.85 | 0.85 | 0.85 | 4720 |
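To make the summary rows concrete, the sketch below recomputes the macro and weighted F1 averages from the per-class rows of the table. It is an illustration only: the inputs are the two-decimal values printed above, so the results agree with the reported 0.85 only up to rounding.

```python
# Per-class F1-score and support, copied from the validation table (classes 0-19)
f1 = [0.77, 0.79, 0.84, 0.90, 0.83, 0.91, 0.87, 0.90, 0.92, 0.87,
      0.78, 0.85, 0.88, 0.74, 0.84, 0.85, 0.92, 0.76, 0.75, 0.92]
support = [211, 242, 251, 228, 220, 244, 230, 251, 237, 263,
           189, 248, 222, 255, 241, 276, 258, 200, 215, 239]

# Total number of validation documents
total = sum(support)

# Macro average: every class counts equally
macro_f1 = sum(f1) / len(f1)

# Weighted average: every class counts in proportion to its support
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / total

print(total, round(macro_f1, 3), round(weighted_f1, 3))
```

Because the class supports are fairly balanced (189 to 276 documents per class), the two averages come out nearly identical.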
from transformers import AutoModelForSequenceClassification
from transformers import TextClassificationPipeline, AutoTokenizer

# Load the trained model and its tokenizer from the Hugging Face Hub
mp = 'z-dickson/CAP_multilingual'
model = AutoModelForSequenceClassification.from_pretrained(mp)
tokenizer = AutoTokenizer.from_pretrained(mp)

# device=0 runs on the first GPU; use device=-1 to run on the CPU instead
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model, device=0)

# Classify a document; each trailing backslash joins the line with the next one
classifier("""
To ask the Secretary of State for Energy and Climate \
Change what estimate he has made of the proportion of carbon \
dioxide emissions arising in the UK attributable to burning.
"""
)