WIP. Additional information and code will be added soon.
Automatically Scraped Hard News Event Extraction dataset.
The dataset contains the following number of documents per event type:
| Event Type | #Documents | Event Type | #Documents |
|---|---|---|---|
| Air crash | 55 | Mass Poisoning | 7 |
| Armed Conflict | 76 | Military Exercise | 70 |
| Bank Robbery | 7 | Mine Collapses | 4 |
| Disease Outbreaks | 59 | Mudslides | 21 |
| Droughts | 18 | Other | 1229 |
| Earthquakes | 56 | Protest_Online Condemnation | 68 |
| Environment Pollution | 39 | Regime Change | 2 |
| Famine | 12 | Riot | 16 |
| Financial Crisis | 27 | Road Crash | 86 |
| Fire | 77 | Shipwreck | 37 |
| Floods | 84 | Strike | 65 |
| Gas explosion | 23 | Train collisions | 6 |
| Hurricanes_Tornado_Storm_Blizzard | 98 | Tsunamis | 0 |
| Insect Disaster | 24 | Volcano Eruption | 13 |
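
The class distribution in the table above is heavily skewed, which matters when splitting the data or choosing evaluation metrics. The following is a minimal sketch that summarizes the counts; the numbers are copied from the table, while the `EVENT_COUNTS` and `summarize` names are illustrative and not part of the dataset's code. It assumes nothing about whether a document can carry more than one event type, so the sum is a sum of table entries and not necessarily the number of unique documents.

```python
# Per-event-type document counts, copied from the table above.
EVENT_COUNTS = {
    "Air crash": 55, "Armed Conflict": 76, "Bank Robbery": 7,
    "Disease Outbreaks": 59, "Droughts": 18, "Earthquakes": 56,
    "Environment Pollution": 39, "Famine": 12, "Financial Crisis": 27,
    "Fire": 77, "Floods": 84, "Gas explosion": 23,
    "Hurricanes_Tornado_Storm_Blizzard": 98, "Insect Disaster": 24,
    "Mass Poisoning": 7, "Military Exercise": 70, "Mine Collapses": 4,
    "Mudslides": 21, "Other": 1229, "Protest_Online Condemnation": 68,
    "Regime Change": 2, "Riot": 16, "Road Crash": 86, "Shipwreck": 37,
    "Strike": 65, "Train collisions": 6, "Tsunamis": 0,
    "Volcano Eruption": 13,
}


def summarize(counts):
    """Return (sum of counts, share per label, labels with zero documents)."""
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    empty = sorted(label for label, n in counts.items() if n == 0)
    return total, shares, empty


total, shares, empty = summarize(EVENT_COUNTS)
print("sum of table entries:", total)
print("share of 'Other':", round(shares["Other"], 3))
print("event types without documents:", empty)
```

A summary like this makes it easy to see which event types are too rare for a reliable held-out split.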
For the majority of articles, the URL can be found in the `ashnee_url.csv` file.
Articles were mainly scraped from the following portals/domains: dailymail.co.uk, thewest.com.au, bbc.com, *allafrica.com, thetimes.co.uk, nzherald.co.nz, indiatimes.com, sputniknews.com, independent.co.uk, 9news.com.au, inquirer.net, theguardian.com, mb.com.ph, punchng.com, thestar.com.my, sott.net, and news.com.au.
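
To check how the articles are distributed across source domains, the URLs in `ashnee_url.csv` can be grouped by host name. The sketch below uses only the Python standard library. The column name `url` is an assumption, since the layout of the CSV file is not described here; pass a different `url_column` if the file uses another header. The function names are illustrative as well.

```python
import csv
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url.strip()).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host


def count_domains(rows, url_column="url"):
    """Count rows per domain; `url_column` is an assumed header name."""
    counts = Counter()
    for row in rows:
        host = domain_of(row.get(url_column) or "")
        if host:  # skip rows with a missing or unparsable URL
            counts[host] += 1
    return counts


def count_domains_in_file(path, url_column="url"):
    """Read a CSV file with a header row and count its URLs per domain."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_domains(csv.DictReader(handle), url_column)


# Small in-memory demonstration with made-up rows (not real dataset entries).
sample_rows = [
    {"url": "https://www.bbc.com/news/example-1"},
    {"url": "http://bbc.com/news/example-2"},
    {"url": "https://allafrica.com/stories/example"},
    {"url": ""},
]
for domain, n in count_domains(sample_rows).most_common():
    print(domain, n)
```

Subdomains are kept as they appear in the URL, so hosts under indiatimes.com or allafrica.com would be counted separately unless they are normalized further.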
Most articles were published between 2019 and 2022.
List of models we fine-tuned for event detection: roberta-base, roberta-large, deberta-v3-base, deberta-large, distilroberta-base, and albert-base-v2.
List of models we fine-tuned for argument extraction: roberta-base, roberta-large, deberta-v3-base, deberta-v3-large, distilroberta-base, and albert-base-v2.