[Dataset Name]: Korean Hate Speech Dataset
* Please note that all comments on this page are only data and are not intended to be offensive in any way.
[Dataset Name] is a Korean hate speech dataset built from comments on Naver News and YouTube. Each comment is labeled as offensive or non-offensive; for each offensive comment, the annotation also records whether the offensive language has a target. The dataset also contains offensive-span and target-span annotations.
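To make the annotation layout concrete, here is a minimal sketch of how one labeled comment could be represented. The class name `Comment`, its field names, and the use of half-open character offsets for spans are all assumptions made for illustration; they are not the dataset's actual schema. The text is a neutral placeholder, not real data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: a span is a half-open character range [start, end) into the text.
Span = Tuple[int, int]


@dataclass
class Comment:
    """Hypothetical record for one annotated comment (not the official schema)."""
    text: str
    offensive: bool                        # binary offensive / non-offensive label
    offensive_span: Optional[Span] = None  # where the offensive expression is
    target_span: Optional[Span] = None     # where the target is mentioned, if any

    def has_target(self) -> bool:
        # Only offensive comments are checked for a target.
        return self.offensive and self.target_span is not None

    def span_text(self, span: Optional[Span]) -> str:
        # Return the substring covered by a span, or "" when the span is absent.
        if span is None:
            return ""
        start, end = span
        return self.text[start:end]


# Placeholder example: an offensive comment aimed at a group.
example = Comment(
    text="GROUP_X is PLACEHOLDER_INSULT",
    offensive=True,
    offensive_span=(11, 29),
    target_span=(0, 7),
)

# A non-offensive comment carries no spans.
neutral = Comment(text="Thanks for the article.", offensive=False)
```

Keeping the spans optional mirrors the description above: a non-offensive comment has neither span, and an offensive comment may have an offensive span without a target span.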
Data Example
Examples of data contained in this dataset are shown below.
Each card shows the article title, but the title is not considered during labeling. The first example is non-aggressive; all of the remaining comments are offensive. These comments contain no explicitly offensive words, yet each sentence as a whole is offensive.
Aggressive sentences are classified by the type of target. The aggression in the 2nd sentence has no separate target, whereas the aggression in the 3rd and 4th sentences is directed at a specific group. Every aggressive, targeted comment in the dataset carries information about its target group.
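The three cases described above (non-aggressive, aggressive without a target, aggressive toward a group) can be sketched as a small helper. The dictionary keys `offensive` and `target_group` and the returned category strings are hypothetical names chosen for this example; the real dataset may use different fields and may define further target types.

```python
from collections import Counter
from typing import Any, Dict, List


def categorize(record: Dict[str, Any]) -> str:
    """Map a record to one of the three cases described in the text.

    Assumes hypothetical keys: 'offensive' (bool) and 'target_group'
    (a group name, or None/missing when there is no target).
    """
    if not record.get("offensive", False):
        return "non-offensive"
    if record.get("target_group") is None:
        return "offensive, untargeted"
    return "offensive, group-targeted"


# Placeholder records; the values are illustrative, not real data.
records: List[Dict[str, Any]] = [
    {"offensive": False},
    {"offensive": True, "target_group": None},
    {"offensive": True, "target_group": "GROUP_A"},
    {"offensive": True, "target_group": "GROUP_B"},
]

# Count how many records fall into each case.
counts = Counter(categorize(r) for r in records)
```

With the placeholder records above, `counts` holds one non-offensive record, one untargeted offensive record, and two group-targeted offensive records.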