KAIST Users & Information Lab

[Dataset Name]: Korean Hate Speech Dataset

* Please note that all comments on this page are shown purely as data and are not intended to offend anyone.

[Dataset Name] is a Korean hate speech dataset built from comments on Naver News and YouTube. Each comment is given a binary label, offensive or non-offensive. If a comment is offensive, it is further annotated with whether the offensive language has a target. The dataset also includes offensive-span and target-span annotations.
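One annotated comment can be pictured as a small record. The sketch below is a hypothetical representation: the class name and field names (`comment`, `offensive`, `target`, `offensive_span`, `target_span`) are assumptions for illustration, not the dataset's released file format, and a "Not specified" target is assumed to be stored as `None`. The `validate` method checks the rules described above: only offensive comments carry a target or spans, and every span must occur in the comment text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedComment:
    """Hypothetical record for one labeled comment; field names are assumptions."""
    comment: str
    offensive: bool                       # binary label: offensive (True) or non-offensive (False)
    target: Optional[str] = None          # e.g. "[Group] Religion - Christian"; None = no specific target
    offensive_span: Optional[str] = None  # text marked as offensive; must occur in `comment`
    target_span: Optional[str] = None     # text naming the target; must occur in `comment`

    def validate(self) -> None:
        """Raise ValueError if the record breaks the labeling rules described above."""
        if not self.offensive:
            # Only offensive comments carry target and span annotations.
            if (self.target, self.offensive_span, self.target_span) != (None, None, None):
                raise ValueError("non-offensive comment must not have a target or spans")
            return
        for name, span in (("offensive_span", self.offensive_span),
                           ("target_span", self.target_span)):
            if span is not None and span not in self.comment:
                raise ValueError(f"{name} does not occur in the comment")
        if self.target_span is not None and self.target is None:
            raise ValueError("target_span is set but target is not")


# Second card on this page: offensive, no specific target, whole comment marked as the span.
card = AnnotatedComment(
    comment="이럴 줄 알았다 어디서 쌩 구라를 쳐",
    offensive=True,
    offensive_span="이럴 줄 알았다 어디서 쌩 구라를 쳐",
)
card.validate()  # passes: the span occurs in the comment and no target is required
```

Keeping spans as plain substrings mirrors how they appear on the cards; a real loader would more likely store character offsets.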


Data Examples

Examples of data contained in this dataset are shown below.

Title:
젠더와 컨버터! 이제 더이상 헷갈리지 마세요!
(Gender changers and converters! Don't get them confused anymore!)

Comment:
dvi to hdmi 젠더는 어디에서 구할 수 있나요? 구하기가 매우 어렵네요.
(Where can I get a DVI-to-HDMI gender changer? They are really hard to find.)

Question:
Is this comment offensive?
If so, who is the target?
Classification:
Not offensive comment

Title:
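집중취재 젠더 갈등 중심 제천여성도서관, 할머니 뜻은? MBC충북NEWS
(In-depth report: Jecheon Women's Library at the center of a gender conflict. What was the grandmother's intent? MBC Chungbuk NEWS)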

Comment:
이럴 줄 알았다 어디서 쌩 구라를 쳐
(I knew it. How dare you tell such a blatant lie.)

Question:
Is this comment offensive?
If so, who is the target?
Classification:
Offensive comment

Target:
Not specified

Offensive span:
이럴 줄 알았다 어디서 쌩 구라를 쳐 (entire comment)

Title:
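유해석 박사의 이슬람 특강 - 영국의 브렉시트와 이슬람에 대한 관계?
(Dr. Yoo Hae-seok's special lecture on Islam: what is the relationship between Britain's Brexit and Islam?)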

Comment:
성리학이 꽃을 피웠던 나라가, 이제는 기독교 오염으로...?
(A country where Neo-Confucianism once flourished, now contaminated by Christianity...?)

Question:
Is this comment offensive?
If so, who is the target?
Classification:
Offensive comment

Target:
[Group] Religion - Christian

Offensive span:
성리학이 꽃을 피웠던 나라가, 이제는 기독교 오염으로...?

Title:
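주호영vs이준석, ‘강남역’ 메시지…당대표 젠더 이슈 ‘전초전’
(Joo Ho-young vs. Lee Jun-seok, 'Gangnam Station' messages: gender issues as the 'opening skirmish' of the party leadership race)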

Comment:
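늙은이들은 젠더이슈에 대해서 그냥 입닥치고 있어ㅋㅋ 지들이 다 망쳐놓고선
(Old people should just shut up about gender issues lol, when they're the ones who ruined everything)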

Question:
Is this comment offensive?
If so, who is the target?
Classification:
Offensive comment

Target:
[Group] Others - Age

Offensive span:
늙은이들은 젠더이슈에 대해서 그냥 입닥치고 있어ㅋㅋ 지들이 다 망쳐놓고선

Although the title appears on each card, it is not taken into account during labeling. Unlike the first, non-offensive example, the remaining three comments are all offensive. These comments do not rely on explicit offensive words; each sentence is offensive as a whole.

Offensive comments are further classified by the type of target. The offensiveness of the second example has no specific target, whereas that of the third and fourth examples is directed at a specific group (Christians and older people, respectively). Every offensive comment with a target in the dataset is annotated with its target group.
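The target labels on the cards follow a visible pattern: a bracketed target type, a category, and a subcategory after " - " (for example "[Group] Religion - Christian"), or "Not specified" when there is no target. The sketch below splits such a label into its parts. The label grammar is inferred only from the examples on this page, and the names `Target` and `parse_target` are hypothetical; other target types or formats used in the full dataset are not shown here.

```python
import re
from typing import NamedTuple, Optional


class Target(NamedTuple):
    kind: str                   # bracketed target type, e.g. "Group"
    category: str               # e.g. "Religion"
    subcategory: Optional[str]  # e.g. "Christian"; None if the label has no " - " part


# Assumed label shape, inferred from the cards above: "[Kind] Category - Subcategory".
_TARGET_RE = re.compile(r"^\[(?P<kind>[^\]]+)\]\s*(?P<category>.+?)(?:\s+-\s+(?P<sub>.+))?$")


def parse_target(label: str) -> Optional[Target]:
    """Parse a target label; return None for offensive comments without a specific target."""
    label = label.strip()
    if label == "Not specified":
        return None
    m = _TARGET_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognized target label: {label!r}")
    return Target(m.group("kind"), m.group("category").strip(), m.group("sub"))


print(parse_target("[Group] Religion - Christian"))
# Target(kind='Group', category='Religion', subcategory='Christian')
print(parse_target("Not specified"))
# None
```

The lazy category match stops at the first " - ", so a subcategory that itself contains " - " stays intact.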


Hate-span recognition
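Span recognition tasks like this one are commonly framed as sequence tagging. As one possible setup, and an assumption rather than a description of how this dataset's models work, the sketch turns an offensive span given as text into character offsets and then into character-level BIO tags. The function names and the `OFF` tag are hypothetical.

```python
from typing import List, Optional, Tuple


def find_span(comment: str, span: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first occurrence of `span`, or None."""
    start = comment.find(span)
    if start == -1:
        return None
    return start, start + len(span)


def bio_tags(comment: str, spans: List[Tuple[int, int]], label: str = "OFF") -> List[str]:
    """Tag each character: B-<label> at a span start, I-<label> inside it, O elsewhere.

    Spaces inside a span are tagged I- as well; later spans overwrite earlier ones
    if they overlap. A token-level scheme would need an extra alignment step.
    """
    tags = ["O"] * len(comment)
    for start, end in spans:
        for i in range(start, end):
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


# Second card on this page: the entire comment is marked as the offensive span.
comment = "이럴 줄 알았다 어디서 쌩 구라를 쳐"
offsets = find_span(comment, comment)
tags = bio_tags(comment, [offsets])
print(tags[0], tags[-1])  # B-OFF I-OFF
```

Character-level tags avoid committing to a particular Korean tokenizer, which is why this sketch uses them.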
