Overview

Dataset statistics

Number of variables: 17
Number of observations: 2987
Missing cells: 1493
Missing cells (%): 2.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.3 MiB
Average record size in memory: 791.4 B
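The overview figures follow from simple counts. In particular, the missing-cell percentage is taken over every cell of the table (rows × columns): 1493 / (17 × 2987) ≈ 2.9%.

Below is a minimal sketch of that arithmetic in standard-library Python. It rests on three assumptions:
- each row is a dict;
- `None` marks a missing cell;
- all values are hashable.

The helper `overview_stats` is hypothetical and is not pandas-profiling's implementation.

```python
def overview_stats(rows, columns):
    """Hypothetical helper: recompute overview counts for a list of dict rows."""
    n_obs, n_vars = len(rows), len(columns)
    # A cell counts as missing when its key is absent or its value is None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    total_cells = n_obs * n_vars
    missing_pct = 100.0 * missing / total_cells if total_cells else 0.0

    # A row is a duplicate if an identical tuple of values was already seen.
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": missing_pct,
        "duplicate_rows": duplicates,
    }


# This report's totals: 1493 missing cells out of 17 x 2987 cells.
print(f"{100 * 1493 / (17 * 2987):.1f}%")  # prints 2.9%
```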

Variable types

Categorical: 5
Unsupported: 1
Boolean: 11

Alerts

religious_intolerance has constant value "False" [Constant]
id has a high cardinality: 2987 distinct values [High cardinality]
text has a high cardinality: 2987 distinct values [High cardinality]
lgbtqphobia is highly correlated with religious_intolerance [High correlation]
is_targeted is highly correlated with insult and 3 other fields [High correlation]
insult is highly correlated with is_targeted and 2 other fields [High correlation]
xenophobia is highly correlated with religious_intolerance [High correlation]
health is highly correlated with religious_intolerance [High correlation]
other_lifestyle is highly correlated with religious_intolerance [High correlation]
religious_intolerance is highly correlated with lgbtqphobia and 12 other fields [High correlation]
sexism is highly correlated with religious_intolerance [High correlation]
racism is highly correlated with religious_intolerance [High correlation]
physical_aspects is highly correlated with religious_intolerance [High correlation]
targeted_type is highly correlated with is_targeted and 2 other fields [High correlation]
is_offensive is highly correlated with is_targeted and 3 other fields [High correlation]
profanity_obscene is highly correlated with religious_intolerance [High correlation]
ideology is highly correlated with religious_intolerance [High correlation]
is_offensive is highly correlated with is_targeted and 1 other field [High correlation]
is_targeted is highly correlated with is_offensive and 1 other field [High correlation]
health is highly correlated with physical_aspects [High correlation]
insult is highly correlated with is_offensive and 1 other field [High correlation]
physical_aspects is highly correlated with health [High correlation]
targeted_type has 1136 (38.0%) missing values [Missing]
toxic_spans has 357 (12.0%) missing values [Missing]
id is uniformly distributed [Uniform]
text is uniformly distributed [Uniform]
id has unique values [Unique]
text has unique values [Unique]
toxic_spans is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
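Three of these alerts (Constant, Unique and Missing) reduce to counting distinct and missing values per column. Below is a minimal sketch, assuming a column is a plain Python list with `None` for missing entries. The helper name `basic_alerts` is hypothetical. The sketch does not reproduce pandas-profiling's own thresholds for the High cardinality, Uniform or High correlation alerts.

```python
def basic_alerts(name, values):
    """Hypothetical helper: return Constant/Unique/Missing alerts for one column."""
    alerts = []
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    distinct = set(present)

    # Constant: exactly one distinct non-missing value.
    if len(distinct) == 1:
        alerts.append(f'{name} has constant value "{next(iter(distinct))}" [Constant]')
    # Unique: no missing values and no repeated value.
    if present and len(distinct) == len(present) == len(values):
        alerts.append(f"{name} has unique values [Unique]")
    # Missing: count and percentage of missing entries.
    if n_missing:
        pct = 100.0 * n_missing / len(values)
        alerts.append(f"{name} has {n_missing} ({pct:.1f}%) missing values [Missing]")
    return alerts


print(basic_alerts("religious_intolerance", [False] * 5))
# prints ['religious_intolerance has constant value "False" [Constant]']
```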

Reproduction

Analysis started: 2022-09-01 22:19:46.870158
Analysis finished: 2022-09-01 22:19:49.191505
Duration: 2.32 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

id
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 2987
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 259.7 KiB
460cca26c7144f9ea5edabe45d33ad7d: 1
6c2bc42785174e608f50d633a450ac38: 1
b4b6e8089ec04ba098d949484215d6fe: 1
e4b3e1931eef473388c16d6e7923d664: 1
8c3be0ca235445c3890dc48c74430ca0: 1
Other values (2982): 2982

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 95584
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
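Below is a rough sketch of how such character properties can be tallied. Python's standard `unicodedata` module returns each code point's two-letter general category (for example `Ll` or `Nd`).

Assumptions and limits:
- The `CATEGORY_NAMES` mapping is an illustrative assumption and covers only a few categories. Unmapped categories keep their two-letter code.
- The helper name `category_counts` is also illustrative.
- Scripts and blocks are not exposed by `unicodedata` and would need a separate lookup table.

```python
import unicodedata
from collections import Counter

# Partial mapping from Unicode general-category codes to readable names.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "So": "Other Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across a list of strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["460cca26c7144f9ea5edabe45d33ad7d"]))
```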

Unique

Unique: 2987
Unique (%): 100.0%

Sample

1st row: 460cca26c7144f9ea5edabe45d33ad7d
2nd row: 542c3afa48114e4890b27b32f4b88c05
3rd row: 2d65e628400149438210298a6972b3a0
4th row: e1dfbeaae05145b2afc5544149d22f61
5th row: 614aeda1a791406a96671958a7a49aae

Common Values

Value: Count (Frequency %)
460cca26c7144f9ea5edabe45d33ad7d: 1 (< 0.1%)
6c2bc42785174e608f50d633a450ac38: 1 (< 0.1%)
b4b6e8089ec04ba098d949484215d6fe: 1 (< 0.1%)
e4b3e1931eef473388c16d6e7923d664: 1 (< 0.1%)
8c3be0ca235445c3890dc48c74430ca0: 1 (< 0.1%)
dbee3d03d628433b99f29de2d3f7edae: 1 (< 0.1%)
89f1188b93c54b0694b19a813b5f4e48: 1 (< 0.1%)
3df756e3743345b7935b1da23978c00d: 1 (< 0.1%)
ee8f9a3c67dc4e568186444d04bd6a58: 1 (< 0.1%)
a5d9d4b7951d456f8b4289b979f3ea4f: 1 (< 0.1%)
Other values (2977): 2977 (99.7%)

Length

[Figure: Histogram of lengths of the category]

Words

Value: Count (Frequency %)
460cca26c7144f9ea5edabe45d33ad7d: 1 (< 0.1%)
04882791c0bd42c888983a5e554be036: 1 (< 0.1%)
81ed0a68e6c84922885e438d12478f0b: 1 (< 0.1%)
161d8eaccf0442aaae0a13a454f1d034: 1 (< 0.1%)
ee2c038d4d7b40a28001c8b87dd0e8d0: 1 (< 0.1%)
2d65e628400149438210298a6972b3a0: 1 (< 0.1%)
e1dfbeaae05145b2afc5544149d22f61: 1 (< 0.1%)
614aeda1a791406a96671958a7a49aae: 1 (< 0.1%)
0a53c08cb0c74b22b476638086c650d6: 1 (< 0.1%)
6c3b5f35b2b54cafaac825ac4edeb705: 1 (< 0.1%)
Other values (2977): 2977 (99.7%)

Most occurring characters

Value: Count (Frequency %)
'4': 8447 (8.8%)
'b': 6426 (6.7%)
'8': 6421 (6.7%)
'9': 6307 (6.6%)
'a': 6144 (6.4%)
'0': 5743 (6.0%)
'6': 5716 (6.0%)
'1': 5681 (5.9%)
'7': 5677 (5.9%)
'e': 5633 (5.9%)
Other values (6): 33389 (34.9%)

Most occurring categories

Value: Count (Frequency %)
Decimal Number: 60620 (63.4%)
Lowercase Letter: 34964 (36.6%)

Most frequent character per category

Decimal Number
Value: Count (Frequency %)
'4': 8447 (13.9%)
'8': 6421 (10.6%)
'9': 6307 (10.4%)
'0': 5743 (9.5%)
'6': 5716 (9.4%)
'1': 5681 (9.4%)
'7': 5677 (9.4%)
'5': 5567 (9.2%)
'2': 5561 (9.2%)
'3': 5500 (9.1%)
Lowercase Letter
Value: Count (Frequency %)
'b': 6426 (18.4%)
'a': 6144 (17.6%)
'e': 5633 (16.1%)
'f': 5622 (16.1%)
'd': 5597 (16.0%)
'c': 5542 (15.9%)

Most occurring scripts

Value: Count (Frequency %)
Common: 60620 (63.4%)
Latin: 34964 (36.6%)

Most frequent character per script

Common
Value: Count (Frequency %)
'4': 8447 (13.9%)
'8': 6421 (10.6%)
'9': 6307 (10.4%)
'0': 5743 (9.5%)
'6': 5716 (9.4%)
'1': 5681 (9.4%)
'7': 5677 (9.4%)
'5': 5567 (9.2%)
'2': 5561 (9.2%)
'3': 5500 (9.1%)
Latin
Value: Count (Frequency %)
'b': 6426 (18.4%)
'a': 6144 (17.6%)
'e': 5633 (16.1%)
'f': 5622 (16.1%)
'd': 5597 (16.0%)
'c': 5542 (15.9%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 95584 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
'4': 8447 (8.8%)
'b': 6426 (6.7%)
'8': 6421 (6.7%)
'9': 6307 (6.6%)
'a': 6144 (6.4%)
'0': 5743 (6.0%)
'6': 5716 (6.0%)
'1': 5681 (5.9%)
'7': 5677 (5.9%)
'e': 5633 (5.9%)
Other values (6): 33389 (34.9%)

text
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 2987
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 995.9 KiB
"vc é minha vida bela" vc não tem vida seu peido do capeta USER kkkkkkkkkk rachei
 
1
USER jô? indoidou de vez? esse cara só pode tá louco , ou e mais um dos grandes clubistas q trabalham como jornalista esportivo....
 
1
USER USER preoUSERupão tanto d fika falando mal das modinhas USER a gente gosta USER nem USER preoUSERupão d proUSERurar no youtube mulheres USERantoras tentando fazer fama ai USER fika atraz d homens pra falar mal e USER USER USER tem mulheres no youtube tipo amannda USERondesUSERa USER homens ja perUSEReberão USER USER USER vem atraz d outros homens e nunUSERa d mulheres ow meu USER sera USER todos os homens vão vira gay desUSERulpa felipe mais USERresçe e vai atraz d alguma mulher pra fazer sexo pUSER tenho USERerteza homen USER fala muito d mulher virgem
 
1
Alguma bixona metida a hetero pode me dizer pq o brasil vai trocar de treinador antes da copa do mundo?
 
1
Tem coisa que não adianta debater. USER falar de um partido nazista é como dizer que tem coisa boa em um psicopata. Não pode defender o indefensável.
 
1
Other values (2982)
2982 

Length

Max length: 995
Median length: 454
Mean length: 119.9099431
Min length: 5

Characters and Unicode

Total characters: 358171
Distinct characters: 317
Distinct categories: 19
Distinct scripts: 6
Distinct blocks: 12

Unique

Unique: 2987
Unique (%): 100.0%

Sample

1st row: "vc é minha vida bela" vc não tem vida seu peido do capeta USER kkkkkkkkkk rachei
2nd row: USER Caralho mano eu ia mandar um vai tomar no cu mas ai lembrei do meme, por um momento meu sangue subiu e olha que nem sigo vc foi tt da Iza
3rd row: Os ignorantes nos comentários querendo dar desculpinhas pra comentar coisas preconceituosas. Que bom que as crianças assim tão novas já tem uma visão mais evoluída e empatica que vocês, bando de preconceituosos nojentos, que ao invés de estarem do lado do amor e de um mundo melhor estão do lado do preconceito e da dor, vocês só pensam assim porque não tem o mínimo de cultura e conhecimento, não sabem nem um terço da dor de um LGBTQ+, agora ao invés da gente ter que desenhar para as crianças entenderem temos que desenhar para os adultos entenderem, ENSINO USER IUSEROLOGIA USER GÊNERO NECESSÁRIO USER USER, POR UM MUNDO COM MAIS USER E MENOS USER! 💞
4th row: rafael é muito fdp, praga de garoto...
5th row: PUTARIA 🔞 URL

Common Values

Value: Count (Frequency %)
"vc é minha vida bela" vc não tem vida seu peido do capeta USER kkkkkkkkkk rachei: 1 (< 0.1%)
USER jô? indoidou de vez? esse cara só pode tá louco , ou e mais um dos grandes clubistas q trabalham como jornalista esportivo....: 1 (< 0.1%)
USER USER preoUSERupão tanto d fika falando mal das modinhas USER a gente gosta USER nem USER preoUSERupão d proUSERurar no youtube mulheres USERantoras tentando fazer fama ai USER fika atraz d homens pra falar mal e USER USER USER tem mulheres no youtube tipo amannda USERondesUSERa USER homens ja perUSEReberão USER USER USER vem atraz d outros homens e nunUSERa d mulheres ow meu USER sera USER todos os homens vão vira gay desUSERulpa felipe mais USERresçe e vai atraz d alguma mulher pra fazer sexo pUSER tenho USERerteza homen USER fala muito d mulher virgem: 1 (< 0.1%)
Alguma bixona metida a hetero pode me dizer pq o brasil vai trocar de treinador antes da copa do mundo?: 1 (< 0.1%)
Tem coisa que não adianta debater. USER falar de um partido nazista é como dizer que tem coisa boa em um psicopata. Não pode defender o indefensável.: 1 (< 0.1%)
Escapou de morrer, se ele morresse ela iria dizer que foi estuprada.: 1 (< 0.1%)
USER Esse governo maldito precisa pagar por seus crimes contra o povo brasileiro. HASHTAG: 1 (< 0.1%)
Não reclame d assédio,se vc sai c/ um pervetido sexual q conheceu há 2 min. e exije q ele te respeite, mesmo vc estando alcoolizada /drogada: 1 (< 0.1%)
que porra e crepuscolo ?e uma merda!: 1 (< 0.1%)
USER USER tenho msm mana sou conhecida no tt por maria putinha, vc queria oq? q eu passasse na dm dos outros falando a palavra de Deus?: 1 (< 0.1%)
Other values (2977): 2977 (99.7%)

Length

[Figure: Histogram of lengths of the category]

Words

Value: Count (Frequency %)
user: 3922 (6.0%)
que: 1950 (3.0%)
de: 1802 (2.7%)
e: 1598 (2.4%)
o: 1555 (2.4%)
a: 1337 (2.0%)
é: 1136 (1.7%)
não: 934 (1.4%)
se: 670 (1.0%)
do: 634 (1.0%)
Other values (10633): 50086 (76.3%)

Most occurring characters

Value: Count (Frequency %)
' ': 62637 (17.5%)
'a': 31631 (8.8%)
'e': 29340 (8.2%)
'o': 25968 (7.3%)
's': 17702 (4.9%)
'r': 16069 (4.5%)
'i': 14470 (4.0%)
'd': 12263 (3.4%)
'n': 12222 (3.4%)
'm': 11926 (3.3%)
Other values (307): 123943 (34.6%)

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 247272 (69.0%)
Space Separator: 62637 (17.5%)
Uppercase Letter: 36417 (10.2%)
Other Punctuation: 9498 (2.7%)
Other Symbol: 883 (0.2%)
Decimal Number: 734 (0.2%)
Dash Punctuation: 185 (0.1%)
Math Symbol: 113 (< 0.1%)
Open Punctuation: 106 (< 0.1%)
Close Punctuation: 99 (< 0.1%)
Other values (9): 227 (0.1%)

Most frequent character per category

Other Symbol
ValueCountFrequency (%)
🤣143
 
16.2%
😂123
 
13.9%
🤮110
 
12.5%
🤬26
 
2.9%
😭21
 
2.4%
😡19
 
2.2%
19
 
2.2%
😈15
 
1.7%
💩14
 
1.6%
👏13
 
1.5%
Other values (134)380
43.0%
Lowercase Letter
Value: Count (Frequency %)
'a': 31631 (12.8%)
'e': 29340 (11.9%)
'o': 25968 (10.5%)
's': 17702 (7.2%)
'r': 16069 (6.5%)
'i': 14470 (5.9%)
'd': 12263 (5.0%)
'n': 12222 (4.9%)
'm': 11926 (4.8%)
'u': 11289 (4.6%)
Other values (42): 64392 (26.0%)
Uppercase Letter
Value: Count (Frequency %)
'E': 6327 (17.4%)
'R': 5832 (16.0%)
'S': 5703 (15.7%)
'U': 5437 (14.9%)
'A': 2175 (6.0%)
'O': 1589 (4.4%)
'T': 931 (2.6%)
'N': 809 (2.2%)
'M': 796 (2.2%)
'D': 789 (2.2%)
Other values (32): 6029 (16.6%)
Other Punctuation
ValueCountFrequency (%)
.3538
37.2%
,2956
31.1%
!1533
16.1%
?619
 
6.5%
"274
 
2.9%
:253
 
2.7%
'85
 
0.9%
74
 
0.8%
*71
 
0.7%
/35
 
0.4%
Other values (9)60
 
0.6%
Other Letter
ValueCountFrequency (%)
8
18.2%
6
13.6%
6
13.6%
4
9.1%
2
 
4.5%
2
 
4.5%
2
 
4.5%
ا2
 
4.5%
2
 
4.5%
1
 
2.3%
Other values (9)9
20.5%
Decimal Number
Value: Count (Frequency %)
'0': 160 (21.8%)
'2': 134 (18.3%)
'1': 131 (17.8%)
'3': 76 (10.4%)
'4': 43 (5.9%)
'9': 43 (5.9%)
'6': 40 (5.4%)
'5': 37 (5.0%)
'8': 37 (5.0%)
'7': 33 (4.5%)
Math Symbol
Value: Count (Frequency %)
'>': 61 (54.0%)
'+': 21 (18.6%)
'¬': 17 (15.0%)
'=': 10 (8.8%)
'<': 2 (1.8%)
'~': 1 (0.9%)
'|': 1 (0.9%)
Modifier Symbol
Value: Count (Frequency %)
'🏻': 22 (44.9%)
'🏼': 8 (16.3%)
'🏾': 7 (14.3%)
'^': 6 (12.2%)
'🏽': 3 (6.1%)
'🏿': 2 (4.1%)
'´': 1 (2.0%)
Open Punctuation
ValueCountFrequency (%)
(96
90.6%
[9
 
8.5%
1
 
0.9%
Close Punctuation
ValueCountFrequency (%)
)88
88.9%
]10
 
10.1%
1
 
1.0%
Dash Punctuation
ValueCountFrequency (%)
-181
97.8%
4
 
2.2%
Final Punctuation
ValueCountFrequency (%)
26
86.7%
4
 
13.3%
Space Separator
ValueCountFrequency (%)
62637
100.0%
Nonspacing Mark
ValueCountFrequency (%)
39
100.0%
Initial Punctuation
ValueCountFrequency (%)
33
100.0%
Format
ValueCountFrequency (%)
14
100.0%
Connector Punctuation
ValueCountFrequency (%)
_13
100.0%
Currency Symbol
ValueCountFrequency (%)
$3
100.0%
Other Number
ValueCountFrequency (%)
¹2
100.0%

Most occurring scripts

Value: Count (Frequency %)
Latin: 283681 (79.2%)
Common: 74394 (20.8%)
Inherited: 53 (< 0.1%)
Han: 28 (< 0.1%)
Arabic: 9 (< 0.1%)
Hangul: 6 (< 0.1%)

Most frequent character per script

Common
Value: Count (Frequency %)
' ': 62637 (84.2%)
'.': 3538 (4.8%)
',': 2956 (4.0%)
'!': 1533 (2.1%)
'?': 619 (0.8%)
'"': 274 (0.4%)
':': 253 (0.3%)
'-': 181 (0.2%)
'0': 160 (0.2%)
'🤣': 143 (0.2%)
Other values (199): 2100 (2.8%)
Latin
Value: Count (Frequency %)
'a': 31631 (11.2%)
'e': 29340 (10.3%)
'o': 25968 (9.2%)
's': 17702 (6.2%)
'r': 16069 (5.7%)
'i': 14470 (5.1%)
'd': 12263 (4.3%)
'n': 12222 (4.3%)
'm': 11926 (4.2%)
'u': 11289 (4.0%)
Other values (78): 100801 (35.5%)
Han
ValueCountFrequency (%)
8
28.6%
6
21.4%
4
14.3%
2
 
7.1%
2
 
7.1%
2
 
7.1%
2
 
7.1%
1
 
3.6%
1
 
3.6%
Arabic
ValueCountFrequency (%)
ا2
22.2%
م1
11.1%
ه1
11.1%
و1
11.1%
ل1
11.1%
ع1
11.1%
ن1
11.1%
ص1
11.1%
Inherited
ValueCountFrequency (%)
39
73.6%
14
 
26.4%
Hangul
ValueCountFrequency (%)
6
100.0%

Most occurring blocks

Value: Count (Frequency %)
ASCII: 349272 (97.5%)
None: 8266 (2.3%)
Emoticons: 319 (0.1%)
Punctuation: 158 (< 0.1%)
VS: 39 (< 0.1%)
Dingbats: 34 (< 0.1%)
CJK: 28 (< 0.1%)
Misc Symbols: 25 (< 0.1%)
Arabic: 9 (< 0.1%)
Math Alphanum: 9 (< 0.1%)
Other values (2): 12 (< 0.1%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
' ': 62637 (17.9%)
'a': 31631 (9.1%)
'e': 29340 (8.4%)
'o': 25968 (7.4%)
's': 17702 (5.1%)
'r': 16069 (4.6%)
'i': 14470 (4.1%)
'd': 12263 (3.5%)
'n': 12222 (3.5%)
'm': 11926 (3.4%)
Other values (82): 115044 (32.9%)
None
Value: Count (Frequency %)
'ã': 1953 (23.6%)
'é': 1631 (19.7%)
'á': 912 (11.0%)
'ç': 763 (9.2%)
'í': 626 (7.6%)
'ó': 582 (7.0%)
'ê': 384 (4.6%)
'ú': 197 (2.4%)
'É': 146 (1.8%)
'🤣': 143 (1.7%)
Other values (118): 929 (11.2%)
Emoticons
Value: Count (Frequency %)
'😂': 123 (38.6%)
'😭': 21 (6.6%)
'😡': 19 (6.0%)
'😈': 15 (4.7%)
'😒': 12 (3.8%)
'😍': 11 (3.4%)
'😅': 8 (2.5%)
'😠': 7 (2.2%)
'😹': 7 (2.2%)
'🙏': 7 (2.2%)
Other values (33): 89 (27.9%)
Punctuation
ValueCountFrequency (%)
74
46.8%
33
20.9%
26
 
16.5%
14
 
8.9%
4
 
2.5%
4
 
2.5%
3
 
1.9%
VS
ValueCountFrequency (%)
39
100.0%
Dingbats
ValueCountFrequency (%)
19
55.9%
5
 
14.7%
3
 
8.8%
2
 
5.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
Misc Symbols
ValueCountFrequency (%)
8
32.0%
5
20.0%
5
20.0%
3
 
12.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
CJK
ValueCountFrequency (%)
8
28.6%
6
21.4%
4
14.3%
2
 
7.1%
2
 
7.1%
2
 
7.1%
2
 
7.1%
1
 
3.6%
1
 
3.6%
Compat Jamo
ValueCountFrequency (%)
6
100.0%
Arabic
ValueCountFrequency (%)
ا2
22.2%
م1
11.1%
ه1
11.1%
و1
11.1%
ل1
11.1%
ع1
11.1%
ن1
11.1%
ص1
11.1%
Math Alphanum
ValueCountFrequency (%)
𝕡2
22.2%
𝕒2
22.2%
𝑹1
11.1%
𝕟1
11.1%
𝕖1
11.1%
𝕆1
11.1%
𝕜1
11.1%
Enclosed Alphanum Sup
ValueCountFrequency (%)
🇧2
33.3%
🇷2
33.3%
🇨1
16.7%
🇳1
16.7%

is_offensive
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 175.1 KiB
OFF: 2630
NOT: 357

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 8961
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OFF
2nd row: OFF
3rd row: OFF
4th row: OFF
5th row: OFF

Common Values

Value: Count (Frequency %)
OFF: 2630 (88.0%)
NOT: 357 (12.0%)

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]
Words

Value: Count (Frequency %)
off: 2630 (88.0%)
not: 357 (12.0%)

Most occurring characters

Value: Count (Frequency %)
'F': 5260 (58.7%)
'O': 2987 (33.3%)
'N': 357 (4.0%)
'T': 357 (4.0%)

Most occurring categories

Value: Count (Frequency %)
Uppercase Letter: 8961 (100.0%)

Most frequent character per category

Uppercase Letter
Value: Count (Frequency %)
'F': 5260 (58.7%)
'O': 2987 (33.3%)
'N': 357 (4.0%)
'T': 357 (4.0%)

Most occurring scripts

Value: Count (Frequency %)
Latin: 8961 (100.0%)

Most frequent character per script

Latin
Value: Count (Frequency %)
'F': 5260 (58.7%)
'O': 2987 (33.3%)
'N': 357 (4.0%)
'T': 357 (4.0%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 8961 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
'F': 5260 (58.7%)
'O': 2987 (33.3%)
'N': 357 (4.0%)
'T': 357 (4.0%)

is_targeted
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 175.1 KiB
TIN: 1942
UNT: 1045

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 8961
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TIN
2nd row: TIN
3rd row: TIN
4th row: TIN
5th row: TIN

Common Values

Value: Count (Frequency %)
TIN: 1942 (65.0%)
UNT: 1045 (35.0%)

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]
Words

Value: Count (Frequency %)
tin: 1942 (65.0%)
unt: 1045 (35.0%)

Most occurring characters

Value: Count (Frequency %)
'T': 2987 (33.3%)
'N': 2987 (33.3%)
'I': 1942 (21.7%)
'U': 1045 (11.7%)

Most occurring categories

Value: Count (Frequency %)
Uppercase Letter: 8961 (100.0%)

Most frequent character per category

Uppercase Letter
Value: Count (Frequency %)
'T': 2987 (33.3%)
'N': 2987 (33.3%)
'I': 1942 (21.7%)
'U': 1045 (11.7%)

Most occurring scripts

Value: Count (Frequency %)
Latin: 8961 (100.0%)

Most frequent character per script

Latin
Value: Count (Frequency %)
'T': 2987 (33.3%)
'N': 2987 (33.3%)
'I': 1942 (21.7%)
'U': 1045 (11.7%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 8961 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
'T': 2987 (33.3%)
'N': 2987 (33.3%)
'I': 1942 (21.7%)
'U': 1045 (11.7%)

targeted_type
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.2%
Missing: 1136
Missing (%): 38.0%
Memory size: 144.1 KiB
IND: 1175
GRP: 375
OTH: 301

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5553
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IND
2nd row: IND
3rd row: GRP
4th row: IND
5th row: OTH

Common Values

Value: Count (Frequency %)
IND: 1175 (39.3%)
GRP: 375 (12.6%)
OTH: 301 (10.1%)
(Missing): 1136 (38.0%)

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]
Words

Value: Count (Frequency %)
ind: 1175 (63.5%)
grp: 375 (20.3%)
oth: 301 (16.3%)

Most occurring characters

Value: Count (Frequency %)
'I': 1175 (21.2%)
'N': 1175 (21.2%)
'D': 1175 (21.2%)
'G': 375 (6.8%)
'R': 375 (6.8%)
'P': 375 (6.8%)
'O': 301 (5.4%)
'T': 301 (5.4%)
'H': 301 (5.4%)

Most occurring categories

Value: Count (Frequency %)
Uppercase Letter: 5553 (100.0%)

Most frequent character per category

Uppercase Letter
Value: Count (Frequency %)
'I': 1175 (21.2%)
'N': 1175 (21.2%)
'D': 1175 (21.2%)
'G': 375 (6.8%)
'R': 375 (6.8%)
'P': 375 (6.8%)
'O': 301 (5.4%)
'T': 301 (5.4%)
'H': 301 (5.4%)

Most occurring scripts

Value: Count (Frequency %)
Latin: 5553 (100.0%)

Most frequent character per script

Latin
Value: Count (Frequency %)
'I': 1175 (21.2%)
'N': 1175 (21.2%)
'D': 1175 (21.2%)
'G': 375 (6.8%)
'R': 375 (6.8%)
'P': 375 (6.8%)
'O': 301 (5.4%)
'T': 301 (5.4%)
'H': 301 (5.4%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 5553 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
'I': 1175 (21.2%)
'N': 1175 (21.2%)
'D': 1175 (21.2%)
'G': 375 (6.8%)
'R': 375 (6.8%)
'P': 375 (6.8%)
'O': 301 (5.4%)
'T': 301 (5.4%)
'H': 301 (5.4%)

toxic_spans
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 357
Missing (%): 12.0%
Memory size: 538.6 KiB
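toxic_spans is flagged as unsupported because each non-missing entry holds a list of character offsets rather than a scalar, for example `[15, 16, 17, 20, 21, 22, 23, 24, 25]` in the sample rows. The profiler cannot summarise such lists.

Below is a minimal cleaning sketch that groups consecutive offsets into half-open (start, end) runs and slices the matching fragments out of the text column. It assumes:
- each cell is either `None`, a list of ints, or a list literal stored as a string such as `"[42, 43, 44]"`;
- the offsets index code points of the corresponding text.

The helper names are hypothetical.

```python
import ast


def parse_offsets(raw):
    """Turn a toxic_spans cell into a list of ints; missing cells become []."""
    if raw is None:
        return []
    if isinstance(raw, str):
        # Assumed storage format: a Python/JSON-style list literal, e.g. "[4, 5, 6]".
        raw = ast.literal_eval(raw)
    return [int(i) for i in raw]


def group_spans(offsets):
    """Group character offsets into sorted, half-open (start, end) runs."""
    spans = []
    for i in sorted(set(offsets)):
        if spans and i == spans[-1][1]:
            spans[-1][1] = i + 1  # offset continues the current run
        else:
            spans.append([i, i + 1])  # gap found: start a new run
    return [tuple(s) for s in spans]


def toxic_substrings(text, raw):
    """Return the text fragments covered by the offsets in one toxic_spans cell."""
    return [text[start:end] for start, end in group_spans(parse_offsets(raw))]


row_text = "rafael é muito fdp, praga de garoto..."
print(toxic_substrings(row_text, "[15, 16, 17, 20, 21, 22, 23, 24, 25]"))
```

With the fourth sample row, this returns the fragments `fdp` and `praga ` (the trailing space is offset 25).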

health
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 2900
True: 87
Value: Count (Frequency %)
False: 2900 (97.1%)
True: 87 (2.9%)

ideology
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 2520
True: 467
Value: Count (Frequency %)
False: 2520 (84.4%)
True: 467 (15.6%)

insult
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
True: 2405
False: 582
Value: Count (Frequency %)
True: 2405 (80.5%)
False: 582 (19.5%)

lgbtqphobia
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 2828
True: 159
Value: Count (Frequency %)
False: 2828 (94.7%)
True: 159 (5.3%)

other_lifestyle
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 2911
True: 76
Value: Count (Frequency %)
False: 2911 (97.5%)
True: 76 (2.5%)

physical_aspects
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 2841
True: 146
Value: Count (Frequency %)
False: 2841 (95.1%)
True: 146 (4.9%)

profanity_obscene
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 1837
True: 1150
Value: Count (Frequency %)
False: 1837 (61.5%)
True: 1150 (38.5%)

racism
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 2961
True: 26
Value: Count (Frequency %)
False: 2961 (99.1%)
True: 26 (0.9%)

religious_intolerance
Boolean

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 2987
Value: Count (Frequency %)
False: 2987 (100.0%)

sexism
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 2860
True: 127
Value: Count (Frequency %)
False: 2860 (95.7%)
True: 127 (4.3%)

xenophobia
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.0 KiB
False: 2944
True: 43
Value: Count (Frequency %)
False: 2944 (98.6%)
True: 43 (1.4%)

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfectly negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfectly positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
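The rank-then-correlate recipe can be sketched in standard-library Python. The function names are illustrative and are not pandas-profiling's API. The sketch assumes:
- tied values receive the average of the ranks they span (the usual convention);
- the inputs contain no missing values;
- neither variable is constant. A constant column such as religious_intolerance has zero variance, and the division below would fail.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance over the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


xs = [1, 2, 3, 4, 5]
# y = x**3 is nonlinear but strictly increasing, so rho is exactly 1.
print(spearman(xs, [v ** 3 for v in xs]))  # 1.0
```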

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfectly positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (as long as both scale factors have the same sign). For an exactly linear relationship, therefore, the steepness of the line does not affect r; only the direction of its slope does.

To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
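The covariance-over-standard-deviations recipe can be sketched in standard-library Python. This is an illustrative implementation, not pandas-profiling's code, and it assumes complete numeric data.

Design notes:
- Because the 1/n factors cancel, the sketch works directly with sums of products of deviations.
- It raises an error on a constant variable, where r is undefined.
- The usage lines check the invariance under location and positive scale changes described above.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    # The 1/n normalisation of covariance and standard deviations cancels out.
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
r = pearson(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_shifted = pearson([10 * v + 5 for v in x], [0.5 * v - 3 for v in y])
print(round(r, 6), round(r_shifted, 6))  # 0.6 0.6
```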

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative association, 0 indicates no association, and +1 indicates a perfectly positive association.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
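Below is a direct translation of the concordant/discordant count, written as an O(n²) standard-library illustration. It implements the simple tau-a definition given above, and pairs tied in X or Y count as neither concordant nor discordant.

Library routines such as SciPy's `kendalltau` default to the tie-adjusted tau-b variant. Results can therefore differ when ties are present, and binary columns like the ones in this report have many ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2).

    Requires at least two observations; tied pairs add to neither count.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of differences -> concordant; opposite sign -> discordant.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# 5 concordant and 1 discordant pair out of 6: tau = 4/6, about 0.667.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```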

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The naive empirical estimator of Cramér's V is known to be biased, even for large samples, so a bias-corrected measure proposed by Bergsma (2013) is used instead.
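Below is a standard-library sketch of the bias correction, following the formulas commonly attributed to Bergsma (2013):
- φ² = χ²/n is reduced by (k-1)(r-1)/(n-1);
- the table dimensions r and k are adjusted in a similar way;
- the square root is taken of the ratio of corrected φ² to the smaller corrected dimension minus 1.

The sketch assumes Pearson's χ² without a continuity correction. This section does not say whether pandas-profiling applies one, so its reported values may differ slightly. The function name and the toy lists are illustrative; the lists are not the report's data.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramer's V (Bergsma, 2013) for two nominal sequences.

    Assumes paired, non-missing observations and uses Pearson's chi-squared
    statistic without a continuity correction. Returns NaN when V is undefined
    (fewer than two categories in either variable, or too few observations).
    """
    n = len(x)
    row_tot, col_tot = Counter(x), Counter(y)
    r, k = len(row_tot), len(col_tot)
    if n < 2 or r < 2 or k < 2:
        return float("nan")
    joint = Counter(zip(x, y))

    # Pearson's chi-squared over every cell of the r x k contingency table.
    chi2 = 0.0
    for a, ra in row_tot.items():
        for b, cb in col_tot.items():
            expected = ra * cb / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bergsma's corrections to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else float("nan")


toy_offensive = ["OFF"] * 6 + ["NOT"] * 4
toy_targeted = ["TIN"] * 5 + ["UNT"] * 5
print(round(cramers_v_corrected(toy_offensive, toy_targeted), 3))
```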

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependencies, and reduces to the Pearson correlation coefficient when the input follows a bivariate normal distribution.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
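Below is one way to compute nullity correlation in standard-library Python: encode each column as a 0/1 missingness indicator, then take Pearson's r between the indicators. This follows the common approach of correlating null masks. It is an illustrative assumption, not necessarily the exact code behind the heatmap.

In this report only targeted_type (1136 missing) and toxic_spans (357 missing) contain missing values. Every other column has a constant indicator, so this pair is the only one with a defined nullity correlation.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    +1 means the columns are always missing together; -1 means exactly one of
    them is missing in every row. Returns None if either column is always or
    never missing, because its indicator then has zero variance.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    if saa == 0 or sbb == 0:
        return None
    return sab / math.sqrt(saa * sbb)


print(nullity_correlation([None, "IND", None, "GRP"], [None, "[1, 2]", "[3]", None]))
```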

Sample

First rows

Row | id | text | is_offensive | is_targeted | targeted_type | toxic_spans | health | ideology | insult | lgbtqphobia | other_lifestyle | physical_aspects | profanity_obscene | racism | religious_intolerance | sexism | xenophobia
0 | 460cca26c7144f9ea5edabe45d33ad7d | "vc é minha vida bela" vc não tem vida seu peido do capeta USER kkkkkkkkkk rachei | OFF | TIN | IND | [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58] | False | False | True | False | False | False | True | False | False | False | False
1 | 542c3afa48114e4890b27b32f4b88c05 | USER Caralho mano eu ia mandar um vai tomar no cu mas ai lembrei do meme, por um momento meu sangue subiu e olha que nem sigo vc foi tt da Iza | OFF | TIN | IND | [4, 5, 6, 7, 8, 9, 10, 11, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] | False | False | True | False | False | False | True | False | False | False | False
2 | 2d65e628400149438210298a6972b3a0 | Os ignorantes nos comentários querendo dar desculpinhas pra comentar coisas preconceituosas. Que bom que as crianças assim tão novas já tem uma visão mais evoluída e empatica que vocês, bando de preconceituosos nojentos, que ao invés de estarem do lado do amor e de um mundo melhor estão do lado do preconceito e da dor, vocês só pensam assim porque não tem o mínimo de cultura e conhecimento, não sabem nem um terço da dor de um LGBTQ+, agora ao invés da gente ter que desenhar para as crianças entenderem temos que desenhar para os adultos entenderem, ENSINO USER IUSEROLOGIA USER GÊNERO NECESSÁRIO USER USER, POR UM MUNDO COM MAIS USER E MENOS USER! 💞 | OFF | TIN | GRP | [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 211, 212, 213, 214, 215, 216, 217, 218] | False | True | True | False | False | False | False | False | False | False | False
3 | e1dfbeaae05145b2afc5544149d22f61 | rafael é muito fdp, praga de garoto... | OFF | TIN | IND | [15, 16, 17, 20, 21, 22, 23, 24, 25] | False | False | True | False | False | False | True | False | False | False | False
4 | 614aeda1a791406a96671958a7a49aae | PUTARIA 🔞 URL | OFF | TIN | OTH | [0, 1, 2, 3, 4, 5, 6, 7] | False | False | False | False | False | False | True | False | False | False | False
5 | 0a53c08cb0c74b22b476638086c650d6 | Um ignorante vs um arrogante. | OFF | TIN | IND | [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28] | False | False | True | False | False | False | False | False | False | False | False
6 | 6c3b5f35b2b54cafaac825ac4edeb705 | Pelos comentários aqui, da pra ver o porque o USER está entre os países que mais matam homossexuais no mundo. | NOT | UNT | None | None | False | False | False | False | False | False | False | False | False | False | False
7 | 4c54dd18cc1e493bb970e5cfd5f8adb7 | É INTERESSANTE USER ESSA IMPRESSA SE USER PARA ENTREGAR USER ENTREVISTA LIXO PARA A SOCIEDADE. | OFF | TIN | OTH | [73, 74, 75, 76, 77, 81, 82, 83, 84] | False | False | True | False | False | False | False | False | False | False | False
8 | 04882791c0bd42c888983a5e554be036 | Vão tomar vergonha na cara Burguer USER ,deixem nossas crianças em paz!!!!! | OFF | TIN | OTH | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] | False | False | True | False | False | False | False | False | False | False | False
9 | b85d10840170438ebe3be8bac9362dec | USER Achei q sua mãe ia comer seu cu | OFF | TIN | IND | [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35] | False | False | True | False | False | False | True | False | False | False | False

Last rows

Row | id | text | is_offensive | is_targeted | targeted_type | toxic_spans | health | ideology | insult | lgbtqphobia | other_lifestyle | physical_aspects | profanity_obscene | racism | religious_intolerance | sexism | xenophobia
2977 | 0ffe522af137469aa7e9f38f51512e23 | Tem q tira esse cao dessa casa | NOT | UNT | None | None | False | False | False | False | False | False | False | False | False | False | False
2978 | b8924d8d5bec4e6594ec1d00e8871d2b | USER VOTO USER USER USER | NOT | UNT | None | None | False | False | False | False | False | False | False | False | False | False | False
2979 | 610ca49ff8004ca198d3c9d4ab5e9e9c | USER USER USER USER obrigado por responderem de forma coerente, do contrário desses moleques que se acham donos da verdade aqui nos comentários. Ainda sim, acho que o fim da carreira do USER foi uma tempestade no copo d'água porque ele falou merda e se retratou por isso. Que não vem muito ao caso, mas ele estava bêbado. Tais medidas seriam justas caso estivesse abertamente apoiando tal atrocidade, o que não foi o caso. O USER foi um pioneiro em podcast no Brasil, e empregou muita gente, particularmente acredito que todo esse cancelamento foi exagero. | OFF | TIN | None | [84, 85, 86, 87, 88, 89, 90, 91, 242, 243, 244, 245, 246, 247] | False | False | True | False | False | False | False | False | False | False | False
2980 | e95d599466d14b35a9e5852e2bf1a9cb | 'trabalhando graças ao feminismo' RH decide: - vou escolher a mulher pq mesmo ela nao concluindo a faculdade, ela cagou na foto do bolsonaro | OFF | TIN | GRP | [113, 114, 115, 116, 117, 118, 119] | False | True | True | False | False | False | False | False | False | True | False
2981 | e3d6694a5bbb4cb0905dbc4af924a6bc | Engraçado ver nego bater no peito e falar que vai fazer isso e aquilo, que não precisa de patrocínio pra nada, e agora enfia o rabo no meio das pernas por conta da opinião da pessoa...... não pode falar do partido alemão que assassinou milhões de pessoas, mas vamos falar livremente da Igreja e esquece tudo o que já foi feito em nome da Igreja... | OFF | TIN | OTH | [119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131] | False | True | True | False | False | False | True | False | False | False | False
2982 | 97864419e3e54fb5b53557fe5c44389f | mulher insuportável... | OFF | TIN | IND | [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] | False | False | True | False | False | False | False | False | False | False | False
2983 | 68ab3c88a1134fe4ad9d7ee85abb8df7 | por que caralhos você precisa ir pro chile ou para a argentina se depois vai para frança? você é burro?faça letras (pt/fr) na usp, viva com todos os auxílios que a universidade dá e aproveite as bolsas de intercâmbio que abrem no início de cada semestre (se você tiver uma boa média ponderada consegue inclusive ser pago para fazer o intercâmbio).daí pra frente toca seu plano. | OFF | TIN | IND | [8, 9, 10, 11, 12, 13, 14, 15, 16, 96, 97, 98, 99, 100, 101] | False | False | True | False | False | False | True | False | False | False | False
2984 | c1b41e051fd040f287a1d56192f8759c | Pobre diabo.kkkkkk URL | NOT | UNT | None | None | False | False | False | False | False | False | False | False | False | False | False
2985 | 4b86de06a731413ebcbb9389f7ff5e8f | ele ficou mais puto por saber q tava geral cantando junto, aqui é nós n é a gente | OFF | TIN | IND | [14, 15, 16, 17, 18, 19] | False | False | True | False | False | False | True | False | False | False | False
2986 | b90d33a84ee447f794e39981f21ecd41 | USER Que absurdo. Isso não pode passar. Segue adiante Leonel. Que Deus proteja sua vida. Que Deus proteja a vida do Lula e de todos os ameaçados. Que essa gente HIPÓCRITA e DESUMANA sejam condenados. Deus provê. Deus proverá. A sua misericórdia não faltará. 🙏🏻✝️🛐♥️😢✊🏻🚩 | OFF | UNT | None | [161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 173, 174, 175, 176, 177, 178, 179, 180] | False | True | True | False | False | False | False | False | False | False | False