Overview

Dataset statistics

Number of variables: 17
Number of observations: 2996
Missing cells: 1400
Missing cells (%): 2.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.7 MiB
Average record size in memory: 939.2 B
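The missing-cell percentage follows from the figures in this table: missing cells divided by variables times observations. The sketch below redoes that arithmetic and checks it against the two per-column missing counts listed under Alerts. The variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 17
n_observations = 2996
missing_cells = 1400

# Every cell of the table is one (row, column) pair.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Per-column missing counts as reported in the Alerts list.
missing_by_column = {"targeted_type": 1283, "toxic_spans": 117}

print(total_cells)
print(round(missing_pct, 1))
print(sum(missing_by_column.values()) == missing_cells)
```

The two columns with missing values account for all 1400 missing cells.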

Variable types

Categorical: 5
Unsupported: 1
Boolean: 11

Alerts

religious_intolerance has constant value "False" [Constant]
id has a high cardinality: 2996 distinct values [High cardinality]
text has a high cardinality: 2996 distinct values [High cardinality]
is_offensive is highly correlated with targeted_type and 2 other fields [High correlation]
health is highly correlated with religious_intolerance [High correlation]
other_lifestyle is highly correlated with religious_intolerance [High correlation]
racism is highly correlated with religious_intolerance [High correlation]
sexism is highly correlated with religious_intolerance [High correlation]
is_targeted is highly correlated with targeted_type and 1 other fields [High correlation]
targeted_type is highly correlated with is_offensive and 2 other fields [High correlation]
profanity_obscene is highly correlated with religious_intolerance [High correlation]
insult is highly correlated with is_offensive and 1 other fields [High correlation]
religious_intolerance is highly correlated with is_offensive and 12 other fields [High correlation]
xenophobia is highly correlated with religious_intolerance [High correlation]
physical_aspects is highly correlated with religious_intolerance [High correlation]
lgbtqphobia is highly correlated with religious_intolerance [High correlation]
ideology is highly correlated with religious_intolerance [High correlation]
is_offensive is highly correlated with insult [High correlation]
insult is highly correlated with is_offensive [High correlation]
targeted_type has 1283 (42.8%) missing values [Missing]
toxic_spans has 117 (3.9%) missing values [Missing]
id is uniformly distributed [Uniform]
text is uniformly distributed [Uniform]
id has unique values [Unique]
text has unique values [Unique]
toxic_spans is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2022-09-01 22:12:40.318763
Analysis finished: 2022-09-01 22:12:53.568420
Duration: 13.25 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

id
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 2996
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 260.5 KiB

Value | Count
da19df36730945f08df3d09efa354876 | 1
49bf28b765484b9f963d6885eb48df31 | 1
ee5a5d40987f424ab71dd60abf56a23c | 1
3ca64c7132d042feaa0eadea0e76ff22 | 1
e1bffd1c19be401393ab91e196839854 | 1
Other values (2991) | 2991

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 95872
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
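As a minimal sketch of what such a character-property lookup involves, the snippet below tallies the Unicode general category of each character in the first sample id using Python's standard `unicodedata` module. This is only an illustration: how the report itself computes these figures is not shown here, and the standard library does not expose the script or block properties.

```python
import unicodedata
from collections import Counter

# First sample id from the report; any 32-character hex id would behave the same.
sample_id = "da19df36730945f08df3d09efa354876"

# unicodedata.category returns a two-letter general category code,
# e.g. "Nd" (Decimal Number) or "Ll" (Lowercase Letter).
category_counts = Counter(unicodedata.category(ch) for ch in sample_id)

print(len(sample_id))
print(dict(category_counts))
```

A hexadecimal id contains only digits and the lowercase letters a to f, which is why the report lists exactly two categories for this column.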

Unique

Unique: 2996
Unique (%): 100.0%

Sample

1st row: da19df36730945f08df3d09efa354876
2nd row: 80f1a8c981864887b13963fed1261acc
3rd row: 80eee9db811c4ea4b2ddb7863d12c5fe
4th row: 2f67025f913e4a6292e3d000d9e2b5a8
5th row: cd92f539559e421ba61cf23ecd005511

Common Values

Value | Count | Frequency (%)
da19df36730945f08df3d09efa354876 | 1 | < 0.1%
49bf28b765484b9f963d6885eb48df31 | 1 | < 0.1%
ee5a5d40987f424ab71dd60abf56a23c | 1 | < 0.1%
3ca64c7132d042feaa0eadea0e76ff22 | 1 | < 0.1%
e1bffd1c19be401393ab91e196839854 | 1 | < 0.1%
4dab80b5088d4255a0f2372704246e4e | 1 | < 0.1%
b564daa958ec4b7390c4674c298e72a4 | 1 | < 0.1%
10978af756c94a479fbaf54a144c7052 | 1 | < 0.1%
1dbe54562a824d42bd95a728c5d6afef | 1 | < 0.1%
1a1b7b30ec08435889c578791b8188c3 | 1 | < 0.1%
Other values (2986) | 2986 | 99.7%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
da19df36730945f08df3d09efa354876 | 1 | < 0.1%
cc66b54eeec24607a67e2259134a1cdd | 1 | < 0.1%
a223536974394b15b5e3bb658ebc596b | 1 | < 0.1%
396d5049c82c430a9f4fa5694c00cbd2 | 1 | < 0.1%
80eee9db811c4ea4b2ddb7863d12c5fe | 1 | < 0.1%
2f67025f913e4a6292e3d000d9e2b5a8 | 1 | < 0.1%
cd92f539559e421ba61cf23ecd005511 | 1 | < 0.1%
430b13705cf34e13b74bc999425187c3 | 1 | < 0.1%
c779826dc43f460cb18e8429ca443477 | 1 | < 0.1%
e64148caa4474fc79298e01d0dda8f5e | 1 | < 0.1%
Other values (2986) | 2986 | 99.7%

Most occurring characters

Value | Count | Frequency (%)
4 | 8805 | 9.2%
9 | 6413 | 6.7%
8 | 6392 | 6.7%
b | 6381 | 6.7%
a | 6270 | 6.5%
2 | 5760 | 6.0%
7 | 5663 | 5.9%
0 | 5635 | 5.9%
e | 5599 | 5.8%
1 | 5590 | 5.8%
Other values (6) | 33364 | 34.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 60956 | 63.6%
Lowercase Letter | 34916 | 36.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
4 | 8805 | 14.4%
9 | 6413 | 10.5%
8 | 6392 | 10.5%
2 | 5760 | 9.4%
7 | 5663 | 9.3%
0 | 5635 | 9.2%
1 | 5590 | 9.2%
3 | 5590 | 9.2%
6 | 5587 | 9.2%
5 | 5521 | 9.1%

Lowercase Letter
Value | Count | Frequency (%)
b | 6381 | 18.3%
a | 6270 | 18.0%
e | 5599 | 16.0%
f | 5583 | 16.0%
d | 5573 | 16.0%
c | 5510 | 15.8%

Most occurring scripts

Value | Count | Frequency (%)
Common | 60956 | 63.6%
Latin | 34916 | 36.4%

Most frequent character per script

Common
Value | Count | Frequency (%)
4 | 8805 | 14.4%
9 | 6413 | 10.5%
8 | 6392 | 10.5%
2 | 5760 | 9.4%
7 | 5663 | 9.3%
0 | 5635 | 9.2%
1 | 5590 | 9.2%
3 | 5590 | 9.2%
6 | 5587 | 9.2%
5 | 5521 | 9.1%

Latin
Value | Count | Frequency (%)
b | 6381 | 18.3%
a | 6270 | 18.0%
e | 5599 | 16.0%
f | 5583 | 16.0%
d | 5573 | 16.0%
c | 5510 | 15.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 95872 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
4 | 8805 | 9.2%
9 | 6413 | 6.7%
8 | 6392 | 6.7%
b | 6381 | 6.7%
a | 6270 | 6.5%
2 | 5760 | 6.0%
7 | 5663 | 5.9%
0 | 5635 | 5.9%
e | 5599 | 5.8%
1 | 5590 | 5.8%
Other values (6) | 33364 | 34.8%

text
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 2996
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 MiB

Value | Count
USER Adorei o comercial também Jesus. Só achei que faltou um beijinho gay estilo Jesus e USER sabe?! Rsrsrs | 1
6 DE PDL É PUTARIA!!! Chega por hoje URL | 1
Perto dela ele fica calado | 1
O que esperar de uma empresa que vende lixo para as pessoas comerem ? | 1
USER USER é uma hamburgueria não uma escola, pai que precisa de uma hamburgueria pra educar filho é um pai vagabundo ou não é pai. | 1
Other values (2991) | 2991

Length

Max length: 954
Median length: 476.5
Mean length: 125.6048064
Min length: 4

Characters and Unicode

Total characters: 376312
Distinct characters: 318
Distinct categories: 21
Distinct scripts: 6
Distinct blocks: 18
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2996
Unique (%): 100.0%

Sample

1st row: USER Adorei o comercial também Jesus. Só achei que faltou um beijinho gay estilo Jesus e USER sabe?! Rsrsrs
2nd row: Cara isso foi muito babaca geral USER conhece o Monark e as merda que ele fala. Isso é muito merda eu USER E to decepcionada pra caralho mas não cabe a mim dizer
3rd row: Quem liga pra judeu kkkk
4th row: Se vc for porco, folgado e relaxado, você não ia conseguir viver com ela mesmo. Realmente, gente escrota não ia conseguir conviver com a Jojo
5th row: Rapaziada chata, né?! O cara trabalha c funk, vive no meio de mulher, solteiro ou não ele ia gravar o clip.... mas aí porque ficou solteiro “tá querendo se mostrar” pqp, gente chata p caralho

Common Values

Value | Count | Frequency (%)
USER Adorei o comercial também Jesus. Só achei que faltou um beijinho gay estilo Jesus e USER sabe?! Rsrsrs | 1 | < 0.1%
6 DE PDL É PUTARIA!!! Chega por hoje URL | 1 | < 0.1%
Perto dela ele fica calado | 1 | < 0.1%
O que esperar de uma empresa que vende lixo para as pessoas comerem ? | 1 | < 0.1%
USER USER é uma hamburgueria não uma escola, pai que precisa de uma hamburgueria pra educar filho é um pai vagabundo ou não é pai. | 1 | < 0.1%
Que merda de insônia mos, quero dormir inferno | 1 | < 0.1%
cara, dedo quente nesse cu gelado foi ótimo vei, ri pa caralho!!!! | 1 | < 0.1%
caralho mano pq eu fui ler essa porra, vou jogar ácido no meu olho da próxima vez que eu inventar de fazer essa palhaçada | 1 | < 0.1%
USER KKKKKKKKKK VAGABUNDA | 1 | < 0.1%
USER USER USER percebeu a merda q USER falou? Liberdade de expressão não é tirar a liberdade de outra pessoa só porque USER quer. Pensa antes de escrever. | 1 | < 0.1%
Other values (2986) | 2986 | 99.7%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
user | 3945 | 5.7%
que | 2076 | 3.0%
de | 1853 | 2.7%
e | 1675 | 2.4%
o | 1668 | 2.4%
a | 1429 | 2.1%
é | 1329 | 1.9%
não | 972 | 1.4%
um | 714 | 1.0%
do | 668 | 1.0%
Other values (10786) | 52615 | 76.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 65948 | 17.5%
a | 33194 | 8.8%
e | 31584 | 8.4%
o | 27632 | 7.3%
s | 19623 | 5.2%
r | 17519 | 4.7%
i | 15455 | 4.1%
n | 12862 | 3.4%
d | 12852 | 3.4%
m | 12482 | 3.3%
Other values (308) | 127161 | 33.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 263150 | 69.9%
Space Separator | 65948 | 17.5%
Uppercase Letter | 34911 | 9.3%
Other Punctuation | 9976 | 2.7%
Other Symbol | 961 | 0.3%
Decimal Number | 718 | 0.2%
Dash Punctuation | 113 | < 0.1%
Close Punctuation | 96 | < 0.1%
Open Punctuation | 88 | < 0.1%
Math Symbol | 82 | < 0.1%
Other values (11) | 269 | 0.1%

Most frequent character per category

Other Symbol
ValueCountFrequency (%)
😂141
 
14.7%
🤣110
 
11.4%
👏37
 
3.9%
😭35
 
3.6%
🤮27
 
2.8%
🤦24
 
2.5%
🇧20
 
2.1%
🇷20
 
2.1%
😍17
 
1.8%
😡17
 
1.8%
Other values (144)513
53.4%
Lowercase Letter
ValueCountFrequency (%)
a33194
12.6%
e31584
12.0%
o27632
10.5%
s19623
 
7.5%
r17519
 
6.7%
i15455
 
5.9%
n12862
 
4.9%
d12852
 
4.9%
m12482
 
4.7%
u11849
 
4.5%
Other values (33)68098
25.9%
Uppercase Letter
ValueCountFrequency (%)
E6049
17.3%
S5441
15.6%
R5406
15.5%
U5170
14.8%
A2061
 
5.9%
O1477
 
4.2%
T895
 
2.6%
N829
 
2.4%
M795
 
2.3%
D773
 
2.2%
Other values (28)6015
17.2%
Other Letter
ValueCountFrequency (%)
5
12.8%
4
 
10.3%
4
 
10.3%
4
 
10.3%
3
 
7.7%
º2
 
5.1%
2
 
5.1%
2
 
5.1%
1
 
2.6%
1
 
2.6%
Other values (11)11
28.2%
Other Punctuation
ValueCountFrequency (%)
.3696
37.0%
,3054
30.6%
!1555
15.6%
?661
 
6.6%
"406
 
4.1%
:252
 
2.5%
'97
 
1.0%
84
 
0.8%
*68
 
0.7%
/44
 
0.4%
Other values (8)59
 
0.6%
Decimal Number
ValueCountFrequency (%)
0153
21.3%
2111
15.5%
1110
15.3%
388
12.3%
465
9.1%
649
 
6.8%
547
 
6.5%
834
 
4.7%
931
 
4.3%
730
 
4.2%
Math Symbol
ValueCountFrequency (%)
>47
57.3%
=13
 
15.9%
+13
 
15.9%
<3
 
3.7%
~3
 
3.7%
¬2
 
2.4%
|1
 
1.2%
Modifier Symbol
ValueCountFrequency (%)
🏼23
31.1%
🏻14
18.9%
🏽13
17.6%
🏾9
 
12.2%
🏿8
 
10.8%
^5
 
6.8%
´2
 
2.7%
Dash Punctuation
ValueCountFrequency (%)
-108
95.6%
5
 
4.4%
Close Punctuation
ValueCountFrequency (%)
)90
93.8%
]6
 
6.2%
Open Punctuation
ValueCountFrequency (%)
(81
92.0%
[7
 
8.0%
Nonspacing Mark
ValueCountFrequency (%)
47
97.9%
͜1
 
2.1%
Format
ValueCountFrequency (%)
30
96.8%
­1
 
3.2%
Final Punctuation
ValueCountFrequency (%)
27
93.1%
2
 
6.9%
Connector Punctuation
ValueCountFrequency (%)
_7
87.5%
1
 
12.5%
Space Separator
ValueCountFrequency (%)
65948
100.0%
Initial Punctuation
ValueCountFrequency (%)
31
100.0%
Currency Symbol
ValueCountFrequency (%)
$4
100.0%
Control
ValueCountFrequency (%)
2
100.0%
Enclosing Mark
ValueCountFrequency (%)
2
100.0%
Modifier Letter
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 298064 | 79.2%
Common | 78132 | 20.8%
Inherited | 80 | < 0.1%
Han | 31 | < 0.1%
Katakana | 4 | < 0.1%
Hiragana | 1 | < 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
65948
84.4%
.3696
 
4.7%
,3054
 
3.9%
!1555
 
2.0%
?661
 
0.8%
"406
 
0.5%
:252
 
0.3%
0153
 
0.2%
😂141
 
0.2%
2111
 
0.1%
Other values (202)2155
 
2.8%
Latin
ValueCountFrequency (%)
a33194
 
11.1%
e31584
 
10.6%
o27632
 
9.3%
s19623
 
6.6%
r17519
 
5.9%
i15455
 
5.2%
n12862
 
4.3%
d12852
 
4.3%
m12482
 
4.2%
u11849
 
4.0%
Other values (73)103012
34.6%
Han
ValueCountFrequency (%)
5
16.1%
4
12.9%
4
12.9%
4
12.9%
3
9.7%
2
 
6.5%
2
 
6.5%
1
 
3.2%
1
 
3.2%
1
 
3.2%
Other values (4)4
12.9%
Inherited
ValueCountFrequency (%)
47
58.8%
30
37.5%
2
 
2.5%
͜1
 
1.2%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Hiragana
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 366689 | 97.4%
None | 8876 | 2.4%
Emoticons | 352 | 0.1%
Punctuation | 179 | < 0.1%
VS | 47 | < 0.1%
Enclosed Alphanum Sup | 40 | < 0.1%
Misc Symbols | 38 | < 0.1%
Dingbats | 33 | < 0.1%
CJK | 31 | < 0.1%
Geometric Shapes Ext | 12 | < 0.1%
Other values (8) | 15 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
65948
18.0%
a33194
 
9.1%
e31584
 
8.6%
o27632
 
7.5%
s19623
 
5.4%
r17519
 
4.8%
i15455
 
4.2%
n12862
 
3.5%
d12852
 
3.5%
m12482
 
3.4%
Other values (82)117538
32.1%
None
ValueCountFrequency (%)
ã1978
22.3%
é1799
20.3%
á1017
11.5%
ç848
9.6%
ó684
 
7.7%
í675
 
7.6%
ê440
 
5.0%
ú204
 
2.3%
É155
 
1.7%
🤣110
 
1.2%
Other values (117)966
10.9%
Emoticons
ValueCountFrequency (%)
😂141
40.1%
😭35
 
9.9%
😍17
 
4.8%
😡17
 
4.8%
😠14
 
4.0%
😒13
 
3.7%
😆11
 
3.1%
😤9
 
2.6%
😢8
 
2.3%
🙄6
 
1.7%
Other values (35)81
23.0%
Punctuation
ValueCountFrequency (%)
84
46.9%
31
 
17.3%
30
 
16.8%
27
 
15.1%
5
 
2.8%
2
 
1.1%
VS
ValueCountFrequency (%)
47
100.0%
Enclosed Alphanum Sup
ValueCountFrequency (%)
🇧20
50.0%
🇷20
50.0%
Misc Symbols
ValueCountFrequency (%)
14
36.8%
12
31.6%
4
 
10.5%
2
 
5.3%
2
 
5.3%
2
 
5.3%
1
 
2.6%
1
 
2.6%
Dingbats
ValueCountFrequency (%)
14
42.4%
8
24.2%
5
 
15.2%
2
 
6.1%
1
 
3.0%
1
 
3.0%
1
 
3.0%
1
 
3.0%
Geometric Shapes Ext
ValueCountFrequency (%)
🟩10
83.3%
🟨2
 
16.7%
CJK
ValueCountFrequency (%)
5
16.1%
4
12.9%
4
12.9%
4
12.9%
3
9.7%
2
 
6.5%
2
 
6.5%
1
 
3.2%
1
 
3.2%
1
 
3.2%
Other values (4)4
12.9%
Box Drawing
ValueCountFrequency (%)
2
100.0%
Specials
ValueCountFrequency (%)
2
100.0%
Geometric Shapes
ValueCountFrequency (%)
1
50.0%
1
50.0%
Katakana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Hiragana
ValueCountFrequency (%)
1
100.0%
IPA Ext
ValueCountFrequency (%)
ʖ1
100.0%
Diacriticals
ValueCountFrequency (%)
͜1
100.0%
CJK Compat Forms
ValueCountFrequency (%)
1
100.0%

is_offensive
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 175.7 KiB

Value | Count
OFF | 2879
NOT | 117

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 8988
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OFF
2nd row: OFF
3rd row: OFF
4th row: OFF
5th row: OFF

Common Values

Value | Count | Frequency (%)
OFF | 2879 | 96.1%
NOT | 117 | 3.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
off | 2879 | 96.1%
not | 117 | 3.9%

Most occurring characters

Value | Count | Frequency (%)
F | 5758 | 64.1%
O | 2996 | 33.3%
N | 117 | 1.3%
T | 117 | 1.3%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 8988 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
F | 5758 | 64.1%
O | 2996 | 33.3%
N | 117 | 1.3%
T | 117 | 1.3%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 8988 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
F | 5758 | 64.1%
O | 2996 | 33.3%
N | 117 | 1.3%
T | 117 | 1.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 8988 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
F | 5758 | 64.1%
O | 2996 | 33.3%
N | 117 | 1.3%
T | 117 | 1.3%

is_targeted
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 175.7 KiB

Value | Count
TIN | 1713
UNT | 1283

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 8988
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: UNT
2nd row: TIN
3rd row: UNT
4th row: UNT
5th row: TIN

Common Values

Value | Count | Frequency (%)
TIN | 1713 | 57.2%
UNT | 1283 | 42.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
tin | 1713 | 57.2%
unt | 1283 | 42.8%

Most occurring characters

Value | Count | Frequency (%)
T | 2996 | 33.3%
N | 2996 | 33.3%
I | 1713 | 19.1%
U | 1283 | 14.3%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 8988 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 2996 | 33.3%
N | 2996 | 33.3%
I | 1713 | 19.1%
U | 1283 | 14.3%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 8988 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
T | 2996 | 33.3%
N | 2996 | 33.3%
I | 1713 | 19.1%
U | 1283 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 8988 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
T | 2996 | 33.3%
N | 2996 | 33.3%
I | 1713 | 19.1%
U | 1283 | 14.3%

targeted_type
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.2%
Missing: 1283
Missing (%): 42.8%
Memory size: 140.6 KiB

Value | Count
IND | 1047
GRP | 574
OTH | 92

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5139
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GRP
2nd row: GRP
3rd row: GRP
4th row: GRP
5th row: OTH

Common Values

Value | Count | Frequency (%)
IND | 1047 | 34.9%
GRP | 574 | 19.2%
OTH | 92 | 3.1%
(Missing) | 1283 | 42.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
ind | 1047 | 61.1%
grp | 574 | 33.5%
oth | 92 | 5.4%

Most occurring characters

Value | Count | Frequency (%)
I | 1047 | 20.4%
N | 1047 | 20.4%
D | 1047 | 20.4%
G | 574 | 11.2%
R | 574 | 11.2%
P | 574 | 11.2%
O | 92 | 1.8%
T | 92 | 1.8%
H | 92 | 1.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 5139 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
I | 1047 | 20.4%
N | 1047 | 20.4%
D | 1047 | 20.4%
G | 574 | 11.2%
R | 574 | 11.2%
P | 574 | 11.2%
O | 92 | 1.8%
T | 92 | 1.8%
H | 92 | 1.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 5139 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
I | 1047 | 20.4%
N | 1047 | 20.4%
D | 1047 | 20.4%
G | 574 | 11.2%
R | 574 | 11.2%
P | 574 | 11.2%
O | 92 | 1.8%
T | 92 | 1.8%
H | 92 | 1.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5139 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
I | 1047 | 20.4%
N | 1047 | 20.4%
D | 1047 | 20.4%
G | 574 | 11.2%
R | 574 | 11.2%
P | 574 | 11.2%
O | 92 | 1.8%
T | 92 | 1.8%
H | 92 | 1.8%

toxic_spans
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 117
Missing (%): 3.9%
Memory size: 929.3 KiB

health
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2925 | 97.6%
True | 71 | 2.4%

ideology
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2220 | 74.1%
True | 776 | 25.9%

insult
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
True | 2822 | 94.2%
False | 174 | 5.8%

lgbtqphobia
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2783 | 92.9%
True | 213 | 7.1%

other_lifestyle
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2955 | 98.6%
True | 41 | 1.4%

physical_aspects
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2818 | 94.1%
True | 178 | 5.9%

profanity_obscene
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2064 | 68.9%
True | 932 | 31.1%

racism
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2919 | 97.4%
True | 77 | 2.6%

religious_intolerance
Boolean

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2996 | 100.0%

sexism
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2616 | 87.3%
True | 380 | 12.7%

xenophobia
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value | Count | Frequency (%)
False | 2898 | 96.7%
True | 98 | 3.3%

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
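The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The helper names (`average_ranks`, `pearson`, `spearman`) and the toy data are illustrative assumptions, not the code the report tool runs.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

Because y increases whenever x does, ρ is 1 up to floating-point rounding, while r stays below 1 since the relationship is not linear.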

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
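A small sketch of this formula, together with a check of the invariance property, is given below. The function name and the sample values are made up for illustration. The invariance holds for positive scale factors; a negative factor flips the sign of r.

```python
from statistics import mean

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical measurements, not taken from the dataset.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 9.0]

r_original = pearson(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_scaled = [10 * a + 3 for a in x]
y_scaled = [0.5 * b - 7 for b in y]
r_scaled = pearson(x_scaled, y_scaled)

print(r_original, r_scaled)
```

Both calls return the same value up to floating-point rounding.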

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
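The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows, with an illustrative function name and toy data. Library implementations often use a tie-adjusted variant instead, so their results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie, counted as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair relative to x

tau = kendall_tau_a(x, y)
print(tau)
```

Of the six pairs here, five are concordant and one is discordant, so τ = (5 - 1) / 6.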

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
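A minimal sketch of a bias-corrected Cramér's V, written from the commonly cited form of Bergsma's correction, is shown below. The function name and the two contingency tables are hypothetical, and the code assumes every row and column total is non-zero. It is not necessarily identical to what the report tool runs internally.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Hypothetical tables, not taken from the dataset.
independent = [[10, 20], [20, 40]]  # rows are proportional
perfect = [[50, 0], [0, 50]]        # each row maps to exactly one column

v_independent = cramers_v_corrected(independent)
v_perfect = cramers_v_corrected(perfect)
print(v_independent, v_perfect)
```

The proportional table gives 0 and the diagonal table gives 1 up to floating-point rounding, matching the two ends of the range described above.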

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index) | id | text | is_offensive | is_targeted | targeted_type | toxic_spans | health | ideology | insult | lgbtqphobia | other_lifestyle | physical_aspects | profanity_obscene | racism | religious_intolerance | sexism | xenophobia
0 | da19df36730945f08df3d09efa354876 | USER Adorei o comercial também Jesus. Só achei que faltou um beijinho gay estilo Jesus e USER sabe?! Rsrsrs | OFF | UNT | None | [52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86] | False | False | True | False | False | False | True | False | False | False | False
1 | 80f1a8c981864887b13963fed1261acc | Cara isso foi muito babaca geral USER conhece o Monark e as merda que ele fala. Isso é muito merda eu USER E to decepcionada pra caralho mas não cabe a mim dizer | OFF | TIN | GRP | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105] | False | False | True | False | False | False | False | False | False | False | False
2 | 80eee9db811c4ea4b2ddb7863d12c5fe | Quem liga pra judeu kkkk | OFF | UNT | None | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] | False | True | True | False | False | False | False | False | False | False | True
3 | 2f67025f913e4a6292e3d000d9e2b5a8 | Se vc for porco, folgado e relaxado, você não ia conseguir viver com ela mesmo. Realmente, gente escrota não ia conseguir conviver com a Jojo | OFF | UNT | None | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104] | False | False | True | False | False | False | False | False | False | False | False
4 | cd92f539559e421ba61cf23ecd005511 | Rapaziada chata, né?! O cara trabalha c funk, vive no meio de mulher, solteiro ou não ele ia gravar o clip.... mas aí porque ficou solteiro “tá querendo se mostrar” pqp, gente chata p caralho | OFF | TIN | GRP | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 165, 166, 167, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190] | False | False | True | False | False | False | False | False | False | False | False
5 | 430b13705cf34e13b74bc999425187c3 | USER USER é muito bom. USER ^^ E claro a equipe USER. | NOT | UNT | None | None | False | False | False | False | False | False | False | False | False | False | False
6 | c779826dc43f460cb18e8429ca443477 | Pior do que adolescentezinhas de merda...são pessoas que levam filmes tão a sério! O livro/filme é dela, ela faz o que quiser! E por mais ruim que seja, ta rendendo milhões (: | OFF | UNT | None | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38] | False | False | True | False | False | False | True | False | False | True | False
7 | e64148caa4474fc79298e01d0dda8f5e | USER Toma no cu é vitamina como tu e tua prima. | OFF | TIN | GRP | [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46] | False | False | True | False | False | False | True | False | False | False | False
8 | 738ccd4476784f47af3a5a6cfdda4695 | Se fosse um sniper ia ser louco | OFF | UNT | None | [26, 27, 28, 29, 30] | False | False | True | False | False | True | False | False | False | False | False
9 | cc66b54eeec24607a67e2259134a1cdd | Muito bom, pena a circunstâncias serem ruins, ele poderia ter vindo antes, até para dar uma aula ao Monark e fazer ele parar de falar tanta besteira... | OFF | UNT | None | [119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150] | False | False | True | False | False | False | False | False | False | False | False
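The toxic_spans values look like lists of character offsets into the text column. As a sketch under that assumption (0-based offsets, which is consistent with row 8 but is not stated in the report), the hypothetical helper below groups consecutive offsets and returns the substrings they cover.

```python
def spans_to_substrings(text, offsets):
    """Group consecutive character offsets and return the covered substrings."""
    substrings = []
    start = prev = None
    for pos in sorted(offsets):
        if start is None:
            start = prev = pos            # open the first run
        elif pos == prev + 1:
            prev = pos                    # extend the current run
        else:
            substrings.append(text[start:prev + 1])
            start = prev = pos            # gap found: open a new run
    if start is not None:
        substrings.append(text[start:prev + 1])
    return substrings

# Row 8 of the first-rows sample.
text = "Se fosse um sniper ia ser louco"
offsets = [26, 27, 28, 29, 30]

print(spans_to_substrings(text, offsets))  # ['louco']
```

Rows whose offset list has gaps, such as row 4, would yield several substrings, one per run of consecutive offsets.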

Last rows

(index) | id | text | is_offensive | is_targeted | targeted_type | toxic_spans | health | ideology | insult | lgbtqphobia | other_lifestyle | physical_aspects | profanity_obscene | racism | religious_intolerance | sexism | xenophobia
2986 | 358dd843965f429f9550f5a5eb41a46b | RT USER: TODO MUNDO NESSE CARALHO DE REDE SOCIAL TEM UM SQUAD UMA PANELINHA UM GRUPINHO E OS CARALHO A4 L, SÓ EU QUE NÃO QUE ODIO QUE PO… | OFF | TIN | GRP | [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102] | False | False | True | False | False | False | True | False | False | False | False
2987 | 1d8a7915a93847d9abc67d2564926920 | Mano o povo gosta que seja ignorante , por isso eu mando pra casa do caralho mesmo 🤬 | OFF | TIN | GRP | [27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76] | False | True | True | False | False | False | True | False | False | False | True
2988 | 7545fcbcd9f5497987639797d2b19518 | Essa mulher é louca.kkkkkkkkk. | OFF | TIN | IND | [14, 15, 16, 17, 18] | False | False | True | False | False | False | False | False | False | True | False
2989 | 8849dc11cdde4bcebf2b361e905811fd | USER Ladrão desgracado dos infernos. Voce é um câncer que faz mal ao pais. URL | OFF | TIN | IND | [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 27, 28, 29, 30, 31, 32, 33, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] | True | True | True | False | False | False | False | False | False | False | False
2990 | 78f5241468a54171893914852b0c6cce | Nós vamos colocar pra fuder cm esses bandidos do STF USER e senado | OFF | TIN | GRP | [22, 23, 24, 25, 26, 27, 37, 38, 39, 40, 41, 42, 43, 44, 45] | False | True | True | False | False | False | True | False | False | False | False
2991 | 0315aa7b60dd48be9c5f47028422af13 | USER USER EU SO BURRA PERDÃO VO APAGA NGM AVISA NADA NESSE CARAIO TBM | OFF | TIN | IND | [16, 17, 18, 19, 20, 21, 59, 60, 61, 62, 63, 64, 65] | False | True | True | False | False | False | False | False | False | False | False
2992 | 57be31ef8fd54e96afa8482f08df8987 | RT USER: POLÍTICOS CORRUTOS DEVERIAM APODRECER NA CADEIA. POR CAUSA DELES MORREM MILHARES DE PESSOAS POR FALTA DE RECURSOS. ELES… | OFF | TIN | GRP | [22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58] | False | True | True | False | False | False | False | False | False | False | False
2993 | dfbffbb11afa4d28a745282c422314d4 | USER é sem personalidade, fala tudo ensaiado… parece um robô. | OFF | TIN | IND | [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59] | False | False | True | False | False | False | False | False | False | False | False
2994 | d0e31661ac2e402b8d139b5efcd5c07c | FICO ME PERGUNTANDO Q MERDA A PESSOA TEM NA CABEÇA PRA VOLTAR PRA UM RELACIONAMENTO FUDIDO? PQP EU FICO INDIGNADA COM O TANTO Q EXISTE MINA TROUXA QUERIA Q TODAS ENTENDESSEM A MERDA DE RELACIONAMENTO Q TÃO E A PESSOA LIXO DO LADO DELAS | OFF | TIN | GRP | [22, 23, 24, 25, 26, 27, 84, 85, 86, 87, 88, 89, 92, 93, 94, 140, 141, 142, 143, 144, 145, 176, 177, 178, 179, 180, 181, 217, 218, 219, 220, 221] | False | True | True | False | False | False | False | False | False | True | False
2995 | 012fdebdb224452a8666eea8ea86d35b | O Nosso Presidente Tem Mesmo Uma Paciência De Jó, Porque se Fosse Eu, Já Tinha Mandado Todos Estes Pilantras Trapaceiros Que Defende a Esquerda Para a Casa Do Caralho ! | OFF | TIN | GRP | [99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 156, 157, 159, 160, 161, 162, 163, 164, 165, 166] | False | True | False | False | False | False | False | False | False | False | False