Overview

Dataset statistics

Number of variables: 17
Number of observations: 1995
Missing cells: 1121
Missing cells (%): 3.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.5 MiB
Average record size in memory: 786.0 B
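The missing-cell figures above can be reproduced from the per-variable counts further down in this report. Only targeted_type (812) and toxic_spans (309) have missing values, and the percentage is taken over all cells, that is variables × observations. Here is a small arithmetic check that uses only the reported numbers:

```python
# Counts copied from this report; nothing here reads the underlying data.
n_variables = 17
n_observations = 1995
missing_per_column = {"targeted_type": 812, "toxic_spans": 309}

total_cells = n_variables * n_observations        # every variable x every row
missing_cells = sum(missing_per_column.values())  # only two columns have gaps
missing_pct = 100 * missing_cells / total_cells

# Average record size times row count should land near the reported total size
approx_total_mib = 786.0 * n_observations / 2**20

print(total_cells, missing_cells, f"{missing_pct:.1f}%", f"{approx_total_mib:.1f} MiB")
# 33915 1121 3.3% 1.5 MiB
```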

Variable types

Categorical: 5
Unsupported: 1
Boolean: 11

Alerts

religious_intolerance has constant value "False" [Constant]
id has high cardinality: 1995 distinct values [High cardinality]
text has high cardinality: 1995 distinct values [High cardinality]
lgbtqphobia is highly correlated with religious_intolerance [High correlation]
religious_intolerance is highly correlated with lgbtqphobia and 12 other fields [High correlation]
other_lifestyle is highly correlated with religious_intolerance [High correlation]
physical_aspects is highly correlated with religious_intolerance [High correlation]
racism is highly correlated with religious_intolerance [High correlation]
targeted_type is highly correlated with religious_intolerance and 2 other fields [High correlation]
health is highly correlated with religious_intolerance [High correlation]
sexism is highly correlated with religious_intolerance [High correlation]
xenophobia is highly correlated with religious_intolerance [High correlation]
is_offensive is highly correlated with religious_intolerance and 3 other fields [High correlation]
profanity_obscene is highly correlated with religious_intolerance [High correlation]
insult is highly correlated with religious_intolerance and 2 other fields [High correlation]
is_targeted is highly correlated with religious_intolerance and 3 other fields [High correlation]
ideology is highly correlated with religious_intolerance [High correlation]
is_offensive is highly correlated with is_targeted and 1 other field [High correlation]
is_targeted is highly correlated with is_offensive and 1 other field [High correlation]
health is highly correlated with physical_aspects [High correlation]
insult is highly correlated with is_offensive and 1 other field [High correlation]
physical_aspects is highly correlated with health [High correlation]
targeted_type has 812 (40.7%) missing values [Missing]
toxic_spans has 309 (15.5%) missing values [Missing]
id is uniformly distributed [Uniform]
text is uniformly distributed [Uniform]
id has unique values [Unique]
text has unique values [Unique]
toxic_spans has an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2022-10-06 01:26:24.675422
Analysis finished: 2022-10-06 01:26:42.600230
Duration: 17.92 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

id
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 1995
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 173.5 KiB
ccc932bedff24efd91ade52c236d554b | 1
7b775f36cfc9439292544d721b808b21 | 1
17efc32abf784dc79de14f0c5bf0284f | 1
cf1d3e6a413046a9b614af81aea77605 | 1
5f74b203e7d949a382f354e6db5b076b | 1
Other values (1990) | 1990

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 63840
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
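Python's standard `unicodedata` module exposes the general-category property mentioned above. Its two-letter codes correspond to the category names used in this report; for example, `Nd` is "Decimal Number" and `Ll` is "Lowercase Letter". The name mapping below is a hand-written subset, not the profiler's own table. Script and block properties are not available in `unicodedata`, so this sketch covers categories only:

```python
import unicodedata
from collections import Counter

# Hand-written subset of Unicode general-category codes -> long names
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
    "So": "Other Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # fall back to the raw code for categories missing from the subset
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# A hex id like those in the `id` column contains only digits and letters a-f
print(category_counts(["ccc932bedff24efd91ade52c236d554b"]))
```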

Unique

Unique: 1995
Unique (%): 100.0%

Sample

1st row: ccc932bedff24efd91ade52c236d554b
2nd row: 1e1fa5c2011f4b79ab8fa17d311abc8d
3rd row: 847d23e970d84e3ca687911ec2791af7
4th row: 8f4e011d21ea4ed6ab1702a2acf43c0b
5th row: 08f67adbd9c54cd1a308b77ec733af77

Common Values

Value | Count | Frequency (%)
ccc932bedff24efd91ade52c236d554b | 1 | 0.1%
7b775f36cfc9439292544d721b808b21 | 1 | 0.1%
17efc32abf784dc79de14f0c5bf0284f | 1 | 0.1%
cf1d3e6a413046a9b614af81aea77605 | 1 | 0.1%
5f74b203e7d949a382f354e6db5b076b | 1 | 0.1%
408f246bc57e43f89f45da052d9ec1b5 | 1 | 0.1%
df41d8deeccb4457ac650c1eca60bb53 | 1 | 0.1%
c6b3708ae6084b12b90c1ba11230a7c4 | 1 | 0.1%
b161e4c4660f41118b4f116cdeb29545 | 1 | 0.1%
6aa5a5b42fe5435790e131a266320122 | 1 | 0.1%
Other values (1985) | 1985 | 99.5%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
ccc932bedff24efd91ade52c236d554b | 1 | 0.1%
a6f9b242df2647d6a29be7820c7caad8 | 1 | 0.1%
8f4e011d21ea4ed6ab1702a2acf43c0b | 1 | 0.1%
08f67adbd9c54cd1a308b77ec733af77 | 1 | 0.1%
e9e09ac235d2427e84c28f2106d0d2a6 | 1 | 0.1%
327d36f100b44c8f84f2844a11336d84 | 1 | 0.1%
e67d642951dd42ed9da034a8765ba6c2 | 1 | 0.1%
f3726ce30fbb4260b20a02d85f0a42c3 | 1 | 0.1%
38b85624d15241dfa8f9eaab8ab5a556 | 1 | 0.1%
86e3552f8ca8453f9bd8a97df9aa420d | 1 | 0.1%
Other values (1985) | 1985 | 99.5%

Most occurring characters

Value | Count | Frequency (%)
4 | 5756 | 9.0%
8 | 4359 | 6.8%
b | 4231 | 6.6%
9 | 4225 | 6.6%
a | 4129 | 6.5%
0 | 3814 | 6.0%
6 | 3797 | 5.9%
d | 3790 | 5.9%
c | 3757 | 5.9%
2 | 3756 | 5.9%
Other values (6) | 22226 | 34.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 40503 | 63.4%
Lowercase Letter | 23337 | 36.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
4 | 5756 | 14.2%
8 | 4359 | 10.8%
9 | 4225 | 10.4%
0 | 3814 | 9.4%
6 | 3797 | 9.4%
2 | 3756 | 9.3%
7 | 3724 | 9.2%
5 | 3716 | 9.2%
1 | 3680 | 9.1%
3 | 3676 | 9.1%
Lowercase Letter
Value | Count | Frequency (%)
b | 4231 | 18.1%
a | 4129 | 17.7%
d | 3790 | 16.2%
c | 3757 | 16.1%
e | 3719 | 15.9%
f | 3711 | 15.9%
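The percentages in the per-category tables are relative to the category total, not to all characters. The digit 4, for example, is 9.0% of all 63,840 characters in `id` but 14.2% of the 40,503 decimal digits. The check below reproduces both figures from the reported counts:

```python
count_of_4 = 5756
total_characters = 63840   # all characters in the id column (1995 rows x 32)
decimal_digits = 40503     # characters in the "Decimal Number" category

share_of_all = 100 * count_of_4 / total_characters
share_of_digits = 100 * count_of_4 / decimal_digits
print(f"{share_of_all:.1f}% of all characters, {share_of_digits:.1f}% of digits")
# 9.0% of all characters, 14.2% of digits
```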

Most occurring scripts

Value | Count | Frequency (%)
Common | 40503 | 63.4%
Latin | 23337 | 36.6%

Most frequent character per script

Common
Value | Count | Frequency (%)
4 | 5756 | 14.2%
8 | 4359 | 10.8%
9 | 4225 | 10.4%
0 | 3814 | 9.4%
6 | 3797 | 9.4%
2 | 3756 | 9.3%
7 | 3724 | 9.2%
5 | 3716 | 9.2%
1 | 3680 | 9.1%
3 | 3676 | 9.1%
Latin
Value | Count | Frequency (%)
b | 4231 | 18.1%
a | 4129 | 17.7%
d | 3790 | 16.2%
c | 3757 | 16.1%
e | 3719 | 15.9%
f | 3711 | 15.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 63840 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
4 | 5756 | 9.0%
8 | 4359 | 6.8%
b | 4231 | 6.6%
9 | 4225 | 6.6%
a | 4129 | 6.5%
0 | 3814 | 6.0%
6 | 3797 | 5.9%
d | 3790 | 5.9%
c | 3757 | 5.9%
2 | 3756 | 5.9%
Other values (6) | 22226 | 34.8%

text
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 1995
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 706.0 KiB
USER USER USER pelo contrario, eu sou USER bolsonaro e USER USER, pelo seu xingamento já da pra perceber que não tem argumento, USER só é mais um ser humano que gosta de baba-ovo de outro ser-humano que é bolsonaro. | 1
puta merda cara! cara deve ter sido horrivel pra USER assistir o filme todo pra poder explica-lo detalhadamente, pra quem odeia o filme, saber todo o conteúdo do inicio ao fim é meio estranho! até pq USER comprou até o livro... que irônico!, ou USER ta querendo ser sarcástico??? | 1
Cara que gente esquisita, tinha q ser virgem q webnamoram e fzm gf, e pqp minha filha o cara dava todos os sinais q era um babaca. Se ele insisitisse em continuar era só fzr um ghosting e fds URL | 1
USER n sabia q vc era uma bichona lan | 1
Somos, ainda, um país muito atrasado e aí está a prova: todas essas manifestações homofóbicas. O discurso é sempre o mesmo: "Não sou preconceituoso, mas...". Enquanto o mundo inteiro se move no sentido de entender que ninguém tem o direito de julgar a quem o outro pode amar, no Brasil, continuamos um povo burro, preguiçoso e cheio de uma moralidade torpe. | 1
Other values (1990) | 1990

Length

Max length: 1084
Median length: 397
Mean length: 129.3077694
Min length: 8

Characters and Unicode

Total characters: 257969
Distinct characters: 267
Distinct categories: 18
Distinct scripts: 4
Distinct blocks: 11
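The mean length reported for `text` is the total character count divided by the number of rows. You can check this from the figures above, and the same relation holds for the fixed-width `id` column:

```python
total_characters = 257969
n_rows = 1995

mean_length = total_characters / n_rows
print(f"{mean_length:.7f}")  # 129.3077694, matching "Mean length" above

# For `id`, every value is 32 characters, so total / rows is exactly 32
print(63840 / n_rows)        # 32.0
```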

Unique

Unique: 1995
Unique (%): 100.0%

Sample

1st row: USER USER USER pelo contrario, eu sou USER bolsonaro e USER USER, pelo seu xingamento já da pra perceber que não tem argumento, USER só é mais um ser humano que gosta de baba-ovo de outro ser-humano que é bolsonaro.
2nd row: USER e a favor do Nazismo cabe safado
3rd row: USER USER USER USER USER A transposição foi inaugurada pelo Lula, Dilma, Temer e Bolsonaro. Mas pra vcs a obra é só do Genocida. Quero saber, no q o país melhorou pro gado tentar reeleger esse lixo?
4th row: USER tens de continuar, eu pratico ginásio à uns 3 anos e contino-o o mesmo pau de virar tripas xddd
5th row: USER USER NAO VOU APAGAR A PAGINA PORRAAAAAAA VAMOOOOOO TIMEEEE

Common Values

Value | Count | Frequency (%)
USER USER USER pelo contrario, eu sou USER bolsonaro e USER USER, pelo seu xingamento já da pra perceber que não tem argumento, USER só é mais um ser humano que gosta de baba-ovo de outro ser-humano que é bolsonaro. | 1 | 0.1%
puta merda cara! cara deve ter sido horrivel pra USER assistir o filme todo pra poder explica-lo detalhadamente, pra quem odeia o filme, saber todo o conteúdo do inicio ao fim é meio estranho! até pq USER comprou até o livro... que irônico!, ou USER ta querendo ser sarcástico??? | 1 | 0.1%
Cara que gente esquisita, tinha q ser virgem q webnamoram e fzm gf, e pqp minha filha o cara dava todos os sinais q era um babaca. Se ele insisitisse em continuar era só fzr um ghosting e fds URL | 1 | 0.1%
USER n sabia q vc era uma bichona lan | 1 | 0.1%
Somos, ainda, um país muito atrasado e aí está a prova: todas essas manifestações homofóbicas. O discurso é sempre o mesmo: "Não sou preconceituoso, mas...". Enquanto o mundo inteiro se move no sentido de entender que ninguém tem o direito de julgar a quem o outro pode amar, no Brasil, continuamos um povo burro, preguiçoso e cheio de uma moralidade torpe. | 1 | 0.1%
eu te odeio paralisia do sono eu te odeio | 1 | 0.1%
USER mimado USER USER | 1 | 0.1%
Adorei o video.Tu e USER veio... | 1 | 0.1%
Engraçado é que o convidado tbm falou bobagem! Mas USER teve a hombridade de assumir o seu erro! Tá pagando de vítima agora! | 1 | 0.1%
eu acho que nos temos que colocar nossas macara i ir pra rua mostra pra esse tirano que ele não ta com essa bola toda não porque se ficarmos só sentados reclamando não vai adianta não na cabeça doente dele ele acha que o USER ta ali e nos temos que mostra pra ele que ele não nos representa | 1 | 0.1%
Other values (1985) | 1985 | 99.5%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
user | 2659 | 5.7%
que | 1415 | 3.0%
de | 1260 | 2.7%
e | 1212 | 2.6%
o | 1201 | 2.6%
a | 951 | 2.0%
é | 935 | 2.0%
não | 719 | 1.5%
um | 489 | 1.0%
se | 465 | 1.0%
Other values (7993) | 35499 | 75.8%

Most occurring characters

Value | Count | Frequency (%)
(space) | 44810 | 17.4%
a | 23063 | 8.9%
e | 21692 | 8.4%
o | 19210 | 7.4%
s | 13515 | 5.2%
r | 12087 | 4.7%
i | 10638 | 4.1%
m | 8889 | 3.4%
d | 8843 | 3.4%
n | 8670 | 3.4%
Other values (257) | 86552 | 33.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 182050 | 70.6%
Space Separator | 44810 | 17.4%
Uppercase Letter | 22340 | 8.7%
Other Punctuation | 7121 | 2.8%
Other Symbol | 634 | 0.2%
Decimal Number | 543 | 0.2%
Dash Punctuation | 128 | < 0.1%
Math Symbol | 81 | < 0.1%
Close Punctuation | 69 | < 0.1%
Open Punctuation | 64 | < 0.1%
Other values (8) | 129 | 0.1%

Most frequent character per category

Other Symbol
Value | Count | Frequency (%)
😂 | 91 | 14.4%
🤣 | 64 | 10.1%
👏 | 24 | 3.8%
🤮 | 23 | 3.6%
😡 | 19 | 3.0%
🤦 | 17 | 2.7%
🤡 | 16 | 2.5%
😍 | 14 | 2.2%
😭 | 14 | 2.2%
💩 | 14 | 2.2%
Other values (114) | 338 | 53.3%
Lowercase Letter
Value | Count | Frequency (%)
a | 23063 | 12.7%
e | 21692 | 11.9%
o | 19210 | 10.6%
s | 13515 | 7.4%
r | 12087 | 6.6%
i | 10638 | 5.8%
m | 8889 | 4.9%
d | 8843 | 4.9%
n | 8670 | 4.8%
u | 8100 | 4.4%
Other values (31) | 47343 | 26.0%
Uppercase Letter
Value | Count | Frequency (%)
E | 3980 | 17.8%
S | 3712 | 16.6%
R | 3588 | 16.1%
U | 3501 | 15.7%
A | 1223 | 5.5%
O | 874 | 3.9%
T | 494 | 2.2%
M | 479 | 2.1%
N | 437 | 2.0%
D | 412 | 1.8%
Other values (30) | 3640 | 16.3%
Other Punctuation
Value | Count | Frequency (%)
. | 2787 | 39.1%
, | 2179 | 30.6%
! | 1085 | 15.2%
? | 463 | 6.5%
" | 234 | 3.3%
: | 126 | 1.8%
' | 73 | 1.0%
* | 60 | 0.8%
(not shown) | 41 | 0.6%
/ | 26 | 0.4%
Other values (5) | 47 | 0.7%
Decimal Number
Value | Count | Frequency (%)
0 | 114 | 21.0%
1 | 102 | 18.8%
2 | 91 | 16.8%
3 | 67 | 12.3%
6 | 34 | 6.3%
8 | 32 | 5.9%
4 | 31 | 5.7%
5 | 29 | 5.3%
9 | 24 | 4.4%
7 | 19 | 3.5%
Other Letter
Value | Count | Frequency (%)
(not shown) | 2 | 14.3%
(not shown) | 2 | 14.3%
º | 2 | 14.3%
(not shown) | 2 | 14.3%
(not shown) | 2 | 14.3%
(not shown) | 1 | 7.1%
(not shown) | 1 | 7.1%
(not shown) | 1 | 7.1%
(not shown) | 1 | 7.1%
Math Symbol
Value | Count | Frequency (%)
> | 50 | 61.7%
= | 16 | 19.8%
+ | 7 | 8.6%
< | 4 | 4.9%
(vertical bar) | 2 | 2.5%
¬ | 2 | 2.5%
Modifier Symbol
Value | Count | Frequency (%)
🏽 | 15 | 51.7%
🏻 | 4 | 13.8%
🏿 | 3 | 10.3%
🏾 | 3 | 10.3%
` | 2 | 6.9%
🏼 | 2 | 6.9%
Close Punctuation
Value | Count | Frequency (%)
) | 66 | 95.7%
] | 2 | 2.9%
(not shown) | 1 | 1.4%
Dash Punctuation
Value | Count | Frequency (%)
- | 124 | 96.9%
(not shown) | 4 | 3.1%
Open Punctuation
Value | Count | Frequency (%)
( | 62 | 96.9%
[ | 2 | 3.1%
Initial Punctuation
Value | Count | Frequency (%)
(not shown) | 14 | 87.5%
(not shown) | 2 | 12.5%
Final Punctuation
Value | Count | Frequency (%)
(not shown) | 9 | 81.8%
(not shown) | 2 | 18.2%
Space Separator
Value | Count | Frequency (%)
(space) | 44810 | 100.0%
Nonspacing Mark
Value | Count | Frequency (%)
(not shown) | 24 | 100.0%
Format
Value | Count | Frequency (%)
(not shown) | 20 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 14 | 100.0%
Currency Symbol
Value | Count | Frequency (%)
$ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 204392 | 79.2%
Common | 53521 | 20.7%
Inherited | 44 | < 0.1%
Han | 12 | < 0.1%

Most frequent character per script

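Common
Value | Count | Frequency (%)
(space) | 44810 | 83.7%
. | 2787 | 5.2%
, | 2179 | 4.1%
! | 1085 | 2.0%
? | 463 | 0.9%
" | 234 | 0.4%
: | 126 | 0.2%
- | 124 | 0.2%
0 | 114 | 0.2%
1 | 102 | 0.2%
Other values (165) | 1497 | 2.8%
Latin
Value | Count | Frequency (%)
a | 23063 | 11.3%
e | 21692 | 10.6%
o | 19210 | 9.4%
s | 13515 | 6.6%
r | 12087 | 5.9%
i | 10638 | 5.2%
m | 8889 | 4.3%
d | 8843 | 4.3%
n | 8670 | 4.2%
u | 8100 | 4.0%
Other values (72) | 69685 | 34.1%
Han
Value | Count | Frequency (%)
(not shown) | 2 | 16.7%
(not shown) | 2 | 16.7%
(not shown) | 2 | 16.7%
(not shown) | 2 | 16.7%
(not shown) | 1 | 8.3%
(not shown) | 1 | 8.3%
(not shown) | 1 | 8.3%
(not shown) | 1 | 8.3%
Inherited
Value | Count | Frequency (%)
(not shown) | 24 | 54.5%
(not shown) | 20 | 45.5%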

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 251252 | 97.4%
None | 6267 | 2.4%
Emoticons | 249 | 0.1%
Punctuation | 93 | < 0.1%
VS | 24 | < 0.1%
Misc Symbols | 21 | < 0.1%
Enclosed Alphanum Sup | 20 | < 0.1%
Geometric Shapes Ext | 16 | < 0.1%
Dingbats | 14 | < 0.1%
CJK | 12 | < 0.1%

Most frequent character per block

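ASCII
Value | Count | Frequency (%)
(space) | 44810 | 17.8%
a | 23063 | 9.2%
e | 21692 | 8.6%
o | 19210 | 7.6%
s | 13515 | 5.4%
r | 12087 | 4.8%
i | 10638 | 4.2%
m | 8889 | 3.5%
d | 8843 | 3.5%
n | 8670 | 3.5%
Other values (79) | 79835 | 31.8%
None
Value | Count | Frequency (%)
ã | 1592 | 25.4%
é | 1290 | 20.6%
á | 693 | 11.1%
ç | 636 | 10.1%
í | 460 | 7.3%
ó | 444 | 7.1%
ê | 315 | 5.0%
ú | 121 | 1.9%
É | 85 | 1.4%
🤣 | 64 | 1.0%
Other values (100) | 567 | 9.0%
Emoticons
Value | Count | Frequency (%)
😂 | 91 | 36.5%
😡 | 19 | 7.6%
😍 | 14 | 5.6%
😭 | 14 | 5.6%
😘 | 12 | 4.8%
🙄 | 9 | 3.6%
😠 | 9 | 3.6%
😉 | 7 | 2.8%
🙏 | 7 | 2.8%
😅 | 6 | 2.4%
Other values (28) | 61 | 24.5%
Punctuation
Value | Count | Frequency (%)
(not shown) | 41 | 44.1%
(not shown) | 20 | 21.5%
(not shown) | 14 | 15.1%
(not shown) | 9 | 9.7%
(not shown) | 4 | 4.3%
(not shown) | 2 | 2.2%
(not shown) | 2 | 2.2%
(not shown) | 1 | 1.1%
VS
Value | Count | Frequency (%)
(not shown) | 24 | 100.0%
Dingbats
Value | Count | Frequency (%)
(not shown) | 13 | 92.9%
(not shown) | 1 | 7.1%
Geometric Shapes Ext
Value | Count | Frequency (%)
🟨 | 10 | 62.5%
🟩 | 6 | 37.5%
Enclosed Alphanum Sup
Value | Count | Frequency (%)
🇧 | 9 | 45.0%
🇷 | 9 | 45.0%
🇱 | 1 | 5.0%
🇮 | 1 | 5.0%
Misc Symbols
Value | Count | Frequency (%)
(not shown) | 9 | 42.9%
(not shown) | 8 | 38.1%
(not shown) | 2 | 9.5%
(not shown) | 2 | 9.5%
CJK
Value | Count | Frequency (%)
(not shown) | 2 | 16.7%
(not shown) | 2 | 16.7%
(not shown) | 2 | 16.7%
(not shown) | 2 | 16.7%
(not shown) | 1 | 8.3%
(not shown) | 1 | 8.3%
(not shown) | 1 | 8.3%
(not shown) | 1 | 8.3%
Letterlike Symbols
Value | Count | Frequency (%)
(not shown) | 1 | 100.0%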

is_offensive
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 117.0 KiB
OFF | 1686
NOT | 309

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5985
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OFF
2nd row: OFF
3rd row: OFF
4th row: OFF
5th row: OFF

Common Values

Value | Count | Frequency (%)
OFF | 1686 | 84.5%
NOT | 309 | 15.5%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words
Value | Count | Frequency (%)
off | 1686 | 84.5%
not | 309 | 15.5%

Most occurring characters

Value | Count | Frequency (%)
F | 3372 | 56.3%
O | 1995 | 33.3%
N | 309 | 5.2%
T | 309 | 5.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 5985 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
F | 3372 | 56.3%
O | 1995 | 33.3%
N | 309 | 5.2%
T | 309 | 5.2%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 5985 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
F | 3372 | 56.3%
O | 1995 | 33.3%
N | 309 | 5.2%
T | 309 | 5.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5985 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
F | 3372 | 56.3%
O | 1995 | 33.3%
N | 309 | 5.2%
T | 309 | 5.2%

is_targeted
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 117.0 KiB
TIN | 1335
UNT | 660

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5985
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TIN
2nd row: TIN
3rd row: TIN
4th row: UNT
5th row: TIN

Common Values

Value | Count | Frequency (%)
TIN | 1335 | 66.9%
UNT | 660 | 33.1%
Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words
Value | Count | Frequency (%)
tin | 1335 | 66.9%
unt | 660 | 33.1%

Most occurring characters

Value | Count | Frequency (%)
T | 1995 | 33.3%
N | 1995 | 33.3%
I | 1335 | 22.3%
U | 660 | 11.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 5985 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 1995 | 33.3%
N | 1995 | 33.3%
I | 1335 | 22.3%
U | 660 | 11.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 5985 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
T | 1995 | 33.3%
N | 1995 | 33.3%
I | 1335 | 22.3%
U | 660 | 11.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5985 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
T | 1995 | 33.3%
N | 1995 | 33.3%
I | 1335 | 22.3%
U | 660 | 11.0%

targeted_type
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.3%
Missing: 812
Missing (%): 40.7%
Memory size: 94.8 KiB
IND | 751
GRP | 237
OTH | 195

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 3549
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IND
2nd row: IND
3rd row: IND
4th row: IND
5th row: IND

Common Values

Value | Count | Frequency (%)
IND | 751 | 37.6%
GRP | 237 | 11.9%
OTH | 195 | 9.8%
(Missing) | 812 | 40.7%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Words
Value | Count | Frequency (%)
ind | 751 | 63.5%
grp | 237 | 20.0%
oth | 195 | 16.5%
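The two frequency tables for `targeted_type` use different denominators. The Common Values table divides by all 1,995 rows, so the 812 missing values get a row of their own. The lowercased word table counts only the 1,183 non-missing values. The check below reproduces both sets of percentages from the reported counts:

```python
counts = {"IND": 751, "GRP": 237, "OTH": 195}
n_rows = 1995
n_missing = 812

n_present = sum(counts.values())  # 1183 non-missing values
results = {}
for label, count in counts.items():
    of_all = 100 * count / n_rows          # denominator of "Common Values"
    of_present = 100 * count / n_present   # denominator of the word table
    results[label] = (f"{of_all:.1f}", f"{of_present:.1f}")
    print(f"{label}: {of_all:.1f}% of all rows, {of_present:.1f}% of non-missing")
```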

Most occurring characters

Value | Count | Frequency (%)
I | 751 | 21.2%
N | 751 | 21.2%
D | 751 | 21.2%
G | 237 | 6.7%
R | 237 | 6.7%
P | 237 | 6.7%
O | 195 | 5.5%
T | 195 | 5.5%
H | 195 | 5.5%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 3549 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
I | 751 | 21.2%
N | 751 | 21.2%
D | 751 | 21.2%
G | 237 | 6.7%
R | 237 | 6.7%
P | 237 | 6.7%
O | 195 | 5.5%
T | 195 | 5.5%
H | 195 | 5.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3549 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
I | 751 | 21.2%
N | 751 | 21.2%
D | 751 | 21.2%
G | 237 | 6.7%
R | 237 | 6.7%
P | 237 | 6.7%
O | 195 | 5.5%
T | 195 | 5.5%
H | 195 | 5.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3549 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
I | 751 | 21.2%
N | 751 | 21.2%
D | 751 | 21.2%
G | 237 | 6.7%
R | 237 | 6.7%
P | 237 | 6.7%
O | 195 | 5.5%
T | 195 | 5.5%
H | 195 | 5.5%

toxic_spans
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 309
Missing (%): 15.5%
Memory size: 310.9 KiB

health
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1956
True | 39

Value | Count | Frequency (%)
False | 1956 | 98.0%
True | 39 | 2.0%

ideology
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1714
True | 281

Value | Count | Frequency (%)
False | 1714 | 85.9%
True | 281 | 14.1%

insult
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
True | 1551
False | 444

Value | Count | Frequency (%)
True | 1551 | 77.7%
False | 444 | 22.3%

lgbtqphobia
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1904
True | 91

Value | Count | Frequency (%)
False | 1904 | 95.4%
True | 91 | 4.6%

other_lifestyle
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1943
True | 52

Value | Count | Frequency (%)
False | 1943 | 97.4%
True | 52 | 2.6%

physical_aspects
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1906
True | 89

Value | Count | Frequency (%)
False | 1906 | 95.5%
True | 89 | 4.5%

profanity_obscene
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1257
True | 738

Value | Count | Frequency (%)
False | 1257 | 63.0%
True | 738 | 37.0%

racism
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1983
True | 12

Value | Count | Frequency (%)
False | 1983 | 99.4%
True | 12 | 0.6%

religious_intolerance
Boolean

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1995

Value | Count | Frequency (%)
False | 1995 | 100.0%

sexism
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1926
True | 69

Value | Count | Frequency (%)
False | 1926 | 96.5%
True | 69 | 3.5%

xenophobia
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
False | 1976
True | 19

Value | Count | Frequency (%)
False | 1976 | 99.0%
True | 19 | 1.0%

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
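Here is a minimal sketch of that recipe in plain Python. It assumes tied values receive the average of the ranks they span, which is the common convention. It is an illustration, not the profiler's own implementation:

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    # the 1/n normalisation factors cancel between numerator and denominator
    return cov / math.sqrt(ssx * ssy)


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]      # nonlinear but strictly increasing
print(spearman_rho(x, y))    # 1.0: the ranks agree perfectly
```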

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exact linear relationship, therefore, the slope of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
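Here is a short sketch of the definition above in plain Python. It also checks the invariance property: shifting and positively rescaling either variable leaves r unchanged, and an exact linear relation gives r = ±1 whatever its slope. The sample values are arbitrary illustrations:

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    # the 1/n normalisation factors cancel between numerator and denominator
    return cov / math.sqrt(ssx * ssy)


x = [1.0, 2.0, 4.0, 7.0, 8.0]
y = [2.0, 1.0, 5.0, 6.0, 9.0]

r = pearson_r(x, y)
# shift and positively rescale each variable independently
r_moved = pearson_r([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])
print(r, r_moved)  # equal up to floating-point rounding

# exact linear relations: the slope does not matter, only its sign
print(pearson_r(x, [0.1 * a for a in x]), pearson_r(x, [-50 * a for a in x]))
```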

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
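The count-based definition above, often called τ-a, can be sketched directly in Python. In this sketch, pairs tied in either variable count as neither concordant nor discordant. Library implementations such as SciPy's `scipy.stats.kendalltau` default to the tie-adjusted τ-b variant, so their results can differ when ties are present, as with the Boolean columns in this dataset:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # same sign of difference in both variables -> concordant
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# one swapped pair out of six: 5 concordant, 1 discordant -> 4/6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```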

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators of Cramér's V are known to be biased, even for large samples, so the report uses the bias-corrected measure proposed by Bergsma (2013).
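Here is a minimal sketch of a bias-corrected Cramér's V. It assumes the usual form of Bergsma's correction: subtract the expected small-sample bias from φ², then shrink the table dimensions accordingly. It uses only the standard library and computes the plain Pearson χ² without a continuity correction. It is not the profiler's own implementation:

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table (list of count rows)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic, no Yates continuity correction
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # remove the expected bias of phi^2 under independence, clipped at zero
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association, about 1
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence: 0.0
```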

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Heatmap: the correlation heatmap measures nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another.

Dendrogram: the dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
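The nullity correlation behind the heatmap can be sketched as the Pearson correlation between missingness indicators, with 1 where a value is missing and 0 otherwise. Treat this as an assumption about the exact definition. The columns below are a hypothetical toy example, not data from this report, and only `None` counts as missing in this sketch:

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]  # only None counts as missing
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    ssa = sum((p - ma) ** 2 for p in a)
    ssb = sum((q - mb) ** 2 for q in b)
    if ssa == 0 or ssb == 0:
        # a column that is never (or always) missing has no variance
        return float("nan")
    return cov / math.sqrt(ssa * ssb)


# Hypothetical toy columns that are missing in exactly the same rows
label = ["IND", None, "GRP", None, "IND"]
spans = [[1, 2], None, [4], None, []]
print(nullity_correlation(label, spans))  # about 1: missing together
```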

Sample

First rows

# | id | text | is_offensive | is_targeted | targeted_type | toxic_spans | health | ideology | insult | lgbtqphobia | other_lifestyle | physical_aspects | profanity_obscene | racism | religious_intolerance | sexism | xenophobia
0 | ccc932bedff24efd91ade52c236d554b | USER USER USER pelo contrario, eu sou USER bolsonaro e USER USER, pelo seu xingamento já da pra perceber que não tem argumento, USER só é mais um ser humano que gosta de baba-ovo de outro ser-humano que é bolsonaro. | OFF | TIN | IND | [170, 171, 172, 173, 174, 175, 176, 177, 205, 206, 207, 208, 209, 210, 211, 212, 213] | False | True | True | False | False | False | True | False | False | False | False
1 | 1e1fa5c2011f4b79ab8fa17d311abc8d | USER e a favor do Nazismo cabe safado | OFF | TIN | IND | [31, 32, 33, 34, 35, 36] | False | True | True | False | False | False | True | False | False | False | False
2 | 847d23e970d84e3ca687911ec2791af7 | USER USER USER USER USER A transposição foi inaugurada pelo Lula, Dilma, Temer e Bolsonaro. Mas pra vcs a obra é só do Genocida. Quero saber, no q o país melhorou pro gado tentar reeleger esse lixo? | OFF | TIN | IND | [167, 168, 169, 170, 193, 194, 195, 196] | False | True | True | False | False | False | False | False | False | False | False
3 | 8f4e011d21ea4ed6ab1702a2acf43c0b | USER tens de continuar, eu pratico ginásio à uns 3 anos e contino-o o mesmo pau de virar tripas xddd | OFF | UNT | None | [] | False | False | True | False | False | True | True | False | False | False | False
4 | 08f67adbd9c54cd1a308b77ec733af77 | USER USER NAO VOU APAGAR A PAGINA PORRAAAAAAA VAMOOOOOO TIMEEEE | OFF | TIN | None | [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45] | False | False | True | False | False | False | True | False | False | False | False
5 | e9e09ac235d2427e84c28f2106d0d2a6 | USER USER Tu sabe falar outra coisa além de desvio de caráter, seu lambe ovo de miliciano? Já levou sua mãe pra o Bozo comer o 👌🏼 ? Puxa saco do crlh | OFF | TIN | IND | [67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 114, 115, 116, 117, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148] | False | True | True | False | False | False | True | False | False | False | False
6 | 327d36f100b44c8f84f2844a11336d84 | Esses idiotas falam que é normal morrer por corona, porque não foi eles e nem ninguém da família, se USER é justo muitas pessoa aí vai pegar corona vírus, se USER quiser! | OFF | TIN | None | [5, 6, 7, 8, 9, 10, 11, 12, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152] | True | True | True | False | False | False | False | False | False | False | False
7 | e67d642951dd42ed9da034a8765ba6c2 | ou em pedem esse maluco ou estamos perdidos. só faz merda | OFF | TIN | IND | [51, 52, 53, 54, 55, 56] | False | False | True | False | False | False | False | False | False | False | False
8 | f3726ce30fbb4260b20a02d85f0a42c3 | USER o espírito de rico safado | OFF | TIN | IND | [24, 25, 26, 27, 28, 29] | False | False | True | False | False | False | True | False | False | False | False
9 | 38b85624d15241dfa8f9eaab8ab5a556 | Globo jornalista apresentador de TV ator atriz apoiador Adélio USER não vai ter aonde vocês roubar .por isso q esta lamentando .pode chorar avontade q não tem mais leite . | OFF | UNT | None | [] | False | False | True | False | False | False | False | False | False | False | False

Last rows

# | id | text | is_offensive | is_targeted | targeted_type | toxic_spans | health | ideology | insult | lgbtqphobia | other_lifestyle | physical_aspects | profanity_obscene | racism | religious_intolerance | sexism | xenophobia
1985 | a0647638f0a6473e85d56a24c0f4ecdc | Empresa nojenta | OFF | TIN | OTH | [8, 9, 10, 11, 12, 13, 14] | False | False | True | False | False | False | False | False | False | False | False
1986 | 31b64791facf43a2b27139dfa187740f | Olha que legal agora estamos chegando ao final da entrevistas percebendo que o professor entrou aí pra fazer campanha pro movimento lgbt né ! Só pra lembrar que esse movimento ODEIA ISRAEL 🇮🇱 E USER USER ,e amam os árabes que mandam pendurar gays no guindaste 🏗! USER o tanto que esse movimento tem força nos dias de hoje ! Atacam a família descaradamente ! | OFF | UNT | None | [] | False | False | True | True | False | False | False | False | False | False | False
1987 | e9a20c214c154abbb738286417c79af9 | Ele fala tão bem que quase pegou os repórter USER USER | NOT | UNT | None | None | False | False | False | False | False | False | False | False | False | False | False
1988 | ca8fe154762749689a5ecc8ad6ff68cf | USER Ya deje de violar la ley y de hacerle el caldo gordo al bueno para nada . Póngase a trabajar y deje de grillar . | OFF | UNT | None | [] | False | False | False | False | False | False | False | False | False | False | False
1989 | f1f4ef44e2464384b7bc6fdf9e1660a8 | USER cara nojento do caralho | OFF | TIN | IND | [10, 11, 12, 13, 14, 15, 16, 21, 22, 23, 24, 25, 26, 27] | False | False | True | False | False | False | True | False | False | False | False
1990 | b410d2c87b984e4ab1b8925647d9825b | USER Novidade...facada fake | NOT | UNT | None | None | False | False | False | False | False | False | False | False | False | False | False
1991 | 0e3aeabd1cbb466cbfac54d39062a072 | Caguei se oque elas falam apoia quem sofre | OFF | TIN | None | [0, 1, 2, 3, 4, 5] | False | False | True | False | False | False | False | False | False | False | False
1992 | d00cca68919e45c2a76309ce8c9d55e8 | foram embora só agora da minha casa, bando de vagabundo | OFF | TIN | GRP | [45, 46, 47, 48, 49, 50, 51, 52, 53, 54] | False | False | True | False | False | False | True | False | False | False | False
1993 | 0e4dacc937ae456a973080bc3ae76eab | Parece que foi apenas uma fantasia sexual da parte dela. Ou ela é ninfomaníaca e estava com muito tesão. É difícil entender esse caso. Talvez o marido sabia dessa fantasia. Só que não aguentou presenciar a cena. Quem se deu mal e sofreu agressão, foi o mendingo. Agora, as especulações e as mentiras, são muitas. | NOT | UNT | None | None | False | False | False | False | False | False | False | False | False | False | False
1994 | 437560d73d2b4433839bc1b9831a1a22 | BANDO DE HOMOFOBICOS!!! Vou continuar comprando! inclusive ontem mesmo comprei esse povo USER sabe desrespeitar mas quando desrespeitam eles ai USER pode ne? hipocrisia enfia amo os lanches crtz q quem fala que vai compra em outro lugar...USER USER tem dinheiro ne???😁😂 | OFF | UNT | None | [] | False | False | True | False | False | False | False | False | False | False | False