Dataset statistics
Number of variables | 17 |
---|---|
Number of observations | 2996 |
Missing cells | 1400 |
Missing cells (%) | 2.7% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 2.7 MiB |
Average record size in memory | 939.2 B |
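The missing-cell percentage above follows from the other figures in the table. A minimal sketch of the arithmetic, using only numbers reported in this document (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 17
n_observations = 2996
missing_cells = 1400

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Per-column missing counts as listed in the report's "Missing" alerts.
per_column_missing = {"targeted_type": 1283, "toxic_spans": 117}
assert sum(per_column_missing.values()) == missing_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

All missing cells come from two columns, so the remaining 15 variables are complete.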
Variable types
Categorical | 5 |
---|---|
Unsupported | 1 |
Boolean | 11 |
religious_intolerance has constant value "False" | Constant |
id has a high cardinality: 2996 distinct values | High cardinality |
text has a high cardinality: 2996 distinct values | High cardinality |
is_offensive is highly correlated with targeted_type and 2 other fields | High correlation |
health is highly correlated with religious_intolerance | High correlation |
other_lifestyle is highly correlated with religious_intolerance | High correlation |
racism is highly correlated with religious_intolerance | High correlation |
sexism is highly correlated with religious_intolerance | High correlation |
is_targeted is highly correlated with targeted_type and 1 other field | High correlation |
targeted_type is highly correlated with is_offensive and 2 other fields | High correlation |
profanity_obscene is highly correlated with religious_intolerance | High correlation |
insult is highly correlated with is_offensive and 1 other field | High correlation |
religious_intolerance is highly correlated with is_offensive and 12 other fields | High correlation |
xenophobia is highly correlated with religious_intolerance | High correlation |
physical_aspects is highly correlated with religious_intolerance | High correlation |
lgbtqphobia is highly correlated with religious_intolerance | High correlation |
ideology is highly correlated with religious_intolerance | High correlation |
is_offensive is highly correlated with insult | High correlation |
insult is highly correlated with is_offensive | High correlation |
targeted_type has 1283 (42.8%) missing values | Missing |
toxic_spans has 117 (3.9%) missing values | Missing |
id is uniformly distributed | Uniform |
text is uniformly distributed | Uniform |
id has unique values | Unique |
text has unique values | Unique |
toxic_spans is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
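One possible follow-up for the unsupported toxic_spans column is sketched below. It assumes, based only on the sample rows shown in this report, that each value is either None or a sorted list of character offsets into text; the helper name `spans_to_fragments` is hypothetical.

```python
from typing import List, Optional


def spans_to_fragments(text: str, offsets: Optional[List[int]]) -> List[str]:
    """Group consecutive character offsets and return the matching substrings.

    Assumes offsets are sorted in ascending order and index Python string
    positions (code points), which is an assumption about the dataset.
    """
    if not offsets:
        return []
    fragments = []
    start = prev = offsets[0]
    for pos in offsets[1:]:
        if pos != prev + 1:  # a gap closes the current run of offsets
            fragments.append(text[start:prev + 1])
            start = pos
        prev = pos
    fragments.append(text[start:prev + 1])
    return fragments


# Row 8 of the "First rows" table.
print(spans_to_fragments("Se fosse um sniper ia ser louco", [26, 27, 28, 29, 30]))
# Row 5 has no spans (None).
print(spans_to_fragments("USER USER é muito bom. USER ^^ E claro a equipe USER.", None))
```

Converting the offset lists to substrings like this makes the column easier to inspect than the raw integer lists.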
Reproduction
Analysis started | 2022-09-01 22:12:40.318763 |
---|---|
Analysis finished | 2022-09-01 22:12:53.568420 |
Duration | 13.25 seconds |
Software version | pandas-profiling v3.2.0 |
Download configuration | config.json |
id
Distinct | 2996 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 260.5 KiB |
Length
Max length | 32 |
---|---|
Median length | 32 |
Mean length | 32 |
Min length | 32 |
Characters and Unicode
Total characters | 95872 |
---|---|
Distinct characters | 16 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 2996 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | da19df36730945f08df3d09efa354876 |
---|---|
2nd row | 80f1a8c981864887b13963fed1261acc |
3rd row | 80eee9db811c4ea4b2ddb7863d12c5fe |
4th row | 2f67025f913e4a6292e3d000d9e2b5a8 |
5th row | cd92f539559e421ba61cf23ecd005511 |
Common Values
Value | Count | Frequency (%) |
da19df36730945f08df3d09efa354876 | 1 | < 0.1% |
49bf28b765484b9f963d6885eb48df31 | 1 | < 0.1% |
ee5a5d40987f424ab71dd60abf56a23c | 1 | < 0.1% |
3ca64c7132d042feaa0eadea0e76ff22 | 1 | < 0.1% |
e1bffd1c19be401393ab91e196839854 | 1 | < 0.1% |
4dab80b5088d4255a0f2372704246e4e | 1 | < 0.1% |
b564daa958ec4b7390c4674c298e72a4 | 1 | < 0.1% |
10978af756c94a479fbaf54a144c7052 | 1 | < 0.1% |
1dbe54562a824d42bd95a728c5d6afef | 1 | < 0.1% |
1a1b7b30ec08435889c578791b8188c3 | 1 | < 0.1% |
Other values (2986) | 2986 |
Length (histogram not reproduced)
Words
Value | Count | Frequency (%) |
da19df36730945f08df3d09efa354876 | 1 | < 0.1% |
cc66b54eeec24607a67e2259134a1cdd | 1 | < 0.1% |
a223536974394b15b5e3bb658ebc596b | 1 | < 0.1% |
396d5049c82c430a9f4fa5694c00cbd2 | 1 | < 0.1% |
80eee9db811c4ea4b2ddb7863d12c5fe | 1 | < 0.1% |
2f67025f913e4a6292e3d000d9e2b5a8 | 1 | < 0.1% |
cd92f539559e421ba61cf23ecd005511 | 1 | < 0.1% |
430b13705cf34e13b74bc999425187c3 | 1 | < 0.1% |
c779826dc43f460cb18e8429ca443477 | 1 | < 0.1% |
e64148caa4474fc79298e01d0dda8f5e | 1 | < 0.1% |
Other values (2986) | 2986 |
Most occurring characters
Value | Count | Frequency (%) |
4 | 8805 | 9.2% |
9 | 6413 | 6.7% |
8 | 6392 | 6.7% |
b | 6381 | 6.7% |
a | 6270 | 6.5% |
2 | 5760 | 6.0% |
7 | 5663 | 5.9% |
0 | 5635 | 5.9% |
e | 5599 | 5.8% |
1 | 5590 | 5.8% |
Other values (6) | 33364 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 60956 | 63.6% |
Lowercase Letter | 34916 | 36.4% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
4 | 8805 | |
9 | 6413 | |
8 | 6392 | |
2 | 5760 | |
7 | 5663 | |
0 | 5635 | |
1 | 5590 | |
3 | 5590 | |
6 | 5587 | |
5 | 5521 |
Lowercase Letter
Value | Count | Frequency (%) |
b | 6381 | |
a | 6270 | |
e | 5599 | |
f | 5583 | |
d | 5573 | |
c | 5510 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 60956 | 63.6% |
Latin | 34916 | 36.4% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
4 | 8805 | |
9 | 6413 | |
8 | 6392 | |
2 | 5760 | |
7 | 5663 | |
0 | 5635 | |
1 | 5590 | |
3 | 5590 | |
6 | 5587 | |
5 | 5521 |
Latin
Value | Count | Frequency (%) |
b | 6381 | |
a | 6270 | |
e | 5599 | |
f | 5583 | |
d | 5573 | |
c | 5510 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 95872 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
4 | 8805 | 9.2% |
9 | 6413 | 6.7% |
8 | 6392 | 6.7% |
b | 6381 | 6.7% |
a | 6270 | 6.5% |
2 | 5760 | 6.0% |
7 | 5663 | 5.9% |
0 | 5635 | 5.9% |
e | 5599 | 5.8% |
1 | 5590 | 5.8% |
Other values (6) | 33364 |
text
Distinct | 2996 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 1.0 MiB |
Length
Max length | 954 |
---|---|
Median length | 476.5 |
Mean length | 125.6048064 |
Min length | 4 |
Characters and Unicode
Total characters | 376312 |
---|---|
Distinct characters | 318 |
Distinct categories | 21 |
Distinct scripts | 6 |
Distinct blocks | 18 |
Unique
Unique | 2996 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | USER Adorei o comercial também Jesus. Só achei que faltou um beijinho gay estilo Jesus e USER sabe?! Rsrsrs |
---|---|
2nd row | Cara isso foi muito babaca geral USER conhece o Monark e as merda que ele fala. Isso é muito merda eu USER E to decepcionada pra caralho mas não cabe a mim dizer |
3rd row | Quem liga pra judeu kkkk |
4th row | Se vc for porco, folgado e relaxado, você não ia conseguir viver com ela mesmo. Realmente, gente escrota não ia conseguir conviver com a Jojo |
5th row | Rapaziada chata, né?! O cara trabalha c funk, vive no meio de mulher, solteiro ou não ele ia gravar o clip.... mas aí porque ficou solteiro “tá querendo se mostrar” pqp, gente chata p caralho |
Common Values
Value | Count | Frequency (%) |
USER Adorei o comercial também Jesus. Só achei que faltou um beijinho gay estilo Jesus e USER sabe?! Rsrsrs | 1 | < 0.1% |
6 DE PDL É PUTARIA!!! Chega por hoje URL | 1 | < 0.1% |
Perto dela ele fica calado | 1 | < 0.1% |
O que esperar de uma empresa que vende lixo para as pessoas comerem ? | 1 | < 0.1% |
USER USER é uma hamburgueria não uma escola, pai que precisa de uma hamburgueria pra educar filho é um pai vagabundo ou não é pai. | 1 | < 0.1% |
Que merda de insônia mos, quero dormir inferno | 1 | < 0.1% |
cara, dedo quente nesse cu gelado foi ótimo vei, ri pa caralho!!!! | 1 | < 0.1% |
caralho mano pq eu fui ler essa porra, vou jogar ácido no meu olho da próxima vez que eu inventar de fazer essa palhaçada | 1 | < 0.1% |
USER KKKKKKKKKK VAGABUNDA | 1 | < 0.1% |
USER USER USER percebeu a merda q USER falou? Liberdade de expressão não é tirar a liberdade de outra pessoa só porque USER quer. Pensa antes de escrever. | 1 | < 0.1% |
Other values (2986) | 2986 |
Length (histogram not reproduced)
Words
Value | Count | Frequency (%) |
user | 3945 | 5.7% |
que | 2076 | 3.0% |
de | 1853 | 2.7% |
e | 1675 | 2.4% |
o | 1668 | 2.4% |
a | 1429 | 2.1% |
é | 1329 | 1.9% |
não | 972 | 1.4% |
um | 714 | 1.0% |
do | 668 | 1.0% |
Other values (10786) | 52615 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 65948 | 17.5% |
a | 33194 | 8.8% |
e | 31584 | 8.4% |
o | 27632 | 7.3% |
s | 19623 | 5.2% |
r | 17519 | 4.7% |
i | 15455 | 4.1% |
n | 12862 | 3.4% |
d | 12852 | 3.4% |
m | 12482 | 3.3% |
Other values (308) | 127161 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 263150 | 69.9% |
Space Separator | 65948 | 17.5% |
Uppercase Letter | 34911 | 9.3% |
Other Punctuation | 9976 | 2.7% |
Other Symbol | 961 | 0.3% |
Decimal Number | 718 | 0.2% |
Dash Punctuation | 113 | < 0.1% |
Close Punctuation | 96 | < 0.1% |
Open Punctuation | 88 | < 0.1% |
Math Symbol | 82 | < 0.1% |
Other values (11) | 269 | 0.1% |
Most frequent character per category
Other Symbol
Value | Count | Frequency (%) |
😂 | 141 | 14.7% |
🤣 | 110 | 11.4% |
👏 | 37 | 3.9% |
😭 | 35 | 3.6% |
🤮 | 27 | 2.8% |
🤦 | 24 | 2.5% |
🇧 | 20 | 2.1% |
🇷 | 20 | 2.1% |
😍 | 17 | 1.8% |
😡 | 17 | 1.8% |
Other values (144) | 513 |
Lowercase Letter
Value | Count | Frequency (%) |
a | 33194 | |
e | 31584 | |
o | 27632 | |
s | 19623 | 7.5% |
r | 17519 | 6.7% |
i | 15455 | 5.9% |
n | 12862 | 4.9% |
d | 12852 | 4.9% |
m | 12482 | 4.7% |
u | 11849 | 4.5% |
Other values (33) | 68098 |
Uppercase Letter
Value | Count | Frequency (%) |
E | 6049 | |
S | 5441 | |
R | 5406 | |
U | 5170 | |
A | 2061 | 5.9% |
O | 1477 | 4.2% |
T | 895 | 2.6% |
N | 829 | 2.4% |
M | 795 | 2.3% |
D | 773 | 2.2% |
Other values (28) | 6015 |
Other Letter
Value | Count | Frequency (%) |
茅 | 5 | |
茫 | 4 | 10.3% |
锚 | 4 | 10.3% |
谩 | 4 | 10.3% |
馃 | 3 | 7.7% |
º | 2 | 5.1% |
莽 | 2 | 5.1% |
鉁 | 2 | 5.1% |
檮 | 1 | 2.6% |
ツ | 1 | 2.6% |
Other values (11) | 11 |
Other Punctuation
Value | Count | Frequency (%) |
. | 3696 | |
, | 3054 | |
! | 1555 | |
? | 661 | 6.6% |
" | 406 | 4.1% |
: | 252 | 2.5% |
' | 97 | 1.0% |
… | 84 | 0.8% |
* | 68 | 0.7% |
/ | 44 | 0.4% |
Other values (8) | 59 | 0.6% |
Decimal Number
Value | Count | Frequency (%) |
0 | 153 | |
2 | 111 | |
1 | 110 | |
3 | 88 | |
4 | 65 | |
6 | 49 | 6.8% |
5 | 47 | 6.5% |
8 | 34 | 4.7% |
9 | 31 | 4.3% |
7 | 30 | 4.2% |
Math Symbol
Value | Count | Frequency (%) |
> | 47 | |
= | 13 | 15.9% |
+ | 13 | 15.9% |
< | 3 | 3.7% |
~ | 3 | 3.7% |
¬ | 2 | 2.4% |
| | 1 | 1.2% |
Modifier Symbol
Value | Count | Frequency (%) |
🏼 | 23 | |
🏻 | 14 | |
🏽 | 13 | |
🏾 | 9 | 12.2% |
🏿 | 8 | 10.8% |
^ | 5 | 6.8% |
´ | 2 | 2.7% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 108 | |
— | 5 | 4.4% |
Close Punctuation
Value | Count | Frequency (%) |
) | 90 | |
] | 6 | 6.2% |
Open Punctuation
Value | Count | Frequency (%) |
( | 81 | |
[ | 7 | 8.0% |
Nonspacing Mark
Value | Count | Frequency (%) |
️ | 47 | |
͜ | 1 | 2.1% |
Format
Value | Count | Frequency (%) |
| 30 | |
| 1 | 3.2% |
Final Punctuation
Value | Count | Frequency (%) |
” | 27 | |
’ | 2 | 6.9% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 7 | |
﹏ | 1 | 12.5% |
Space Separator
Value | Count | Frequency (%) |
(space) | 65948 |
Initial Punctuation
Value | Count | Frequency (%) |
“ | 31 |
Currency Symbol
Value | Count | Frequency (%) |
$ | 4 |
Control
Value | Count | Frequency (%) |
(control character) | 2 |
Enclosing Mark
Value | Count | Frequency (%) |
⃣ | 2 |
Modifier Letter
Value | Count | Frequency (%) |
ー | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 298064 | 79.2% |
Common | 78132 | 20.8% |
Inherited | 80 | < 0.1% |
Han | 31 | < 0.1% |
Katakana | 4 | < 0.1% |
Hiragana | 1 | < 0.1% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
(space) | 65948 | 84.4% |
. | 3696 | 4.7% |
, | 3054 | 3.9% |
! | 1555 | 2.0% |
? | 661 | 0.8% |
" | 406 | 0.5% |
: | 252 | 0.3% |
0 | 153 | 0.2% |
😂 | 141 | 0.2% |
2 | 111 | 0.1% |
Other values (202) | 2155 | 2.8% |
Latin
Value | Count | Frequency (%) |
a | 33194 | 11.1% |
e | 31584 | 10.6% |
o | 27632 | 9.3% |
s | 19623 | 6.6% |
r | 17519 | 5.9% |
i | 15455 | 5.2% |
n | 12862 | 4.3% |
d | 12852 | 4.3% |
m | 12482 | 4.2% |
u | 11849 | 4.0% |
Other values (73) | 103012 |
Han
Value | Count | Frequency (%) |
茅 | 5 | |
茫 | 4 | |
锚 | 4 | |
谩 | 4 | |
馃 | 3 | |
莽 | 2 | 6.5% |
鉁 | 2 | 6.5% |
檮 | 1 | 3.2% |
芒 | 1 | 3.2% |
脭 | 1 | 3.2% |
Other values (4) | 4 |
Inherited
Value | Count | Frequency (%) |
️ | 47 | |
| 30 | |
⃣ | 2 | 2.5% |
͜ | 1 | 1.2% |
Katakana
Value | Count | Frequency (%) |
ツ | 1 | |
イ | 1 | |
メ | 1 | |
ジ | 1 |
Hiragana
Value | Count | Frequency (%) |
の | 1 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 366689 | 97.4% |
None | 8876 | 2.4% |
Emoticons | 352 | 0.1% |
Punctuation | 179 | < 0.1% |
VS | 47 | < 0.1% |
Enclosed Alphanum Sup | 40 | < 0.1% |
Misc Symbols | 38 | < 0.1% |
Dingbats | 33 | < 0.1% |
CJK | 31 | < 0.1% |
Geometric Shapes Ext | 12 | < 0.1% |
Other values (8) | 15 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 65948 | 18.0% |
a | 33194 | 9.1% |
e | 31584 | 8.6% |
o | 27632 | 7.5% |
s | 19623 | 5.4% |
r | 17519 | 4.8% |
i | 15455 | 4.2% |
n | 12862 | 3.5% |
d | 12852 | 3.5% |
m | 12482 | 3.4% |
Other values (82) | 117538 |
None
Value | Count | Frequency (%) |
ã | 1978 | |
é | 1799 | |
á | 1017 | |
ç | 848 | |
ó | 684 | 7.7% |
í | 675 | 7.6% |
ê | 440 | 5.0% |
ú | 204 | 2.3% |
É | 155 | 1.7% |
🤣 | 110 | 1.2% |
Other values (117) | 966 |
Emoticons
Value | Count | Frequency (%) |
😂 | 141 | |
😭 | 35 | 9.9% |
😍 | 17 | 4.8% |
😡 | 17 | 4.8% |
😠 | 14 | 4.0% |
😒 | 13 | 3.7% |
😆 | 11 | 3.1% |
😤 | 9 | 2.6% |
😢 | 8 | 2.3% |
🙄 | 6 | 1.7% |
Other values (35) | 81 |
Punctuation
Value | Count | Frequency (%) |
… | 84 | |
“ | 31 | 17.3% |
| 30 | 16.8% |
” | 27 | 15.1% |
— | 5 | 2.8% |
’ | 2 | 1.1% |
VS
Value | Count | Frequency (%) |
️ | 47 |
Enclosed Alphanum Sup
Value | Count | Frequency (%) |
🇧 | 20 | |
🇷 | 20 |
Misc Symbols
Value | Count | Frequency (%) |
♂ | 14 | |
♀ | 12 | |
♡ | 4 | 10.5% |
☠ | 2 | 5.3% |
☺ | 2 | 5.3% |
♥ | 2 | 5.3% |
☄ | 1 | 2.6% |
⚖ | 1 | 2.6% |
Dingbats
Value | Count | Frequency (%) |
❤ | 14 | |
✅ | 8 | |
✌ | 5 | 15.2% |
✨ | 2 | 6.1% |
❗ | 1 | 3.0% |
❌ | 1 | 3.0% |
❞ | 1 | 3.0% |
❝ | 1 | 3.0% |
Geometric Shapes Ext
Value | Count | Frequency (%) |
🟩 | 10 | |
🟨 | 2 | 16.7% |
CJK
Value | Count | Frequency (%) |
茅 | 5 | |
茫 | 4 | |
锚 | 4 | |
谩 | 4 | |
馃 | 3 | |
莽 | 2 | 6.5% |
鉁 | 2 | 6.5% |
檮 | 1 | 3.2% |
芒 | 1 | 3.2% |
脭 | 1 | 3.2% |
Other values (4) | 4 |
Box Drawing
Value | Count | Frequency (%) |
╥ | 2 |
Specials
Value | Count | Frequency (%) |
� | 2 |
Geometric Shapes
Value | Count | Frequency (%) |
● | 1 | |
○ | 1 |
Katakana
Value | Count | Frequency (%) |
ツ | 1 | |
イ | 1 | |
メ | 1 | |
ジ | 1 | |
ー | 1 |
Hiragana
Value | Count | Frequency (%) |
の | 1 |
IPA Ext
Value | Count | Frequency (%) |
ʖ | 1 |
Diacriticals
Value | Count | Frequency (%) |
͜ | 1 |
CJK Compat Forms
Value | Count | Frequency (%) |
﹏ | 1 |
is_offensive
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 175.7 KiB |
Common Values
Value | Count | Frequency (%) |
OFF | 2879 | 96.1% |
NOT | 117 | 3.9% |
Length (histogram not reproduced)
Category Frequency Plot (plot not reproduced)
Words
Value | Count | Frequency (%) |
off | 2879 | 96.1% |
not | 117 | 3.9% |
Most occurring characters
Value | Count | Frequency (%) |
F | 5758 | |
O | 2996 | |
N | 117 | 1.3% |
T | 117 | 1.3% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 8988 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
F | 5758 | |
O | 2996 | |
N | 117 | 1.3% |
T | 117 | 1.3% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 8988 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
F | 5758 | |
O | 2996 | |
N | 117 | 1.3% |
T | 117 | 1.3% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 8988 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
F | 5758 | |
O | 2996 | |
N | 117 | 1.3% |
T | 117 | 1.3% |
is_targeted
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 175.7 KiB |
Common Values
Value | Count | Frequency (%) |
TIN | 1713 | 57.2% |
UNT | 1283 | 42.8% |
Length (histogram not reproduced)
Category Frequency Plot (plot not reproduced)
Words
Value | Count | Frequency (%) |
tin | 1713 | 57.2% |
unt | 1283 | 42.8% |
Most occurring characters
Value | Count | Frequency (%) |
T | 2996 | |
N | 2996 | |
I | 1713 | |
U | 1283 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 8988 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
T | 2996 | |
N | 2996 | |
I | 1713 | |
U | 1283 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 8988 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
T | 2996 | |
N | 2996 | |
I | 1713 | |
U | 1283 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 8988 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
T | 2996 | |
N | 2996 | |
I | 1713 | |
U | 1283 |
targeted_type
Distinct | 3 |
---|---|
Distinct (%) | 0.2% |
Missing | 1283 |
Missing (%) | 42.8% |
Memory size | 140.6 KiB |
Common Values
Value | Count | Frequency (%) |
IND | 1047 | 34.9% |
GRP | 574 | 19.2% |
OTH | 92 | 3.1% |
(Missing) | 1283 | 42.8% |
Length (histogram not reproduced)
Category Frequency Plot (plot not reproduced)
Words
Value | Count | Frequency (%) |
ind | 1047 | 61.1% |
grp | 574 | 33.5% |
oth | 92 | 5.4% |
Most occurring characters
Value | Count | Frequency (%) |
I | 1047 | |
N | 1047 | |
D | 1047 | |
G | 574 | |
R | 574 | |
P | 574 | |
O | 92 | 1.8% |
T | 92 | 1.8% |
H | 92 | 1.8% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 5139 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
I | 1047 | |
N | 1047 | |
D | 1047 | |
G | 574 | |
R | 574 | |
P | 574 | |
O | 92 | 1.8% |
T | 92 | 1.8% |
H | 92 | 1.8% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 5139 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
I | 1047 | |
N | 1047 | |
D | 1047 | |
G | 574 | |
R | 574 | |
P | 574 | |
O | 92 | 1.8% |
T | 92 | 1.8% |
H | 92 | 1.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5139 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
I | 1047 | |
N | 1047 | |
D | 1047 | |
G | 574 | |
R | 574 | |
P | 574 | |
O | 92 | 1.8% |
T | 92 | 1.8% |
H | 92 | 1.8% |
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
False | 2925 | 97.6% |
True | 71 | 2.4% |
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
False | 2220 | 74.1% |
True | 776 | 25.9% |
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
True | 2822 | 94.2% |
False | 174 | 5.8% |
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
False | 2783 | 92.9% |
True | 213 | 7.1% |
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
False | 2955 | 98.6% |
True | 41 | 1.4% |
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
False | 2818 | 94.1% |
True | 178 | 5.9% |
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
False | 2064 | 68.9% |
True | 932 | 31.1% |
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
False | 2919 | 97.4% |
True | 77 | 2.6% |
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
False | 2996 | 100.0% |
Distinct | 2 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.1 KiB |
Value | Count | Frequency (%) |
False | 2616 | 87.3% |
True | 380 | 12.7% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
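The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the profiling tool runs; the function names are hypothetical, and ties are given average ranks, which is a common convention assumed here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def covariance_ratio(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    # Spearman's rho is the covariance ratio applied to the rank variables.
    return covariance_ratio(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
print(spearman_rho(x, y))
```

Because only the ranks enter the calculation, any strictly increasing transformation of either variable leaves ρ unchanged.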
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
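A minimal sketch of this definition, written in plain Python for illustration (the function name `pearson_r` and the sample data are made up), also checks the invariance under changes of location and scale:

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [10 * a + 7 for a in x]
y_shifted = [0.5 * b - 3 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r, r_shifted)
```

The two printed values agree up to floating-point rounding, since the shifts cancel in the deviations from the mean and the scale factors cancel between numerator and denominator.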
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
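The pair-counting formula above corresponds to the variant usually called τ-a. A short sketch under that assumption follows; the report does not say which tie-handling variant it uses, so the function name and the treatment of tied pairs (counted as neither concordant nor discordant) are illustrative choices.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
# 10 pairs in total: 8 concordant, 2 discordant.
print(kendall_tau_a(x, y))  # 0.6
```

The quadratic loop over all pairs is fine for illustration but would be slow on large columns.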
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
First rows
index | id | text | is_offensive | is_targeted | targeted_type | toxic_spans | health | ideology | insult | lgbtqphobia | other_lifestyle | physical_aspects | profanity_obscene | racism | religious_intolerance | sexism | xenophobia |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | da19df36730945f08df3d09efa354876 | USER Adorei o comercial também Jesus. Só achei que faltou um beijinho gay estilo Jesus e USER sabe?! Rsrsrs | OFF | UNT | None | [52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86] | False | False | True | False | False | False | True | False | False | False | False |
1 | 80f1a8c981864887b13963fed1261acc | Cara isso foi muito babaca geral USER conhece o Monark e as merda que ele fala. Isso é muito merda eu USER E to decepcionada pra caralho mas não cabe a mim dizer | OFF | TIN | GRP | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105] | False | False | True | False | False | False | False | False | False | False | False |
2 | 80eee9db811c4ea4b2ddb7863d12c5fe | Quem liga pra judeu kkkk | OFF | UNT | None | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] | False | True | True | False | False | False | False | False | False | False | True |
3 | 2f67025f913e4a6292e3d000d9e2b5a8 | Se vc for porco, folgado e relaxado, você não ia conseguir viver com ela mesmo. Realmente, gente escrota não ia conseguir conviver com a Jojo | OFF | UNT | None | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104] | False | False | True | False | False | False | False | False | False | False | False |
4 | cd92f539559e421ba61cf23ecd005511 | Rapaziada chata, né?! O cara trabalha c funk, vive no meio de mulher, solteiro ou não ele ia gravar o clip.... mas aí porque ficou solteiro “tá querendo se mostrar” pqp, gente chata p caralho | OFF | TIN | GRP | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 165, 166, 167, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190] | False | False | True | False | False | False | False | False | False | False | False |
5 | 430b13705cf34e13b74bc999425187c3 | USER USER é muito bom. USER ^^ E claro a equipe USER. | NOT | UNT | None | None | False | False | False | False | False | False | False | False | False | False | False |
6 | c779826dc43f460cb18e8429ca443477 | Pior do que adolescentezinhas de merda...são pessoas que levam filmes tão a sério! O livro/filme é dela, ela faz o que quiser! E por mais ruim que seja, ta rendendo milhões (: | OFF | UNT | None | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38] | False | False | True | False | False | False | True | False | False | True | False |
7 | e64148caa4474fc79298e01d0dda8f5e | USER Toma no cu é vitamina como tu e tua prima. | OFF | TIN | GRP | [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46] | False | False | True | False | False | False | True | False | False | False | False |
8 | 738ccd4476784f47af3a5a6cfdda4695 | Se fosse um sniper ia ser louco | OFF | UNT | None | [26, 27, 28, 29, 30] | False | False | True | False | False | True | False | False | False | False | False |
9 | cc66b54eeec24607a67e2259134a1cdd | Muito bom, pena a circunstâncias serem ruins, ele poderia ter vindo antes, até para dar uma aula ao Monark e fazer ele parar de falar tanta besteira... | OFF | UNT | None | [119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150] | False | False | True | False | False | False | False | False | False | False | False |
Last rows
index | id | text | is_offensive | is_targeted | targeted_type | toxic_spans | health | ideology | insult | lgbtqphobia | other_lifestyle | physical_aspects | profanity_obscene | racism | religious_intolerance | sexism | xenophobia |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2986 | 358dd843965f429f9550f5a5eb41a46b | RT USER: TODO MUNDO NESSE CARALHO DE REDE SOCIAL TEM UM SQUAD UMA PANELINHA UM GRUPINHO E OS CARALHO A4 L, SÓ EU QUE NÃO QUE ODIO QUE PO… | OFF | TIN | GRP | [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102] | False | False | True | False | False | False | True | False | False | False | False |
2987 | 1d8a7915a93847d9abc67d2564926920 | Mano o povo gosta que seja ignorante , por isso eu mando pra casa do caralho mesmo 🤬 | OFF | TIN | GRP | [27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76] | False | True | True | False | False | False | True | False | False | False | True |
2988 | 7545fcbcd9f5497987639797d2b19518 | Essa mulher é louca.kkkkkkkkk. | OFF | TIN | IND | [14, 15, 16, 17, 18] | False | False | True | False | False | False | False | False | False | True | False |
2989 | 8849dc11cdde4bcebf2b361e905811fd | USER Ladrão desgracado dos infernos. Voce é um câncer que faz mal ao pais. URL | OFF | TIN | IND | [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 27, 28, 29, 30, 31, 32, 33, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] | True | True | True | False | False | False | False | False | False | False | False |
2990 | 78f5241468a54171893914852b0c6cce | Nós vamos colocar pra fuder cm esses bandidos do STF USER e senado | OFF | TIN | GRP | [22, 23, 24, 25, 26, 27, 37, 38, 39, 40, 41, 42, 43, 44, 45] | False | True | True | False | False | False | True | False | False | False | False |
2991 | 0315aa7b60dd48be9c5f47028422af13 | USER USER EU SO BURRA PERDÃO VO APAGA NGM AVISA NADA NESSE CARAIO TBM | OFF | TIN | IND | [16, 17, 18, 19, 20, 21, 59, 60, 61, 62, 63, 64, 65] | False | True | True | False | False | False | False | False | False | False | False |
2992 | 57be31ef8fd54e96afa8482f08df8987 | RT USER: POLÍTICOS CORRUTOS DEVERIAM APODRECER NA CADEIA. POR CAUSA DELES MORREM MILHARES DE PESSOAS POR FALTA DE RECURSOS. ELES… | OFF | TIN | GRP | [22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58] | False | True | True | False | False | False | False | False | False | False | False |
2993 | dfbffbb11afa4d28a745282c422314d4 | USER é sem personalidade, fala tudo ensaiado… parece um robô. | OFF | TIN | IND | [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59] | False | False | True | False | False | False | False | False | False | False | False |
2994 | d0e31661ac2e402b8d139b5efcd5c07c | FICO ME PERGUNTANDO Q MERDA A PESSOA TEM NA CABEÇA PRA VOLTAR PRA UM RELACIONAMENTO FUDIDO? PQP EU FICO INDIGNADA COM O TANTO Q EXISTE MINA TROUXA QUERIA Q TODAS ENTENDESSEM A MERDA DE RELACIONAMENTO Q TÃO E A PESSOA LIXO DO LADO DELAS | OFF | TIN | GRP | [22, 23, 24, 25, 26, 27, 84, 85, 86, 87, 88, 89, 92, 93, 94, 140, 141, 142, 143, 144, 145, 176, 177, 178, 179, 180, 181, 217, 218, 219, 220, 221] | False | True | True | False | False | False | False | False | False | True | False |
2995 | 012fdebdb224452a8666eea8ea86d35b | O Nosso Presidente Tem Mesmo Uma Paciência De Jó, Porque se Fosse Eu, Já Tinha Mandado Todos Estes Pilantras Trapaceiros Que Defende a Esquerda Para a Casa Do Caralho ! | OFF | TIN | GRP | [99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 156, 157, 159, 160, 161, 162, 163, 164, 165, 166] | False | True | False | False | False | False | False | False | False | False | False |