1) If k-words are counted directly over the 20 amino acids (using the ZD98 dataset as an example):
| k | Dimension |
| 2 | 400 |
| 3 | 6490 |
| 4 | 22265 |
These dimensions are too large to be practical for building feature vectors.
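The dimensions for k = 3 and k = 4 are below 20^3 = 8000 and 20^4 = 160000, which suggests they count the distinct k-words actually observed in the dataset rather than all possible ones. A minimal sketch of such a count follows, assuming overlapping windows with stride 1 (the post does not state the windowing). The function name `distinct_kwords` and the toy sequences are illustrative and are not taken from ZD98.

```python
from typing import Iterable, Set


def distinct_kwords(sequences: Iterable[str], k: int) -> Set[str]:
    """Collect every distinct k-word seen in the sequences.

    Assumption: words are read with an overlapping sliding window (stride 1).
    """
    seen: Set[str] = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return seen


# Toy sequences for illustration only; they are not ZD98 data.
toy_sequences = ["MKVLAAGIV", "KVLAAG"]
for k in (2, 3):
    print(k, len(distinct_kwords(toy_sequences, k)))
```

The number of distinct words can never exceed 20^k, but on real data it grows quickly with k, which is the problem noted above.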
Consider extracting features after amino acid reduction instead.
Reduction scheme:
| Classification | Abbreviation | Amino acids |
| Strongly hydrophilic or polar | L | R, D, E, N, Q, K, H |
| Strongly hydrophobic | B | L, I, V, A, M, F |
| Weakly hydrophilic or weakly hydrophobic | W | S, T, Y, W |
| Proline | P | P |
| Glycine | G | G |
| Cysteine | C | C |
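The reduction table can be expressed as a lookup from each residue to its group letter. The sketch below copies the groups from the table. The names `GROUPS`, `RESIDUE_TO_GROUP` and `reduce_sequence` are illustrative, and silently dropping non-standard residues (such as X) is an assumption the post does not address.

```python
# Group letter -> member amino acids, copied from the reduction table.
GROUPS = {
    "L": "RDENQKH",  # strongly hydrophilic or polar
    "B": "LIVAMF",   # strongly hydrophobic
    "W": "STYW",     # weakly hydrophilic or weakly hydrophobic
    "P": "P",        # proline
    "G": "G",        # glycine
    "C": "C",        # cysteine
}

# Invert the table: residue -> group letter (20 entries in total).
RESIDUE_TO_GROUP = {aa: group for group, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the six-letter reduced alphabet.

    Assumption: residues outside the 20 standard amino acids are dropped.
    """
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP)


print(reduce_sequence("MKVLAAGIV"))
```

Note that some group letters (L, W, P, G, C) coincide with one-letter amino acid codes, so a reduced sequence should not be passed through the mapping a second time.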
Features after reduction:
| k | Dimension |
| 2 | 36 |
| 3 | 211 |
| 4 | 1071 |
| 5 | 3732 |
| 6 | 8698 |
| 7 | 14620 |
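With six letters there are at most 6^k possible k-words (36 for k = 2, 216 for k = 3), so small k gives a manageable vector. One way to turn a reduced sequence into a feature vector is sketched below. It assumes overlapping windows and relative frequencies over the full 6^k vocabulary. The post's counts (for example 211 rather than 216 at k = 3) suggest the author may have kept only the words observed in the data, so treat this as an illustration rather than the original method. The name `kword_features` is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List

ALPHABET = "LBWPGC"  # the six group letters from the reduction table


def kword_features(reduced_seq: str, k: int) -> List[float]:
    """Relative frequency of every possible k-word over the reduced alphabet.

    Assumptions: overlapping windows (stride 1); the vector is indexed by
    all 6**k words in itertools.product order; an all-zero vector is
    returned when the sequence is shorter than k.
    """
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(reduced_seq[i:i + k] for i in range(len(reduced_seq) - k + 1))
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


vec = kword_features("BLBBBBGBB", 2)
print(len(vec))
```

For larger k the full vocabulary becomes sparse, so restricting the vector to words that occur in the training set is a reasonable alternative.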
Original article: http://www.cnblogs.com/cjbourne/p/4772316.html