Bioinformatics Algorithms in Practice
Published: 2019-06-22


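I have been working on 16S data lately and found an excellent opportunity to practice algorithms.

See the paper:

The paper uses a Bayesian model and calls BLAST and MUSCLE to perform taxonomy assignment for OTUs.

Take a look at the source code; it is very simple.

If you can fully understand the algorithm in this paper and follow its source code, you have basically reached an entry level in bioinformatics algorithms.

Who knows, maybe one day you will casually publish an algorithm paper of your own!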


BLAST

Query id

Subject id
% identity
alignment length
mismatches
gap openings
q. start
q. end
s. start
s. end
e-value
bit score

You must understand these terms thoroughly!
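As a minimal sketch of how these twelve fields can be read in practice, the following Python snippet parses one line of tab-separated BLAST output. It assumes the columns appear in exactly the order listed above (as in BLAST+ tabular output, `-outfmt 6`). The `BlastHit` type, the `parse_blast_line` helper, and the sample row are hypothetical and serve only as illustration.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of 12-column tabular BLAST output (hypothetical container)."""
    query_id: str
    subject_id: str
    pct_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    e_value: float
    bit_score: float


def parse_blast_line(line: str) -> BlastHit:
    """Split one tab-separated row and convert each field to its type."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-separated fields, got {len(fields)}")
    return BlastHit(
        fields[0], fields[1], float(fields[2]),
        int(fields[3]), int(fields[4]), int(fields[5]),
        int(fields[6]), int(fields[7]), int(fields[8]), int(fields[9]),
        float(fields[10]), float(fields[11]),
    )


# Made-up example row, not real BLAST output.
sample = "OTU_1\tref_42\t98.50\t200\t3\t0\t1\t200\t15\t214\t1e-100\t355"
hit = parse_blast_line(sample)
print(hit.query_id, hit.subject_id, hit.pct_identity, hit.e_value)
```

Named fields make later filtering readable, for example keeping only hits with `hit.pct_identity >= 97` before taxonomy assignment.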

Sequence identity is the number of characters that match exactly between two sequences. Gaps are not counted, and the measure is relative to the shorter of the two sequences. As a result, sequence identity is not transitive: if A = B and B = C, A is not necessarily equal to C (in terms of the identity distance measure):

A: AAGGCTT
B: AAGGC
C: AAGGCAT

Here identity(A,B)=100% (5 identical nucleotides / min(length(A),length(B))).

Identity(B,C)=100%, but identity(A,C)=86% (6 identical nucleotides / 7, about 85.7%).
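The three identity values can be checked with a short Python sketch. It assumes the sequences start at the same position and need no internal gaps, so a position-by-position comparison is enough; the `identity` function is a hypothetical helper, not part of BLAST or of the paper's code.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions relative to the shorter sequence.

    Assumption: both sequences are left-aligned and gap-free, so
    positions can be compared one by one.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        raise ValueError("sequences must be non-empty")
    # zip stops at the end of the shorter sequence.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


A, B, C = "AAGGCTT", "AAGGC", "AAGGCAT"
print(identity(A, B), identity(B, C), round(identity(A, C), 3))
```

A and B are both 100% identical to the shared prefix, yet A and C differ at one of seven positions, which is the non-transitivity described above.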

Similarity is the degree of resemblance between two sequences when they are compared, and it depends on their identity. It shows the extent to which residues are aligned. Similar sequences tend to have similar properties.

Sequence similarity is first of all a general description of a relationship, but it is more or less common practice to define similarity as an optimal matching problem (for sequence alignments, unless defined otherwise). The optimal matching algorithm finds the minimal number of edit operations (insertions, deletions, and substitutions) needed to align one sequence to the other. Using this, the percentage sequence similarities of the examples above are sim(A,B)=60%, sim(B,C)=60%, sim(A,C)=86%. These values are consistent with sim = 1 - (edit distance / length of the shorter sequence).
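The edit-distance view of similarity can be sketched in Python as follows. The `edit_distance` function is a plain Levenshtein dynamic program; the normalisation sim = 1 - (edit distance / length of the shorter sequence) is an assumption, chosen because it reproduces the percentages quoted above. Both function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free when x == y
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - edit distance / length of the shorter sequence."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        raise ValueError("sequences must be non-empty")
    return 1 - edit_distance(a, b) / shorter


A, B, C = "AAGGCTT", "AAGGC", "AAGGCAT"
for name, (s, t) in {"A,B": (A, B), "B,C": (B, C), "A,C": (A, C)}.items():
    print(name, edit_distance(s, t), round(similarity(s, t), 2))
```

Unlike identity, this measure penalises the two unmatched trailing bases of A and C when they are compared with B, which is why sim(A,B) and sim(B,C) drop to 60%.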

The bit score gives an indication of how good the alignment is; the higher the score, the better the alignment. In general terms, this score is calculated from a formula that takes into account the alignment of similar or identical residues, as well as any gaps introduced to align the sequences.
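For reference, the standard Karlin-Altschul normalisation turns a raw alignment score S into a bit score S' = (lambda * S - ln K) / ln 2, and the corresponding E-value is m * n * 2^(-S') for a query of length m and a database of length n. The sketch below only illustrates this relationship; the function names are hypothetical, and the `lam` and `k` values are illustrative placeholders, since the real parameters depend on the scoring scheme and are reported by BLAST itself.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalise a raw score S to bits: (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits(bits: float, query_len: int, db_len: int) -> float:
    """E-value implied by a bit score: m * n * 2 ** (-bits)."""
    return query_len * db_len * 2.0 ** (-bits)


# Placeholder statistical parameters, for illustration only.
lam, k = 1.28, 0.46
for raw in (50, 100, 200):
    bits = bit_score(raw, lam, k)
    print(raw, round(bits, 1), expected_hits(bits, 250, 1_000_000))
```

Because the bit score is already normalised, scores from searches that use different scoring schemes can be compared directly, and each additional bit halves the expected number of chance hits.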

Reposted from: http://iekaa.baihongyu.com/
