The paper uses a Bayesian model, calling BLAST and MUSCLE, to perform taxonomy assignment of the OTUs.
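The BLAST tabular output contains the following columns:
Query id
Subject id
% identity
alignment length
mismatches
gap openings
q. start, q. end (start and end of the alignment in the query)
s. start, s. end (start and end of the alignment in the subject)
e-value
bit score
These terms must be understood thoroughly!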
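These twelve columns correspond, in order, to the default fields of BLAST+ tabular output (`-outfmt 6`): qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The minimal sketch below parses one such tab-separated line into a named record. The sample hit line (query `OTU_1`, subject `ref_123`) is made up for illustration. Note that BLAST computes pident over the alignment length (gap positions included), which is not the shorter-sequence normalization of identity discussed below.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of BLAST+ tabular output (-outfmt 6, default 12 columns)."""
    qseqid: str      # query id
    sseqid: str      # subject id
    pident: float    # % identity
    length: int      # alignment length
    mismatch: int    # mismatches
    gapopen: int     # gap openings
    qstart: int      # q. start
    qend: int        # q. end
    sstart: int      # s. start
    send: int        # s. end
    evalue: float    # e-value
    bitscore: float  # bit score


def parse_blast_line(line: str) -> BlastHit:
    """Split a tab-separated outfmt 6 line and convert each field to its type."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 12:
        raise ValueError(f"expected 12 columns, got {len(f)}")
    return BlastHit(
        f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
        int(f[6]), int(f[7]), int(f[8]), int(f[9]), float(f[10]), float(f[11]),
    )


# Hypothetical hit line, for illustration only:
# 400 aligned positions, 6 mismatches, no gaps -> 394/400 = 98.5% identity.
hit = parse_blast_line("OTU_1\tref_123\t98.50\t400\t6\t0\t1\t400\t15\t414\t1e-200\t712\n")
print(hit.pident, hit.length, hit.evalue)
```

In practice such lines would come from a file written by something like `blastn ... -outfmt 6 -out hits.tsv`, read line by line.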
Sequence identity is the number of characters that match exactly between two sequences. Gaps are not counted, and the count is taken relative to the length of the shorter of the two sequences. As a consequence, sequence identity is not transitive: if A=B and B=C, then A is not necessarily equal to C (in terms of the identity measure):
B: AAGGC
C: AAGGCAT
Here identity(A,B)=100% (5 identical nucleotides / min(length(A),length(B))).
identity(B,C)=100%, but identity(A,C)≈85.7% (6 identical nucleotides / 7).

Similarity is the degree of resemblance between two sequences when they are compared. It depends on their identity and shows the extent to which residues are aligned. Similar sequences tend to have similar properties.
Sequence similarity is, first of all, a general description of a relationship, but in practice it is commonly defined as an optimal matching problem (for sequence alignments, unless defined otherwise). The optimal matching algorithm finds the minimal number of edit operations (insertions, deletions and substitutions) needed to turn one sequence into the other. Using this, the percentage sequence similarities of the examples above are sim(A,B)=60%, sim(B,C)=60% and sim(A,C)=86%.
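The two measures can be compared side by side in a short sketch. Identity here is the count of position-by-position matches between two ungapped sequences compared from their first position, divided by the length of the shorter one. Similarity uses the Levenshtein edit distance d. The normalization sim = 1 − d / min(len) is an assumption: it is not stated in the text, but it reproduces the quoted 60%, 60% and 86%. The value of A (`AAGGCTT`) is likewise a hypothetical 7-nt sequence chosen to be consistent with the percentages above.

```python
def identity(a: str, b: str) -> float:
    """Exact positional matches (no gaps, compared from position 1)
    divided by the length of the shorter sequence."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / min(len(a), len(b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,              # delete x
                cur[j - 1] + 1,           # insert y
                prev[j - 1] + (x != y),   # substitute (free when x == y)
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - edit distance / length of the shorter sequence."""
    return 1 - edit_distance(a, b) / min(len(a), len(b))


A = "AAGGCTT"  # hypothetical, consistent with the percentages in the text
B = "AAGGC"
C = "AAGGCAT"

for name, (x, y) in {"A,B": (A, B), "B,C": (B, C), "A,C": (A, C)}.items():
    print(f"{name}: identity={identity(x, y):.0%} similarity={similarity(x, y):.0%}")
```

Running it shows the non-transitivity from the identity example: A and C are each "100% identical" to B, yet not to each other.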
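The bit score gives an indication of how good the alignment is; the higher the score, the better the alignment. It is derived from a raw score, which sums the substitution scores of the aligned identical or similar residues and subtracts penalties for any gaps introduced to align the sequences. The raw score is then normalized with the statistical parameters of the scoring system, which makes bit scores comparable between searches that use different scoring schemes.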
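To make the formula concrete: under the Karlin-Altschul statistics used by BLAST, a raw score S is converted to a bit score S' = (λS − ln K) / ln 2, and the e-value (expected number of chance hits at least this good) is E = m · n · 2^(−S'), where m and n are the effective query and database lengths. λ and K depend on the scoring system. The values below (`LAMBDA`, `K`, the raw score and the lengths) are placeholders for illustration, not the parameters BLAST uses for any particular matrix.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, m: int, n: int) -> float:
    """Expected chance hits E = m * n * 2^(-S'), m and n being the search-space lengths."""
    return m * n * 2.0 ** (-bits)


# Placeholder statistical parameters and lengths, for illustration only.
LAMBDA, K = 0.3, 0.1
S = 120  # hypothetical raw alignment score
bits = bit_score(S, LAMBDA, K)
E = e_value(bits, m=400, n=1_000_000)
print(f"bit score = {bits:.1f}, e-value = {E:.2e}")

# The same E computed directly from the raw score: K * m * n * exp(-lambda * S)
assert math.isclose(E, K * 400 * 1_000_000 * math.exp(-LAMBDA * S))
```

Because λ and K are already folded into S', the bit score alone is comparable across scoring systems, whereas the e-value additionally depends on the size of the search space (m · n).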