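This information can be used by BLAST (Basic Local Alignment Search Tool), an algorithm that finds proteins with significant local sequence similarity to a query. BLAST does not group proteins itself; the hits it reports are passed to a separate clustering step that groups similar proteins together into “clusters”.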
From here, we can treat the sequences in each cluster as putative homologs, meaning that they descend from a common ancestor. Once this is done, we can compare clusters, for example to find out which one contains the most genes with a known function.
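The grouping step can be sketched as single-linkage clustering over pairwise hits. This is a minimal illustration and not how any particular tool does it: the `(query, subject, evalue)` tuple format, the `cluster_hits` name, the `1e-5` cutoff, and the example IDs are all assumptions made for this sketch.

```python
from collections import defaultdict


def cluster_hits(hits, max_evalue=1e-5):
    """Group sequence IDs into single-linkage clusters from pairwise hits.

    hits: iterable of (query_id, subject_id, evalue) tuples (hypothetical format).
    Two sequences share a cluster if a chain of significant hits connects them.
    """
    parent = {}

    def find(seq_id):
        # Union-find lookup with path halving.
        parent.setdefault(seq_id, seq_id)
        while parent[seq_id] != seq_id:
            parent[seq_id] = parent[parent[seq_id]]
            seq_id = parent[seq_id]
        return seq_id

    for query, subject, evalue in hits:
        # Calling find() registers both IDs, so sequences without
        # significant hits still appear as singleton clusters.
        root_q, root_s = find(query), find(subject)
        if evalue <= max_evalue and root_q != root_s:
            parent[root_q] = root_s

    clusters = defaultdict(set)
    for seq_id in list(parent):
        clusters[find(seq_id)].add(seq_id)
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda members: members[0])


# Made-up hits for illustration only.
example_hits = [
    ("protA", "protB", 1e-30),
    ("protB", "protC", 1e-12),
    ("protD", "protE", 1e-8),
    ("protA", "protD", 0.5),  # too weak to link the two groups
]
clusters = cluster_hits(example_hits)
print(clusters)
```

Single linkage is the simplest choice, but it can chain unrelated proteins together through one shared domain, so stricter linkage rules are worth considering for real data.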
How are homologous features used in phylogenetic tree construction?
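We want our protein sequence to be grouped with its “most closely related proteins”. This means that we want to find the cluster of proteins that is our “closest evolutionary relative”. Homologous features, such as the aligned positions of homologous sequences, are the characters we compare to do this. One approach is maximum parsimony, which scores each candidate tree by the minimum number of character changes it requires and prefers the tree with the fewest changes. The placement of our protein that needs the fewest changes tells us whether it should be closer to one cluster or the other.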
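As a rough sketch of how a parsimony score can be computed, the example below uses Fitch's small-parsimony algorithm on a fixed, rooted binary tree. The nested-tuple tree encoding, the function names, and the toy alignment are assumptions for illustration; real analyses also search over many candidate trees rather than comparing just two.

```python
def fitch_site(tree, states):
    """Return (possible_states, change_count) for one alignment column.

    tree: a leaf name (str) or a (left, right) tuple of subtrees.
    states: dict mapping leaf name -> residue at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: one change is required on this branch pair.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Toy, made-up alignment: two members of cluster A, two of cluster B.
alignment = {
    "query": "MKVLA",
    "a1": "MKVLA",
    "a2": "MKILA",
    "b1": "MRTLG",
    "b2": "MRTLG",
}
tree_with_a = ((("query", "a1"), "a2"), ("b1", "b2"))
tree_with_b = (("a1", "a2"), (("query", "b1"), "b2"))

score_a = parsimony_score(tree_with_a, alignment)
score_b = parsimony_score(tree_with_b, alignment)
print(score_a, score_b)  # the lower score is the more parsimonious placement
```

Here the tree that places the query inside cluster A needs fewer changes, so parsimony prefers that placement.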
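If the placement looks wrong or ambiguous, we can cross-check it with a distance-based method. Here we build a “distance matrix” of pairwise distances between our protein and the other proteins in each cluster and look for the one with the smallest distance (that is, the highest similarity). Methods such as neighbor joining can then build a tree from such a matrix. This is why it’s important that homologous features are used correctly: distances are only meaningful when the positions being compared are homologous, and a reliable alignment saves us from recomputing distance matrices for every individual cluster.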
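The distance comparison can be sketched with the simplest distance measure, the p-distance (the fraction of aligned positions that differ). The function names, the cluster layout, and the sequences are hypothetical, and the sketch assumes the sequences are already aligned to the same length; corrected distances based on a substitution model are usually preferred for real data.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def nearest_cluster(query, clusters):
    """Return (distance, cluster_name, member_name) for the closest member."""
    best = None
    for cluster_name, members in clusters.items():
        for member_name, seq in members.items():
            d = p_distance(query, seq)
            if best is None or d < best[0]:
                best = (d, cluster_name, member_name)
    return best


# Made-up aligned sequences for illustration only.
clusters = {
    "cluster_A": {"a1": "MKVLA", "a2": "MKILA"},
    "cluster_B": {"b1": "MRTLG", "b2": "MRTLG"},
}
distance, cluster_name, member_name = nearest_cluster("MKVLS", clusters)
print(distance, cluster_name, member_name)
```

Note that the closest match is the one with the smallest distance, which corresponds to the highest similarity score.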
Which feature is phylogenetic analysis not able to determine?
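Phylogenetic analysis on its own is not able to determine the function of a protein. It infers evolutionary relationships; a function can be hypothesized from closely related proteins, but it has to be confirmed experimentally.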
What are some common pitfalls in phylogenetic analyses?
A common pitfall in phylogenetic analyses is counting the same homologous feature multiple times, for example by treating characters that are not independent as separate evidence. This overweights that feature and can place our proteins in the wrong group.
Another pitfall is when homologous features are skipped because of gaps in the sequence or sequencing errors, which can also place our proteins in the wrong group. This can be mitigated by using “gapped local alignment”, as in the Smith-Waterman algorithm or gapped BLAST, which allows insertions and deletions so that the homologous positions on either side of a gap still line up.
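To show what gapped local alignment computes, here is a minimal sketch of the Smith-Waterman scoring recurrence with a linear gap penalty. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the example sequences are arbitrary assumptions; protein alignments normally use a substitution matrix such as BLOSUM62 and affine gap penalties instead.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (linear gap penalty)."""
    prev = [0] * (len(seq_b) + 1)  # previous row of the scoring matrix
    best = 0
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            up = prev[j] + gap        # gap in seq_b
            left = curr[j - 1] + gap  # gap in seq_a
            # Local alignment: a cell never drops below zero, so a new
            # alignment can start anywhere.
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# The second sequence has one extra residue (T) in the middle.
score = smith_waterman_score("MKVLAGH", "MKVLTAGH")
print(score)
```

Because a gap is allowed, the residues after the insertion still align with their counterparts instead of being counted as a run of mismatches.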
What are some other pitfalls in phylogenetic analysis?
It’s also possible that two proteins may not be found to be homologous simply because they’re too distantly related. This can happen when so much evolutionary time has passed that their sequences have diverged beyond detectable similarity, or when their lengths differ greatly.
To alleviate this issue, we can make the search more sensitive (e.g. relax the E-value threshold, or use an iterative search such as PSI-BLAST and increase the number of iterations).