All right, so I'll take a color, and I'm going to talk a little bit about the Basic Local Alignment Search Tool, BLAST. Whenever you talk about bioinformatics, BLAST is something that we always need to understand and always need to perform practically as well. So what is the full form of BLAST? BLAST, all caps: Basic Local Alignment Search Tool, okay.
Now, basically it's a tool for searching similarity, which was developed by NCBI. Okay, the NCBI-developed BLAST is a very, very popular tool to find out sequence similarity. So you can say that it is a sequence similarity search tool.
Now, how do we search for sequence similarity? The sequence can be either a nucleotide sequence or an amino acid sequence, that means a protein sequence, and we can search with that sequence, basically, okay.
So in this case we have a sequence, the sequence that we use to search throughout all the databases under NCBI, and that sequence is known as the query, the query sequence. The query sequence is the sequence to be tested; tested means to be searched, okay, to find similarity. And then there is the whole database, the whole sequence database, in which to find the match.
Okay, so whenever you look for a match, it's like you are trying to get a bioinformatics book for your preparation. I told you the name of the book, you have the book in your hand, and now you go to a library or a shop where you want to get that same book. So what will you do? You just show it to the librarian and tell them to search the whole library of books and get that same book out, okay. So what will that librarian do? The librarian will follow a protocol, right, because the library is filled with thousands of books. So the librarian will take your book and first check what kind of book it is, whether it is for lower study or higher study. It will be a higher study book, because bioinformatics is for higher studies, and there is a higher study section in the library. So now
the search is limited: from the whole library, it now goes to the higher study section. Now, inside higher study, what subject is it? Bioinformatics, so under life sciences. So the person will now stick to the life science portion, where only the life science books are, and this will be even smaller than the whole library, even smaller than the total higher education book section. And now, in the life science textbook section particularly, the person will look through all the different subjects, zoology, botany, physiology, bioinformatics, biochemistry, and find the place where only bioinformatics books are kept. Then, in that particular, let's say, storage unit of bioinformatics books, the person will check what kind of book it is, whether it's an Indian author book or a foreign author book. Now let's say it's a foreign author book: then you go to the foreign author part of the bioinformatics section, and there that person will search for that book.
Now, the search becomes so organized when we search like this. Or now, suppose you randomly give this bioinformatics book to someone who doesn't know anything about bioinformatics, hasn't heard about it, is not educated in it at all. You give this book to that person, and the person will now run and try to look at all the books in the library, try to match the front cover, and then return your query. So who will take more time? Obviously the latter person will take more time, and actually it's not logical at all, not scientific at all. So there should be a proper approach, and this approach is known as an algorithm. So when we talk about an algorithm, an algorithm is a process which is fed to a software platform and which the software runs on, because, you know, software knows only binary, one or zero, right. We are not binary; we always think with quantitative measurements, no binary thing.
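
The librarian analogy can be sketched in a few lines of Python. This is only an illustration of organized versus exhaustive searching, not how BLAST itself is implemented; the toy library, the book titles, and the function names are all made up for the example.

```python
# Hypothetical toy library: (level, subject, author_origin, title).
LIBRARY = [
    ("lower", "mathematics", "indian", "School Algebra"),
    ("higher", "zoology", "foreign", "Animal Diversity"),
    ("higher", "bioinformatics", "indian", "Sequence Analysis Basics"),
    ("higher", "bioinformatics", "foreign", "Intro to Bioinformatics"),
    ("higher", "biochemistry", "foreign", "Principles of Biochemistry"),
]


def brute_force_search(library, title):
    """Look at every book in shelf order; return (book, books_checked)."""
    checked = 0
    for book in library:
        checked += 1
        if book[3] == title:
            return book, checked
    return None, checked


def build_index(library):
    """Group books by (level, subject, author_origin), like library sections."""
    index = {}
    for book in library:
        index.setdefault(book[:3], []).append(book)
    return index


def organized_search(index, level, subject, origin, title):
    """Go straight to the right section, then check only the books there."""
    checked = 0
    for book in index.get((level, subject, origin), []):
        checked += 1
        if book[3] == title:
            return book, checked
    return None, checked


index = build_index(LIBRARY)
print(brute_force_search(LIBRARY, "Intro to Bioinformatics")[1])
print(organized_search(index, "higher", "bioinformatics", "foreign",
                       "Intro to Bioinformatics")[1])
```

Both searches return the same book; the organized one simply checks fewer books because the section narrows the search first.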
So for software it is all binary, one or zero, and based on the binary values we can create an algorithm. Now the software will follow the algorithm blindly, and we know that we are going to get our data, okay. So I simply gave you an example of an algorithm for searching for a book. Similarly, you have a query sequence and you need to search that query sequence throughout all the databases under NCBI, because NCBI runs multiple databases, we know that, okay; it is very popular as well, and the traffic is also very good. So with your query sequence, you want to find a match, whether your query matches any other sequence in the database, okay. That's what we run BLAST for. Okay, so the BLAST algorithm has steps. What are the steps here?
Basically, the BLAST algorithm processes the query in proper stages, okay. So, for example, what I can tell you is that you input a query sequence; the input is your query sequence. And with your query sequence, the very first step is removing, let me write it, removing the low complexity area. Low complexity area.
So any low complexity area will be removed. For example, let's assume that we are talking about a protein sequence. Low complexity area means those parts which contain repetitive sequence: so this is repetitive sequence, and this is also repetitive sequence, okay. Repetitive sequences are not usually allowed, and the first thing that the BLAST algorithm does is remove all the repetitive sequence, and in its place it puts X or N, X for protein and N for nucleotide. So it will be four X, then K R K D L, uh, D K, sorry, D K L L, and then four more X. X means you are not going to consider those parts of the sequence, because they are low complexity. We only take sequence which is unique; one or two repeats are fine, but no repetitive sequence, okay. Now that step is done.
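
The masking step described above can be sketched roughly as follows. This is a simplified stand-in, assuming that "low complexity" just means a run of the same letter repeated at least `min_run` times; BLAST's real filters (SEG for proteins, DUST for nucleotides) are more sophisticated. The function name, the `min_run` parameter, and the example sequences are assumptions for illustration.

```python
import re


def mask_low_complexity(seq, is_protein=True, min_run=4):
    """Replace runs of one repeated letter with X (protein) or N (nucleotide).

    Simplified assumption: a run of min_run or more identical letters
    counts as low complexity. Shorter repeats are left alone.
    """
    mask_char = "X" if is_protein else "N"
    # (.) captures one letter; \1{k,} requires at least k more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq)


# Hypothetical protein query with a repetitive stretch at each end.
print(mask_low_complexity("AAAAKRKDKLLPPPP"))                  # XXXXKRKDKLLXXXX
# Hypothetical nucleotide query with a run of six T's.
print(mask_low_complexity("ACGTTTTTTGCA", is_protein=False))   # ACGNNNNNNGCA
```

Note that the two L's in the protein example are kept, since a repeat of one or two letters is below the cutoff.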
After this, the list of words is made, the list of words. Each of the words in the query is scored; we can give a score to each of them. I'm not going to go through the hard and fast rules of the BLAST algorithm here; I'm going to share it in a way that you understand. That's very important, because, you know, the background of the BLAST algorithm is not what matters most; what is important is how to run a BLAST in practice. So practical knowledge of BLAST is more important, practical knowledge of FASTA is more important, and we'll do that later on. But just try to understand the situation now.
So we have a value, right: for each individual position we have a score, a scoring system. We have this scoring system, okay. And the word: whatever piece of the query we are searching with, we call it a word, okay. How long is a word? We generally take three letters for a protein sequence and 11 for a nucleotide sequence. That is the fixed length with which we run the sequence similarity search, okay.
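
To make the idea of words concrete, here is a minimal sketch that cuts a query into overlapping words of the fixed length mentioned above (3 for protein, 11 for nucleotide). The function name and the example sequences are hypothetical.

```python
def make_words(seq, is_protein=True):
    """Return every overlapping fixed-length word in the query sequence."""
    word_size = 3 if is_protein else 11
    # Slide a window of word_size letters along the sequence, one step at a time.
    return [seq[i:i + word_size] for i in range(len(seq) - word_size + 1)]


# Hypothetical protein query of 7 residues gives 5 words of length 3.
print(make_words("KRKDKLL"))
# ['KRK', 'RKD', 'KDK', 'DKL', 'KLL']

# Hypothetical nucleotide query of 13 bases gives 3 words of length 11.
print(make_words("ACGTACGTACGTA", is_protein=False))
# ['ACGTACGTACG', 'CGTACGTACGT', 'GTACGTACGTA']
```

A query shorter than the word length simply gives an empty list.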
So now what we will do is score based on the matching of each word. Whenever there is a match, a score is given based on that match; if there is a mismatch, then there is a different score. So for an 11-letter stretch, a word, there will be a particular score given to that sequence, okay.
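
The match and mismatch scoring just described can be sketched like this. The score values (+1 for a match, -1 for a mismatch) are assumed purely for illustration; BLAST's actual nucleotide reward and penalty values are configurable, and protein searches score with a substitution matrix such as BLOSUM62 rather than a flat match/mismatch pair. The function name and sequences are hypothetical.

```python
MATCH_SCORE = 1      # assumed value, for illustration only
MISMATCH_SCORE = -1  # assumed value, for illustration only


def score_word(query_word, db_word):
    """Score two equal-length words position by position."""
    if len(query_word) != len(db_word):
        raise ValueError("words must have the same length")
    return sum(
        MATCH_SCORE if q == d else MISMATCH_SCORE
        for q, d in zip(query_word, db_word)
    )


query = "ACGTACGTACG"                     # hypothetical 11-letter nucleotide word
print(score_word(query, "ACGTACGTACG"))   # all 11 positions match: 11
print(score_word(query, "ACGTACGTACC"))   # 10 matches, 1 mismatch: 9
```

With these assumed values, a word that matches at every position gets the highest possible score, and each mismatch pulls the score down.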
Now, in this fashion, what will be the maximum score? Ask yourself, what will be the maximum score? Only if all 11 positions are matched, all matched, only then will we get the maximum score. And generally in this case the maximum score value is known as the threshold value, okay. So, maximum score, not threshold, sorry: this is the maximum score, which you get with a full match. But a full match is not always a possibility. Sometimes, you know, it's 10 matches out of 11, or 5 matches out of 11; there are different possibilities. We have a maximum score if all of them matched
and we have