Bioinformatics BLAST tutorial | Bioinformatics course lecture

 

All right, so I'll take a color, and I'm going to talk a little bit about the Basic Local Alignment Search Tool, BLAST. Whenever you talk about bioinformatics, BLAST is something that we always need to understand and always need to perform practically as well. So what is the full form of BLAST? BLAST, all caps: Basic Local Alignment Search Tool.

Now, basically it's a tool for searching similarity, which is developed by NCBI. BLAST is a very popular tool for finding sequence similarity, so you can call it a sequence similarity search tool.

Now, how do we search for sequence similarity? The sequence can be either a nucleotide sequence or an amino acid sequence, that means a protein sequence, and we can search with that sequence. The sequence that we use to search throughout all the databases under NCBI is known as the query sequence. The query sequence is the sequence to be tested, and tested means searched to find similarity. Then there is the whole sequence database, which is there to find the match.

Whenever we look for a match, it's like you are trying to get a bioinformatics book for your preparation. I told you the name of the book, you have the book in your hand, and now you go to a library or a shop where you want to get that same book. What will you do? You just show it to the librarian and ask them to search the whole library of books and get that same book out.

So what will the librarian do? The librarian will follow a protocol, because the library is filled with thousands of books. The librarian will take your book and first check what kind of book it is, whether it is for lower study or higher study. It will be a higher study book, because bioinformatics is for higher studies, and there is a higher study section in the library. So now the search is limited: from the whole library it goes to the higher study section. Inside higher study, what subject is it? Bioinformatics, which comes under life sciences, so the person will now stick to the life science portion, where only the life science books are. This is smaller than the whole library, and even smaller than the total higher education section. In the life science textbook section the person will look through all the different subjects, zoology, botany, physiology, bioinformatics, biochemistry, and find the place where only the bioinformatics books are. Then, in that storage unit of bioinformatics books, the person will check what kind of book it is, whether it is by an Indian author or a foreign author. Let's say it's a foreign author book: then you go to the foreign author section of the bioinformatics shelves, and there that person will search for that book.

Now the search becomes so organized when we do it like this. But if you randomly give this bioinformatics book to someone who doesn't know anything about bioinformatics, who hasn't heard of it and is not educated at all, that person will run around and look at all the books in the library, trying to match the front cover, and then return your query. Who will take more time? Obviously the second person will take more time, and actually that is not logical at all, not scientific at all. So there should be a proper approach, and this approach is known as an algorithm.
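
To make the library analogy concrete, here is a small Python sketch that counts how many books each person has to look at. The nested `LIBRARY` dictionary, the book titles, and the function names are all made up for illustration; this only models the analogy and is not how BLAST itself is implemented.

```python
# Hypothetical toy library: section -> subject -> author group -> titles.
LIBRARY = {
    "lower study": {
        "general science": {"indian author": ["Book F"], "foreign author": ["Book G"]},
    },
    "higher study": {
        "zoology": {"indian author": ["Book D"], "foreign author": ["Book E"]},
        "bioinformatics": {
            "indian author": ["Book C"],
            "foreign author": ["Book A", "Book B"],
        },
    },
}


def all_books(library):
    """Flatten every shelf into one list, in shelf order."""
    return [
        title
        for subjects in library.values()
        for groups in subjects.values()
        for shelf in groups.values()
        for title in shelf
    ]


def linear_scan(library, wanted):
    """Unorganized search: compare against every book, return comparisons made."""
    for count, title in enumerate(all_books(library), start=1):
        if title == wanted:
            return count
    return None


def guided_search(library, section, subject, author_group, wanted):
    """Librarian's protocol: narrow down to one shelf first, then compare."""
    shelf = library[section][subject][author_group]
    for count, title in enumerate(shelf, start=1):
        if title == wanted:
            return count
    return None


print("unorganized:", linear_scan(LIBRARY, "Book B"))
print("organized:  ", guided_search(
    LIBRARY, "higher study", "bioinformatics", "foreign author", "Book B"))
```

The organized search only compares against the one shelf it was directed to, which is the point of following a protocol instead of checking every front cover.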

An algorithm is a process which is fed to a software platform and with which the software runs. You know, software knows only binary, one or zero. We are not binary; we always think in quantitative measurements. For software it is all binary, one or zero, so based on the binary values we can create an algorithm. The software will then follow the algorithm blindly, and we know that we are going to get our data. I simply gave you an example of an algorithm for searching a book. Similarly, if you have a query sequence, you need to search that query sequence throughout all the databases under NCBI, because NCBI runs multiple databases, as we know, which are very popular and get a lot of traffic as well. You want to find out whether your query matches any other sequence in the database. That's what we run BLAST for.

So the BLAST algorithm has steps. What are the steps? Basically, the BLAST algorithm processes the query in proper stages. The input is your query sequence.

With your query sequence, the very first step is removing the low-complexity areas. Any low-complexity area will be removed. For example, let's assume that we are talking about a protein sequence. Low-complexity areas are those which contain repetitive sequence: these are repetitive sequence, and this is also repetitive sequence. Repetitive sequences are not usually allowed, and the first thing the BLAST algorithm does is take out all the repetitive sequence and, in its place, put X or N: X for protein and N for nucleotide. So here it will be four X, then K R K D K L L, and then four more X. X means you are not going to consider those residues, because they are low complexity. We only take sequence which is unique; one or two of the same residue is fine, but no repetitive sequence. After this is done, the list of words is made.
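
The masking step described above can be sketched in a few lines of Python. This is only a simplified illustration: real BLAST uses dedicated low-complexity filters (SEG for proteins, DUST for nucleotides), whereas this sketch just masks runs of one repeated letter. The function name `mask_low_complexity`, the `min_run` cutoff of three, and the flanking residues `AAAA` and `SSSS` are assumptions made up for the example.

```python
import re


def mask_low_complexity(seq, is_protein=True, min_run=3):
    """Replace runs of the same letter (length >= min_run) with X or N.

    Simplified stand-in for BLAST's real filters; it only catches
    single-letter repeats, not low-complexity regions in general.
    """
    mask_char = "X" if is_protein else "N"
    # (.) captures one letter, \1{k,} requires at least k more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq.upper())


# Hypothetical protein: two repetitive stretches around a unique core.
print(mask_low_complexity("AAAAKRKDKLLSSSS"))
# Hypothetical nucleotide sequence with a run of T.
print(mask_low_complexity("ACGTTTTTTGCA", is_protein=False))
```

Note that the two L residues in the middle are left alone, matching the idea that one or two of the same residue is fine.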

Each of the words in the query is scored, and we can give a score for each of them. I'm not going to go through the hard and fast rules of the BLAST algorithm; I'm going to share it in a way that you understand. That's important, because the background of the BLAST algorithm is not what matters most; what matters is how to run a BLAST in practice. Practical knowledge of BLAST is more important, practical knowledge of FASTA is more important, and we'll do that later on. For now, just try to understand the situation.

So we have a value for each individual position: we have a score, a scoring system. And whatever query we are searching with, we break it into what we call words. How long is a word? We generally take a word length of three for a protein sequence and 11 for a nucleotide sequence. That is the fixed length over which we run the sequence similarity search.
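
As a rough sketch of what making the list of words could look like, the Python below slides a fixed-length window along the query. The word lengths of 3 and 11 are the ones given in the lecture; the function name `make_words` and the example sequences are hypothetical.

```python
def make_words(seq, is_protein):
    """Return every overlapping word of the fixed length used in the lecture:
    3 letters for protein, 11 letters for nucleotide."""
    word_len = 3 if is_protein else 11
    return [seq[i:i + word_len] for i in range(len(seq) - word_len + 1)]


# Hypothetical protein query: 7 residues give 5 overlapping 3-letter words.
print(make_words("KRKDKLL", is_protein=True))

# Hypothetical nucleotide query: 13 bases give 3 overlapping 11-letter words.
print(make_words("ACGTACGTACGTA", is_protein=False))
```

A query shorter than the word length gives an empty list here, since no complete word fits.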

So now, the scoring is based on the matching of each of these words. Whenever there is a match in a word, a score is given for that match; if there is a mismatch, a different score is given. So for an 11-letter stretch, a word, there will be a particular score given to that sequence.

In this fashion, what will be the maximum score? Ask yourself. Only if all 11 positions of the word are matched will we get the maximum score. Note that this is the maximum score, not the threshold value. You get it with a full match, but a full match is not always possible: sometimes 10 out of 11 positions match, sometimes 5 out of 11, and there are other possibilities. We have the maximum score only if all of them match.
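
The match and mismatch scoring of an 11-letter word can be sketched as below. The lecture does not give actual score values, so `match=1` and `mismatch=-2` are illustrative assumptions, and the function name `score_word` and the example words are hypothetical.

```python
def score_word(query_word, db_word, match=1, mismatch=-2):
    """Add the match score for each identical position and the mismatch
    score for each differing position. Score values are assumptions."""
    if len(query_word) != len(db_word):
        raise ValueError("words must have the same length")
    return sum(match if q == d else mismatch for q, d in zip(query_word, db_word))


query = "ACGTACGTACG"           # hypothetical 11-letter nucleotide word

full_match = "ACGTACGTACG"      # 11 of 11 positions match
ten_of_eleven = "ACGTACGTACC"   # only the last position differs
five_of_eleven = "ACGTAGCATGC"  # only the first five positions match

for db_word in (full_match, ten_of_eleven, five_of_eleven):
    print(db_word, score_word(query, db_word))
```

With these assumed values the full match gives the maximum score of 11, and every mismatch pulls the score down from there.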

and we have

you
