Genetic Algorithm-Everything You Need To Know Manas Sinha
Developer | Designer

Genetic Algorithm-Everything You Need To Know

You must have heard about Darwin’s theory of evolution and natural selection. Genetic Algorithms are basically an effort to imitate the same, artificially. In this article, we will learn just about the algorithm and we will code a GA from scratch in the next article.

0. Genetic Algorithm consists of 5 steps mainly

To understand the steps lets assume we are trying to generate the string “the binary realm” using a genetic algorithm. Now, it might sound weird to generate a string using GA, makes no sense. But it will help understand the algo better. Consider a world with random strings as its population(also known as chromosomes), strings are of length 16 len("the binary realm") and may consist only lowercase English alphabets, spaces, and characters like "*","/",".",":" . Individuals need to have some restrictions after all we humans also do.  1.Initialization of Population

Genetic Algorithm is a randomized search algorithm. A randomized search algorithm is an algorithm that incorporates some kind of randomness or probability in its methodology. Here in GA, a random process is used to create an initial population pool. A Population Pool is a collection of individuals of the current generation. The POPULATION_SIZE is a parameter. The larger the population the larger the chances of the algorithm to converge.

int POPULATION_SIZE = 100000;
vector<int> population_pool;
for(int i = 0;i < POPULATION_SIZE; ++i){
population_pool.push_back(create_individual());
}

2.Evaluation

After the population is created we evaluate the fitness value of each individual. Fitness function is a function that determines how good the individual is. In this case, the string which is closest to the super string will have the highest fitness value. The fitness value can be calculated using the number of characters in an individual A that is the same as the corresponding character in the super string.

int fitness_value = 0;
for(int i=0;i<individual.length();++i){
if(individual[i] == super_string[i]) fitness_value++;
}

3.Selection

After the evaluation is done, out of the population pool we select a percentage (say x)of population that has the highest fitness value.The rest are discarded.

4.Crossover

The selected fittest individuals are then set to mate with each other to create the rest (100-x)% of the population. There are many ways in which this mating is done. Just like when humans mate there 50–50 chance that a gene is coming from the father or the mother. To read more about the various ways of crossover read the article here.

5.Mutation

Since the process is random there is a possibility that the algorithm never reaches the super string. For example, let’s say that among all the individuals created in the initialization process none of them has the character “b”. After all, the process is random so this is possible however low the probability be. So if we follow the normal evaluation, selection, crossover steps, the algorithm will never converge. The mutation is done to bring variation in the offsprings created in the crossover process. The probability_of_mutation is a parameter that determines that a particular gene (here genes are just the character of the string) of an offspring will mutate or not. It is often low since too much mutation is also not good. We select a gene and simply change it to another gene selected randomly.

char mutated_genes() {
int len = POSSIBLE_GENES.size();
int r = random_num(0, len-1);
return POSSIBLE_GENES[r];
}
int probability_of_mutation = 10;//10%
for(char c : offspring){
int p = rand()%100+1;
if(p <= probabily_of_mutation) c = mutated_genes();
}

6.Termination

The population pool is replaced by the new population created and the process is repeated until we get an individual with a fitness value of 16 (max-possible).

7.Stop

In this example, the algorithm stops when the super string is generated. But the algorithm might not converge 100% for all the cases rather it will reach the most optimal solution possible. For example, is we are using GA to train the machine to play a game, the machine will learn to play the game but it will not have 100% accuracy.

Terminology

1.Population pool :A Population Pool is a collection of individuals of current generation.

2.Chromosome : Individuals of current generation are also called chromosome.

3.Genes : Each chromosome is made up of genes.

4.Fitness value : A number/value that determines how good an individual is.

5.Probability of mutationA parameter that determines that a particular gene of an offspring will mutate or not.

6.Convergence: Convergence just means the algorithm reaches the most optimal solution and may or may not have 100% accuracy.