🐸 pepe:curious

dissecting linux schedulers & implementing our toy cfs_scheduler simulation

Abstract

Linux scheduling involves multiple things like which tasks to pick, how to pick, how to maximize the CPU utilisation etc. Over the time scheduling algorithms evolved from FIFO, RR to EDF and CFS. These algorithms performs better in different type of tasks i.e. real time, hard limit tasks. An algorithm that is better in scheduling real time tasks is not necessary that it will be able to schedule a hard limit task. This article is the study of what Scheduling heuristics are there to take in mind before checking any scheduling algorithm, how different schedulers evolved and how they are being in use. At the end we will run some simulations with different kind of tasks in our toy cfs schedular simulation.


TL;DR


1.1 Task scheduling

Whenever we have a set of tasks τ and to check if this set is schedulable, we need to check if it exceeds CPU utilisation or not.
τ=(Ci,Ti,Di) where
Ci is the cpu time that τi will take to complete its exection
Ti is the time interval after which τi's second instance will be released
Di is the deadline before which that τi should complete its execution
Note:- Di=Ti is a common assumption for real time systems


1.1.1 For Uniprocessor:-

For uniprocessors or single processors there should exsist a function σ(t) which maps t to a set of tasks containing runnable and idle tasks
σ(t):tT(Tidle)


1.1.2 For Multiprocessor:-

Let say we have multiprocessors setup with M CPUs. For multiprocessor σ(t) should map a matrix of tasks to M CPUs to time t as:-
σ(t) : T(T(Tidle)M

Note:- Hence, σ(t) is a scheduling function in the context of CPU scheduling which determines which task is assigned to execution at any given time t

In dual core processor σ(t) for at time t will look like:-
σ(t):=[T1T2],M=2

For M>1 there is two types of scheduling that we will look at:- Global scheduling, Partitioned Scheduling


1.2 Schedulable Task Set

For N periodic tasks, the task set is schedulable if
(CiTi)1
which essentially means that summing up all the utilisation of CPU per task Ti is still not maxium which is 100% of CPU capacity.

similarly, for multiprocessor this equation becomes
(CiTi)M

which essentially means that summing up all the capacity used by the task set T it should not exceed the total CPU capacity available.


1.2.1 Dhall's Effect

Even if we have M then also it is possible that task Ti is not schedulable because of its high utilisation behaviour that is also called hard limit task.

Let's take an example of (M+1) tasks where M:=(ϵ,T1,T1) and (M+1)th:=(T,T,T) then CPU utilisation upper bound can be found as:-
CiTi:=MϵT1+TT
:=MϵT1+1
Now, as ϵ0 which is possible for real time tasks then CiTi1

In this case EDF (earliest deadline first) and RM (rate monotonic) scheduling algorithm can not be used because they are not meant for hard tasks.

Hence, even if utilisation is less than what we want but it doesn't guarantee if task set is schedulable. Therefore, algorithms like EDF and RM are applicable only for soft tasks.


1.3 Constant Bandwidth Server (CBS)

Constant bandwidth server is used for hard limit tasks where it ensures that tasks meet their deadlines by allocating a constant bandwidth allocation to them.

It allocates capacity that a task can use and periodic interval after which task's instance can run. T:=(Qi,Pi). In other words task τi is allocated a maximum runtime of Qi after every periodic interval of Pi

If task τi wants to run for more than allcated Qi every Pi then algorithm throttles/decreases its priority.

jQjPj1 where τi is guranteed to receive Qi every Pi interval.

Each task's dynamic deadline is maintained and throttled according to the history and allocated bandwidth. For M cpus/cores with global scheduling if jQjPjM then CBS guarantees the scheduling of tasks from a task set.


1.3.1 CBS Algorithm - 1

Each task τi starts with a dynamic deadline dis that are initialised as 0 when task is created.
When task wakes up, check if current deadline can be used:-
dis>t and qidist<QiPi
If not, dis=t+Pi and qi=Qi

This ensures that when a task wakes up, it will get its share of cpu after the alloted wait time. But if task wants to run for more time than allocated then its priority will be throttled.

CBS algorithm is generally used in RTOS (real time OS) or real time tasks where they have to meet the deadline or the deadline is critical.


1.4 Complete Fair Schedular (CFS)

CFS is an algorithm which can be used in general to schedule the tasks instead of CBS which is favorable for real time tasks. CFS relies on the hypothesis that it can give a fair share to each task in a task set.
CFS uses few terms/parameters as stated below:-


1.4.1 Our toy CFS simulation

We made a simulation where our cfs picks up the tasks from a runnable queue (priority min queue according to vruntime). At t=0 every task has some vruntime, some random values are given instead of zeros, and vruntime increases like:-

CFS simluation
GLOABAL NICE_0_LOAD = 1024

FUNCTION weight_function(priority):
    return NICE_0_LOAD / (priorit + 1)
    
FUNCTION run_io_bound(process, iowait, queue):
    sleep_thread(iowait)
    weight = weight_function(process.priority)
    process.vruntime += (iowait * NICE_0_LOAD) / weight
    executed_time = 1
    process.cpu_burst_time -= executed_time
    process.vruntime += (executed_time * NICE_0_LOAD) / weight
    if process.cpu_burst_time > 0:
        queue.push(process)
    
FUNCTION run_cpu_bound(process, timeslice, queue):
    executed_time = min(timeslice, process.cpu_burst_time)
    process.cpu_burst_time -= executed_time
    weight = weight_function(process.priority)
    process.vruntime += (executed_time * NICE_0_LOAD) / weight
    
    if process.cpu_burst_time > 0:
        queue.push(process)

LOOP task in queue:
    if task is cpu_bound:
        run_cpu_task(task, 1, queue)
    else
        run_io_task(task, 10, queue)

1.4.2 Some simulations:-

First simulation is of processes with different vruntime, priority etc but all processes are of IO_BOUND nature whereas second one contains all CPU_BOUND processes.

We can clearly see that IO tasks are waiting for their iowait time and getting penalised. More priority IO task is getting more penalised and scheduling will get delayed whereas in CPU bound tasks, Process 1 got finalised quickly because it has the highest priroirty and got penalised the least because it started with least vruntime and highest priority. So, resulted in more frequent scheduling. Screenshot 1 Screenshot 2


References

https://github.com/singhdevhub-lovepreet/cfs-schedular-cpp https://retis.sssup.it/luca/AKP/Slides/dl_internals.pdf https://en.wikipedia.org/wiki/Completely_Fair_Scheduler https://wxdublin.gitbooks.io/deep-into-linux-and-beyond/content/cfs_internals.html