I originally wrote this post for another audience, but since it can be useful to anyone, even if it is one more post among many, here are a few words about the Meltdown and Spectre security issues. Read more for details.
As you may have noticed in the press, the new year 2018 is arriving with two new digital storms: Spectre and Meltdown. As usual, a lot has been written about these security issues, for better and for worse, so I'll try to give some details about them and the associated risks here, without repeating what you have already read hundreds of times in the mass media.
Spectre and Meltdown are related to CPU bugs. You need to understand that a CPU is really complex, with the equivalent of billions of lines of code if we compare it to software (hardware has been designed with code for decades, so the comparison is quite good). The number of transistors is the equivalent of the number of lines of code in software, and we now have up to about 10,000,000,000 of them in each of our computers (and billions in our smartphones too).
One reason for this growth is related to storage, and I can quote a tweet I read today:
My new iMac Pro, with 64 GB RAM and 4 TB SSD, has 11 times as many bytes of electronic memory as the Apple II, and by "Apple II", I mean the total of all electronic memory ever installed in all 6 million Apple II computers ever made.
But this is not the only reason; the others are related to hardware complexity:
- The optimization of instruction (line of code) execution
- The management of memory, allowing multiple applications to run in parallel
- The management of virtualization, allowing multiple systems to run in parallel
- The addition of dedicated functions, for gaming or cryptography for example.
In fact, these optimization and management parts take more transistors than the instruction execution itself, and they are the most complex part of the processor.
So the two security issues come from these pieces of hardware. We can talk about bugs but, as far as I understand, they are not exactly bugs: they are security issues caused by a side effect of these optimizations and dedicated functions. Because they are side effects of a common way of implementing certain optimizations, they do not touch only one family of processors: Spectre impacts Intel as well as AMD and ARM (ARM is less well known but is basically inside all our smartphones, cars…), while Meltdown mainly impacts Intel processors and some ARM cores.
Meltdown and Spectre are two security issues related to different problems around memory access security. To understand them, we need to understand how a modern CPU works and how memory security is managed.
Basically, a computer has a large memory where all data is stored. This is true for the data of different applications running on the same system, but also for different systems running on the same hardware (virtual machines). They all share the same memory chips to store user and system data. To ensure that an application cannot read the data of another application, or that a system cannot read another system, there is a Memory Management Unit (MMU) in the processor that virtualizes the memory and manages access to it. Each zone is owned by an application, and normally only that application can access it. Any unauthorized access attempt crashes the process attempting it: this is the famous Segmentation Fault that developers have certainly seen.
One type of program needs to access the whole memory (or at least the sum of the memory of all its sub-programs): the operating system. For this reason, the CPU has different levels of execution (also named rings); depending on the level of execution, you can access a larger part of the memory. The isolation between application level (user mode) and system level (kernel mode) is managed at the CPU level.
Historically, all these mechanisms were managed at the kernel (software) level, and this cost a lot in terms of CPU performance. Over the last decade, processors have implemented these mechanisms in hardware to be able to support virtualization efficiently. Thanks to this evolution, it has been possible to create cloud hosting, where the same hardware can be shared by tens of different customers, transparently, each ignoring the others.
Meltdown is a security issue related to the boundary between user mode and kernel mode. Meltdown lets a standard application read memory like the kernel does, accessing the whole memory of any running process. This means that if an attacker can run software on your machine, they can dump the whole memory, including information stored by other applications, like the passwords in your browser. In fact this is not something new, as any security issue allowing a switch into kernel mode (many have existed and will continue to exist) does the same. What is new is that this one is at the hardware level, and the definitive fix is not easy to make and deploy (in fact you need to change your hardware). Software workarounds exist but they impact performance.
- On Desktop: the risk is the same as for any privilege escalation issue. The workaround slows the computer down by 5 to 30%. Basically, we can survive.
- On Mobile: privilege escalation issues are rare, so this one has more impact than usual. Moreover, we install a lot of free, uncontrolled and potentially malicious applications, and we also have more critical and personal applications, such as banking apps, on our cellphones. So I would consider this issue as major on mobile. The mitigating factor is that cellphone programming is done at a higher level than on desktop and the hardware differs on every cellphone, so making an attack that works on all of them will be a long job.
- On Company Server: as we control what is running on the server, I would say the risk is limited. The main risk is that someone with user access to the server, able to run custom code, can extract data from other processes. Basically we can survive, depending on how we control user access. The impacts of the workaround are slowness and the potential need to buy new hardware to compensate for it.
- On Cloud Server: it depends on the way the virtualization is done. Typically with containers (like Docker…) the risk is really high, as the same kernel is shared between different tenants, so customer data is exposed to other unknown and random customers. For virtualization like KVM or VMware, as the systems are isolated and run different kernels, I assume the risk is lower as long as the attacker cannot execute user code on the hypervisor itself (normally no one should have access other than a couple of admins). By the way, as a public service you need to patch anyway. Basically the risk is high. As a consequence, performance will decrease and you may have to accelerate your refresh cycle, because clients will prefer safer CPUs with no performance impact for the same price. So the impact for you is big, even if at first you can expect to sell more VMs to compensate.
The problem:
The problem is related to the out-of-order execution system included in the processor. Processors use a pipeline to split an instruction into smaller parts (stages) that can be executed quickly. This allows the processor to run at really high frequencies, because the frequency of a processor is limited by the time needed by the longest stage. As a consequence, each stage has to take as close as possible to the same time as the others, and be as short as possible, to reach really high frequencies. It means multiple parts of multiple instructions are executed in parallel. The problem is that sequential instructions generally reference each other, like in the following program:
    a = 2;
    b = a + 5;
    c = b + 3;
    e = a + 3;
    f = e + 3;
    d = 5;
You can see that most of the lines depend on a line executed previously, so you basically need to wait for the end of the previous one before executing the next one. This totally breaks the pipelined execution and slows down the CPU. To manage this, the program can be rewritten differently:
    a = 2;     d = 5;
    b = a + 5; e = a + 3;
    c = b + 3; f = e + 3;
We can execute these lines two by two with no conflict, and the result stays the same. This is out-of-order execution. This is what a modern CPU does in real time: it reorganizes the code to fill the pipeline with compatible instructions. So now imagine the following program:
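As a side note, the reordering of the small arithmetic program can be sketched in a few lines of Python. This is only a toy model, assuming a hypothetical CPU able to issue two independent instructions per cycle; the `schedule` function and the dependency table are mine, not something a real processor exposes.

```python
# Each instruction is (target variable, set of variables it reads),
# in the original program order: a, b, c, e, f, d.
program = [
    ("a", set()),    # a = 2
    ("b", {"a"}),    # b = a + 5
    ("c", {"b"}),    # c = b + 3
    ("e", {"a"}),    # e = a + 3
    ("f", {"e"}),    # f = e + 3
    ("d", set()),    # d = 5
]

def schedule(instructions, width=2):
    """Greedy scheduling: each cycle, issue up to `width` instructions
    whose inputs have already been computed."""
    done = set()
    pending = list(instructions)
    cycles = []
    while pending:
        ready = [ins for ins in pending if ins[1] <= done][:width]
        if not ready:
            raise ValueError("cyclic dependency between instructions")
        cycles.append([target for target, _ in ready])
        done.update(target for target, _ in ready)
        pending = [ins for ins in pending if ins[0] not in done]
    return cycles

waves = schedule(program)
print(waves)                            # three cycles of two instructions
print(len(schedule(program, width=1)))  # six cycles when issuing one at a time
```

With a width of two, the six instructions fit in three cycles, in the same pairs as the rewritten program.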
    do_something in user mode_1
    do_something in user mode_2
    switch to kernel mode
    do_something in kernel mode_1
    do_something in kernel mode_2
The CPU can reorder this into
    do_something in user mode_1
    do_something in user mode_2
    do_something in kernel mode_1
    do_something in kernel mode_2
    switch to kernel mode
just because it is more efficient. In this case the CPU assumes the switch to kernel mode will succeed. Of course, before committing any result of the kernel mode instructions, it checks that the switch actually happened and succeeded. If that is not the case, it rolls back everything that has been done. This is what Meltdown uses: it creates a piece of code that is normally never executed. This piece of code accesses the unauthorized memory zone. Due to out-of-order execution, even if it is never reached, the piece of code is executed with kernel rights. But no error or exception related to this access is raised, as this piece of code should never have been executed. As a consequence, you have accessed an arbitrary memory zone with kernel rights.
At this point, the problem is that you have accessed the memory zone but you have not captured the data, as everything has been rolled back. The last part of this post explains how to extract the data itself thanks to the caching mechanism.
This is also possible because, in Linux and most other systems, to simplify and accelerate system calls (kernel access), the kernel addresses are mapped the same way in the virtual memory of every application and protected by the ring 0 access mode (switch to kernel mode). The solution deployed to fix the security issue is to have dedicated page tables for the kernel instead of sharing the same ones as the user process. This is Kernel Page Table Isolation (KPTI). Performance is lower as a consequence, because switching from one set of page tables to another takes the hardware longer than a simple ring check. Only kernel calls are impacted: for pure computation applications (3D, video…) the performance will not really be affected, but for I/O intensive applications (database, web server…) the performance will drop.
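To make the idea behind KPTI more concrete, here is a very simplified Python model. Everything in it is an assumption for illustration: the addresses, the dictionary used as a page table and the `transient_read` function, which mimics the simplified picture of this post where the privilege check arrives too late. Real page tables and real KPTI have many more details.

```python
# Hypothetical addresses; a page table maps address -> (content, kernel_only).
KERNEL_ADDRESS = 0xFFFF0000
USER_ADDRESS = 0x00001000
KERNEL_PAGES = {KERNEL_ADDRESS: ("kernel secret", True)}
USER_PAGES = {USER_ADDRESS: ("user data", False)}

def build_user_page_table(kpti):
    """Page table seen by the CPU while the process runs in user mode."""
    table = dict(USER_PAGES)
    if not kpti:
        # Without KPTI the kernel is mapped in every process and only
        # protected by the kernel_only flag.
        table.update(KERNEL_PAGES)
    return table

def architectural_read(table, address, kernel_mode=False):
    """Normal read: the privilege flag is enforced."""
    entry = table.get(address)
    if entry is None or (entry[1] and not kernel_mode):
        raise PermissionError("segmentation fault")
    return entry[0]

def transient_read(table, address):
    """Rolled-back read: in this toy model only the presence of a
    mapping matters, the privilege flag is checked too late."""
    entry = table.get(address)
    return entry[0] if entry else None

print(transient_read(build_user_page_table(kpti=False), KERNEL_ADDRESS))
print(transient_read(build_user_page_table(kpti=True), KERNEL_ADDRESS))
```

In this model the first call reaches the kernel data and the second finds nothing, because with KPTI the kernel pages are simply not in the table used in user mode. The price is the table switch on every kernel call.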
Spectre is a security issue related to access to unauthorized memory zones inside a given process. If the malicious code is inserted into a shared element like a library (DLL), it can access multiple processes. This particularly impacts browsers and office software: you have one process with multiple instances of data (browser tabs or open documents). Normally all the instances are isolated, and something running in a tab of your browser can access neither the data of the browser (like passwords or history) nor the content of another tab. This is possible thanks to memory isolation at the processor level: the processor controls the memory accesses of each thread and authorizes them or not. Spectre has found a way, not to force this check, but to work around the security elements and make these reads possible.
- On Desktop: browsers are the typical target for such an attack, as it allows leaking data from other tabs, history and passwords. Office software is also a good target, as it allows leaking confidential data through activated scripts. The risk is really high, but the patch can be applied at the application level (browser) and in the operating system. Most browsers have already implemented workarounds, and they do not impact performance much.
- On Mobile: as the applications are isolated from each other, I would consider the risk as limited but real. Browsers are of course impacted the same way as on desktop, but due to the way the bug works I'm not sure we already have a proof of concept as on desktop.
- On Company Server: the risk exists but is limited due to the type of workload running on these machines: no custom code (like scripts) should be executed by the server. A compromised library could of course allow a large data leak, but such an attack could be performed differently, without this specific bug, with the same kind of result. So in my opinion the problem has less impact here.
- On Cloud Server: in most cases the risk is the same as on a company server, as virtual machine isolation is not impacted by this problem. But for web hosting, where different clients have different websites served by a global multi-tenant web server, we could imagine one of them executing scripts, like PHP, and leaking data from other sites. This could give access to logins, passwords and tokens, allowing unwanted connections by the attacker.
Explanation of the problem:
As I wrote previously, there is no bug as such, just some clever people who found a way to use the way the processor works to leak data. In Spectre, the attack is based on branch prediction and speculative execution. This is why I started by talking about code execution optimization. In short, in a modern processor the instructions are complex and split into a lot of sub-instructions (this is the pipeline), and several are executed in parallel (this is a superscalar architecture). Because some instructions depend on the result of the previous one, the processor would need to wait, so it also uses instructions coming from different flows of execution (that is, different threads: hyperthreading) and mixes all of this. The objective is to keep the pipeline full of things to execute; otherwise you spend your time waiting for conflict resolution and performance is bad. One case is complex to manage: the branch, because the CPU does not know which instruction to execute after the branch (an if or a loop). To help with this, it reorders the instructions on the fly to know the outcome before the pipeline runs empty (one solution), and it also makes a choice before knowing the real outcome.
It is on this specific point that Spectre bases its attack: when the processor makes a branch choice, it starts executing instructions, and if the choice was wrong it rolls back everything it has done. But during this period, as the results of the instructions are not committed (because they are still executing and not finished), no security check is made on the memory access. So during this time the code can request to read a zone of data it should not have access to. Of course, if the branch were really taken and the instruction completely executed, it would result in a memory access violation and the thread would stop. But when you write your code correctly, this branch is never really reached and the program continues to run. That said, no data has been extracted at this point, because the instruction reading the data has not been committed. But the data has been loaded into a register, and we will see right after how this can be used.
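The way a branch choice can be influenced may be easier to see with a toy model. The sketch below assumes a textbook 2-bit saturating counter as predictor and a simple bounds check as the branch; both are my own illustrative choices, real predictors are far more elaborate, and `ARRAY_SIZE` and the `run` function are hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter: 0-1 predict 'not taken',
    2-3 predict 'taken'."""
    def __init__(self):
        self.counter = 0

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

ARRAY_SIZE = 16  # hypothetical bound protecting the array

def run(predictor, index, log):
    """Models `if index < ARRAY_SIZE: read(array[index])` on a CPU that
    starts the body before the comparison is resolved."""
    guess = predictor.predict()
    actual = index < ARRAY_SIZE
    if guess:
        # The body has been started speculatively; it is kept only if
        # the guess turns out to be right.
        log.append((index, "committed" if actual else "rolled back"))
    elif actual:
        # Predicted 'not taken': the body runs once the check resolves.
        log.append((index, "committed"))
    predictor.update(actual)

predictor = TwoBitPredictor()
log = []
for i in range(5):
    run(predictor, i, log)     # training with valid indexes
run(predictor, 1000, log)      # out-of-bounds index
print(log[-1])
```

After a few valid calls the predictor expects the check to pass, so the out-of-bounds read is started and then rolled back. With a fresh predictor, the same call would not start the read at all.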
Meltdown and Spectre data extraction
As we have seen, Spectre and Meltdown allow reading an unauthorized memory zone and loading this data into a register, just before it is removed and everything that has been done is cleaned up. So the question is: how can we extract the data from this point?
The solution is quite simple and reminds me of some pieces of code I wrote 20 years ago to test some processors sold with less cache than promised. It is all about memory caching.
Memory caching exists in all modern processors because the main memory is really slow compared to the processor, something like 100 times slower. So to avoid a long wait on every memory access, when a piece of data is read, a larger block of data is transferred in the background from the main memory to the cache. The cache is a small memory working really fast, close to the processor speed to keep it simple. So the first time you access a zone that is not in cache, you need to wait a long time before getting a response (about 100 to 400 cycles, that is about 33 to 133 ns at 3 GHz), whereas the second time you access the same zone you get the answer almost immediately (a few cycles, one cycle being about 0.3 ns at 3 GHz).
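For those who want to check the orders of magnitude, the conversion between cycles and time is simple arithmetic. The small helper below is just a convenience of mine, assuming a 3 GHz clock.

```python
def cycles_to_ns(cycles, frequency_ghz):
    """One cycle lasts 1 / frequency; with the frequency in GHz the
    result is directly in nanoseconds."""
    return cycles / frequency_ghz

for cycles in (1, 100, 400):
    print(f"{cycles:>3} cycles at 3 GHz = {cycles_to_ns(cycles, 3.0):.1f} ns")
# 1 cycle is about 0.3 ns, 100 cycles about 33.3 ns, 400 cycles about 133.3 ns
```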
So once you have been able to read a value from an unauthorized memory zone, to extract it you read a page of memory you own, chosen according to the value read: with one page per possible byte value, you access page number N when the byte read is N. By doing this, that page is stored in cache and stays there, even after the rollback. Right after this, you read again each of the 256 possible pages that your malicious code may have accessed, measuring the access time. Of these 256, only one responds quickly to your request: the one that has been preloaded by the malicious code. So at this point you know the value of the memory byte you targeted.
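The principle can be simulated without touching any real hardware. The following Python sketch is a toy model only: the cache is a simple set of page numbers, the latencies `HIT_NS` and `MISS_NS` are assumed values, and `transient_touch` stands in for the rolled-back code whose only lasting effect is a cache load.

```python
PAGE = 4096
HIT_NS, MISS_NS = 1, 100   # assumed latencies for this toy model

class ToyCache:
    def __init__(self):
        self.pages = set()

    def flush(self):
        self.pages.clear()

    def access(self, address):
        """Return a simulated access time and load the page in cache."""
        page = address // PAGE
        latency = HIT_NS if page in self.pages else MISS_NS
        self.pages.add(page)
        return latency

def transient_touch(cache, secret_byte):
    """Stand-in for the rolled-back code: it reads the page whose
    number is the secret value, nothing else survives."""
    cache.access(secret_byte * PAGE)

def recover_byte(cache):
    """Time the 256 candidate pages and keep the fastest one."""
    timings = [cache.access(value * PAGE) for value in range(256)]
    return min(range(256), key=timings.__getitem__)

def leak(secret):
    cache = ToyCache()
    out = bytearray()
    for byte in secret:
        cache.flush()                 # start with an empty cache
        transient_touch(cache, byte)  # side effect of the hidden read
        out.append(recover_byte(cache))
    return bytes(out)

recovered = leak(b"hunter2")
print(recovered)
```

The attacker side (`recover_byte`) never sees the secret directly; it only compares access times, which is exactly why the extraction goes byte by byte.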
Then you can do this for each memory byte, one by one, and leak the information. The process is slow and the extraction rate is in KB/s, so extracting the whole memory takes a long time but is still possible. One way to prevent this problem is to limit the ability to measure execution time by reducing timer precision. This is what has been implemented in browsers as a first workaround.
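Why a less precise timer helps can be shown with a few lines of Python. This is a sketch under assumptions: the latencies and the 20 µs resolution are illustrative values of mine, and a coarse clock only makes the measurement harder, it does not remove the underlying problem.

```python
HIT_NS, MISS_NS = 1, 100   # assumed cache hit / miss latencies

def coarse_timer(true_time_ns, resolution_ns):
    """Timestamp rounded down to the timer resolution."""
    return (true_time_ns // resolution_ns) * resolution_ns

def measured_duration(start_ns, duration_ns, resolution_ns):
    """Duration as seen through a clock of the given resolution."""
    end = coarse_timer(start_ns + duration_ns, resolution_ns)
    return end - coarse_timer(start_ns, resolution_ns)

# Precise clock (1 ns): hit and miss are easy to tell apart.
fine = (measured_duration(1000, HIT_NS, 1),
        measured_duration(1000, MISS_NS, 1))
# Coarse clock (assumed 20 µs): both measurements look identical.
coarse = (measured_duration(1000, HIT_NS, 20_000),
          measured_duration(1000, MISS_NS, 20_000))
print(fine, coarse)
```

With the precise clock the two durations differ by two orders of magnitude; with the coarse one both read as zero, so a single measurement no longer reveals which page was in cache.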
Important things to notice:
- As you can see, there is no direct access to the memory, and the bytes are extracted one by one by deduction. It takes time to extract the whole memory, but if you target a specific, known memory zone it can be really fast.
- This attack only allows reading the memory, not writing to it. So the impact is more about data leaks than data destruction or remote execution. But it can be part of a larger attack combining different issues.
- Not all processors are impacted: the ones that do not include these optimizations (out-of-order execution and branch prediction) are not. This is the case of the Raspberry Pi, as well explained in this post.