How does Solidity work with the EVM?

Solidity compiles to bytecode for the Ethereum Virtual Machine. It's less abstract than JavaScript, and inefficiency can be costly due to blockchain storage and operation fees.

Solidity is a high-level language that compiles down into bytecode which is run directly on the Ethereum Virtual Machine. ⚙️ 1️⃣ 0️⃣ ⚙️

The terms high-level and low-level can be thought of as a spectrum. Programming languages like C are considered low-level. On the other hand, Solidity and JavaScript are considered high-level languages. We can say that JavaScript is more high-level than C. In general, the more details of the machine that are abstracted away from the developer, the more high-level it is considered.

Solidity is more low-level than JavaScript, which means it hides less of the underlying machine details. This can make it more difficult to work with, which is by design, to some degree, as we should be very careful about what we're deploying on the EVM! Every operation on the EVM and everything we store on the blockchain costs money, so it becomes expensive to be inefficient. Plus, we want to make sure we understand our Smart Contracts inside and out so we don't have any bugs! 🐞 😱

The lowest-level language is the one the machine actually executes. For the EVM this is bytecode, in which each operation is identified by a one-byte code known as an opcode.

Talk Bytecode To Me 🗣

Solidity is the part we deal with as a developer. What does the EVM deal with? 🤔

Bytecode! Let's talk about bytecode and EVM opcodes a bit.

⚠️

This part goes over some low-level details of the EVM which we won't cover in too much detail. Some parts may be confusing. Stick with it though, it's a good idea to have a basic understanding of the low-level!

Consider a Smart Contract containing a simple while loop:

uint i = 0;
uint sum = 0;
while(i < 5) {
    sum += i;
}

📘

Here uint is a declaration of an unsigned integer. We'll discuss the intricacies of this data type in future lessons. For now consider it a declaration of a variable that stores a number.

When we send a Smart Contract to the EVM we send only the bytecode:

👨‍💻 👩‍💻 ⚙️ 1️⃣0️⃣1️⃣0️⃣ 👉🏻 🌐

For a contract containing just the loop code above, the resulting bytecode can be quite long:

6080604052348015600f57600080fd5b5060a58061001e6000396000f3fe6080604052348015600f57600080fd5b506004361060285760003560e01c8063a92100cb14602d575b600080fd5b60336049565b6040518082815260200191505060405180910390f35b6000806000905060008090505b600582101560675781810190506056565b80925050509056fea264697066735822122058d7e11ff1d36fc53779562e305af3c9180b2ab8dccfe6d234fa50420908a5d864736f6c63430006030033

Well, that's quite scary! What is all that? 😱

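Some of it is opcodes and some of it is operands, which are the arguments that certain opcodes take. An opcode and its operands combined form an instruction. The final stretch of bytes (everything after the last fe) is not executable code at all: it is metadata appended by the compiler, including the compiler version.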

We can look up the EVM operation codes and replace each opcode with its name (commonly referred to as a mnemonic):

PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE DUP1 ISZERO PUSH1 0xF JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH1 0xA5 DUP1 PUSH2 0x1E PUSH1 0x0 CODECOPY PUSH1 0x0 RETURN INVALID PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE DUP1 ISZERO PUSH1 0xF JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH1 0x4 CALLDATASIZE LT PUSH1 0x28 JUMPI PUSH1 0x0 CALLDATALOAD PUSH1 0xE0 SHR DUP1 PUSH4 0xA92100CB EQ PUSH1 0x2D JUMPI JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST PUSH1 0x33 PUSH1 0x49 JUMP JUMPDEST PUSH1 0x40 MLOAD DUP1 DUP3 DUP2 MSTORE PUSH1 0x20 ADD SWAP2 POP POP PUSH1 0x40 MLOAD DUP1 SWAP2 SUB SWAP1 RETURN JUMPDEST PUSH1 0x0 DUP1 PUSH1 0x0 SWAP1 POP PUSH1 0x0 DUP1 SWAP1 POP JUMPDEST PUSH1 0x5 DUP3 LT ISZERO PUSH1 0x67 JUMPI DUP2 DUP2 ADD SWAP1 POP PUSH1 0x56 JUMP JUMPDEST DUP1 SWAP3 POP POP POP SWAP1 JUMP INVALID LOG2 PUSH5 0x6970667358 0x22 SLT KECCAK256 PC 0xD7 0xE1 0x1F CALL 0xD3 PUSH16 0xC53779562E305AF3C9180B2AB8DCCFE6 0xD2 CALLVALUE STATICCALL POP TIMESTAMP MULMOD ADDMOD 0xA5 0xD8 PUSH5 0x736F6C6343 STOP MOD SUB STOP CALLER
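To make the split between opcodes and operands concrete, here is a minimal Python sketch that decodes just the bytes of the while loop, copied from the bytecode above. The OPCODES table and the disassemble helper are illustrative names, not part of any real tool, and the table only covers the few opcodes that appear in this fragment.

```python
# Minimal disassembler sketch; it only knows the opcodes used in this fragment.
OPCODES = {
    0x01: "ADD",
    0x10: "LT",
    0x15: "ISZERO",
    0x50: "POP",
    0x56: "JUMP",
    0x57: "JUMPI",
    0x5B: "JUMPDEST",
}


def disassemble(code: bytes) -> list[str]:
    """Turn raw bytes into a list of mnemonics, keeping PUSH operands attached."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: the next n bytes are an operand, not opcodes.
            n = op - 0x5F
            operand = code[pc:pc + n]
            pc += n
            out.append(f"PUSH{n} 0x{operand.hex()}")
        elif 0x80 <= op <= 0x8F:
            out.append(f"DUP{op - 0x7F}")
        elif 0x90 <= op <= 0x9F:
            out.append(f"SWAP{op - 0x8F}")
        else:
            out.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
    return out


# The bytes of the while loop, taken from the contract bytecode.
loop_bytes = bytes.fromhex("5b600582101560675781810190506056565b")
listing = disassemble(loop_bytes)
print(" ".join(listing))
```

Only the PUSH instructions carry an operand in the bytecode itself; every other instruction here is a single byte that takes its inputs from the stack.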

Even that doesn't really clear things up! 🙁

Fortunately, the Solidity compiler has an assembly output that can help us take a closer look! This output lines up the assembly code with the Solidity code it was generated from so we can study it.

🙈

We won't study it all, however! Quite a lot of the code generated handles default behavior for Smart Contracts. We'll cover that behavior in upcoming lessons.

For now, let's take a look at one particular section:

tag 7               while(i < 5) ...
  JUMPDEST          while(i < 5) ...
  PUSH 5            5
  DUP3              i
  LT                i < 5
  ISZERO            while(i < 5) ...
  PUSH [tag] 8      while(i < 5) ...
  JUMPI             while(i < 5) ...
  DUP2              i
  DUP2              sum += i
  ADD               sum += i
  SWAP1             sum += i
  POP               sum += i
  PUSH [tag] 7      while(i < 5) ...
  JUMP              while(i < 5) ...
tag 8               while(i < 5) ...
  JUMPDEST          while(i < 5) ...
  DUP1              sum
  SWAP3             return sum
  POP               return sum

👈🏻 On the left side you can see a representation of the EVM instructions.

👉🏻 On the right side you can see a comment indicating where in the Solidity code the instruction is being generated from.

Also included here, for human readability, are tags which are more commonly referred to as labels. Labels are used to mark locations in the code that we can go to. This is necessary because, at the lowest level of computer instruction, there are no loops. 🤯

Instead of loops, there are operations that allow you to change which instruction runs next. These operations manipulate the program counter, which is the variable that stores the position of the instruction being executed.

The two operations that manipulate the program counter in the EVM are JUMP and JUMPI. The tags are locations in the code that you can jump to. That's why underneath the declaration of both tag 7 and tag 8 in the code above you'll see there's a JUMPDEST operation. This operation marks a valid destination for a jump.
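As a rough illustration of what a valid destination means, the Python sketch below scans the loop bytes from the contract bytecode and collects the offsets that hold a JUMPDEST opcode. It skips over PUSH operands so that a data byte is never mistaken for an opcode. The function name valid_jump_destinations is hypothetical, and the offsets are relative to this fragment rather than to the full contract.

```python
JUMPDEST = 0x5B


def valid_jump_destinations(code: bytes) -> set[int]:
    """Collect the offsets of JUMPDEST opcodes, ignoring PUSH operand bytes."""
    dests = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            dests.add(pc)
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: skip the operand bytes that follow the opcode.
            pc += op - 0x5F
        pc += 1
    return dests


loop_bytes = bytes.fromhex("5b600582101560675781810190506056565b")
dests = valid_jump_destinations(loop_bytes)
print(sorted(dests))  # [0, 17]

# A 0x5b byte that is the operand of PUSH1 is data, not a destination.
tricky = bytes.fromhex("605b5b")
print(sorted(valid_jump_destinations(tricky)))  # [2]
```

The two destinations found are 17 bytes apart, which matches the gap between the 0x56 and 0x67 values that the loop pushes before its JUMP and JUMPI.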

📘

Fun fact: JUMP and JUMPI are the instructions that make the EVM Turing complete! With these operations we can trivially create an infinite loop, which is why it's necessary for Ethereum to apply a gas cost to every operation. We don't want the network nodes to be stuck in interminable loops! 🌀
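Here is a small Python sketch of how metering puts an end to an endless loop. It makes several simplifying assumptions: every instruction costs a flat 1 gas (the real fee schedule varies by opcode), the jump target is written next to JUMP instead of being taken from the stack, and the names run and OutOfGas are made up for this example.

```python
class OutOfGas(Exception):
    """Raised when the gas budget is used up before the program halts."""


def run(program: list[tuple], gas: int) -> int:
    """Run symbolic instructions and return the number of steps executed."""
    pc = 0      # program counter: index of the next instruction to run
    steps = 0
    while pc < len(program):
        if gas == 0:
            raise OutOfGas(f"stopped after {steps} steps")
        gas -= 1            # assumption: every instruction costs 1 gas
        steps += 1
        op = program[pc]
        if op[0] == "JUMP":
            pc = op[1]      # a jump simply overwrites the program counter
        else:
            pc += 1         # everything else moves on to the next instruction
    return steps


# Two instructions that jump back to the start forever.
endless = [("JUMPDEST",), ("JUMP", 0)]

try:
    run(endless, gas=1000)
    halted = True
except OutOfGas as err:
    halted = False
    print("out of gas:", err)
```

Without the gas check, the call to run would never return; with it, the work a node does is bounded by the gas supplied.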

So if there are no loops in assembly code, how does a while loop get compiled from Solidity to bytecode? 🤔

Let's take a look at this particular section of code:

  PUSH 5            5
  DUP3              i
  LT                i < 5
  ISZERO            while(i < 5) ...
  PUSH [tag] 8      while(i < 5) ...
  JUMPI             while(i < 5) ...

👆🏼 There's quite a bit going on here!

The value 5 is being pushed onto a data structure called the stack.

📘

The stack is where most operations on the EVM will load/store runtime data. It is a data structure that is used for temporary storage, like holding local variables and running arithmetic on those variables. Many opcodes will use the values at the top of the stack as their operands. And just like the stack we programmed in data structures, it's a LIFO structure with quite a bit of pushing and popping!
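A Python list is enough to sketch this behavior. In the example below the end of the list is the top of the stack, and the helpers push, pop, dup, and swap are hypothetical stand-ins that mimic the PUSH, POP, DUPn, and SWAPn families of opcodes.

```python
stack = []  # the end of the list is the top of the stack


def push(value):
    """Place a value on top of the stack."""
    stack.append(value)


def pop():
    """Remove and return the value on top of the stack."""
    return stack.pop()


def dup(n):
    """Like DUPn: copy the n-th item from the top onto the top."""
    stack.append(stack[-n])


def swap(n):
    """Like SWAPn: exchange the top item with the item n places below it."""
    stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]


push(10)
push(20)
push(30)        # stack is now [10, 20, 30]
dup(3)          # copies 10 to the top: [10, 20, 30, 10]
swap(1)         # swaps the top two:    [10, 20, 10, 30]
top = pop()     # last in, first out: removes 30
print(top, stack)
```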

The value for the variable i is stored in the third position from the top of the stack, so DUP3 duplicates it and it is compared to the value 5 using the LT (less than) operator. If the value is less than 5 this will evaluate to 1, otherwise 0.

ISZERO then flips that result, and the flipped value is the condition JUMPI uses to decide whether to jump to the code location tag 8. If ISZERO results in 1 (i is not less than 5), it will jump to tag 8 and move on to the part of the program that returns the sum.
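The check can be traced by hand. The Python sketch below replays PUSH 5, DUP3, LT, and ISZERO on a list that stands in for the stack, assuming i sits just below sum at the top when the check begins. The function name exit_condition is made up for this example.

```python
def exit_condition(i: int, total: int) -> int:
    """Return the value ISZERO leaves on the stack: 1 means jump to tag 8."""
    stack = [i, total]                  # assumption: i sits just below sum
    stack.append(5)                     # PUSH 5
    stack.append(stack[-3])             # DUP3 copies i to the top
    a, b = stack.pop(), stack.pop()     # LT pops i (the top) and then 5
    stack.append(1 if a < b else 0)     # and pushes the result of i < 5
    flipped = 1 if stack.pop() == 0 else 0
    stack.append(flipped)               # ISZERO turns 1 into 0 and 0 into 1
    return stack[-1]


for i in (0, 4, 5):
    print(i, exit_condition(i, total=0))
```

For i values of 0 and 4 the result is 0, so JUMPI does not jump; once i reaches 5 the result is 1 and execution jumps to tag 8.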

If ISZERO results in 0 (meaning i is less than 5), JUMPI does not jump and we'll run the loop code:

  DUP2              i
  DUP2              sum += i
  ADD               sum += i
  SWAP1             sum += i
  POP               sum += i
  PUSH [tag] 7      while(i < 5) ...
  JUMP              while(i < 5) ...

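👆🏼 Here it will add the value stored in i to the sum value. It does this through some stack manipulation and the ADD operation which, as you might imagine, adds two numbers together. After this, the program will unconditionally jump back to tag 7, from where it will run the i comparison again. Notice that nothing in the loop body changes i, so as written the comparison never fails; a loop that is meant to finish would also need to increment i (for example with i++).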
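The stack manipulation in the loop body can also be followed step by step. This Python sketch replays DUP2, DUP2, ADD, SWAP1, and POP on a list that stands in for the stack, again assuming i sits just below sum at the top. The function name loop_body is hypothetical.

```python
def loop_body(i: int, total: int) -> list[int]:
    """Replay the loop body and return the resulting stack as [i, sum]."""
    stack = [i, total]                              # [i, sum]
    stack.append(stack[-2])                         # DUP2  -> [i, sum, i]
    stack.append(stack[-2])                         # DUP2  -> [i, sum, i, sum]
    stack.append(stack.pop() + stack.pop())         # ADD   -> [i, sum, i + sum]
    stack[-1], stack[-2] = stack[-2], stack[-1]     # SWAP1 -> [i, i + sum, sum]
    stack.pop()                                     # POP   -> [i, i + sum]
    return stack


print(loop_body(3, 10))  # [3, 13]
```

The old sum is swapped to the top and discarded, leaving the new sum in its place while the slot holding i is untouched.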

📘

Quite a bit of this is simplified to give the gist of what is happening. If you have an interest in digging further, I'd suggest taking a look at an EVM Disassembler.

All of that is pretty complicated! Aren't you glad we have high-level languages? 😅

🏁 Wrap Up

We discussed the Solidity Language in general and quickly studied the bytecode it generates when it is compiled. ⚙️

In the next few lessons, we'll start digging into the Solidity language itself and how to write awesome Smart Contracts! 🚀
