JavaScript Internals: A Beautiful Mess
To understand a program you must become both the machine and the program. -Alan Perlis
Introduction
A browser, such as Google Chrome, works behind the scenes by using a combination of compilers and interpreters to parse and execute the JavaScript code found on web pages. Google’s comic gives a fantastic visual picture of this. When you view a webpage, the browser does more than simply show the text and graphics; it also executes the underlying code to make everything interactive. This is where compilers and interpreters come in.
For example, an interpreter within the browser reads and executes JavaScript without first converting it into machine code. This enables quick startup and the dynamic behavior of modern web pages. To improve performance, modern browsers such as Chrome use just-in-time (JIT) compilers, which turn frequently used code into highly optimized machine code, resulting in faster execution. This combination of interpretation for flexibility and compilation for performance allows modern browsers to execute web apps efficiently.
JavaScript Execution Context
Event Loop
- The event loop continuously monitors the Call Stack and the Callback Queue. If the call stack is empty, the event loop executes the first task in the callback queue.
- Workflow:
- JavaScript executes synchronous code and adds functions to the call stack.
- The browser’s Web APIs support asynchronous functions (such as setTimeout and fetch).
- Once the asynchronous task is completed, the relevant callback is added to the callback queue.
- The event loop checks whether the call stack is empty. If it is, the next job from the callback queue is moved to the call stack and executed.
- If any microtasks are available, they are executed before the next callback in the queue.
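The workflow above can be observed directly. The following is a minimal sketch, assuming a standards-compliant browser or Node.js; the `order` array and its labels are illustrative names, not part of any API.

```javascript
// Records the order in which each kind of work actually runs.
const order = [];

order.push('sync 1'); // synchronous code runs on the call stack right away

// The timer is handled outside the call stack; its callback is queued as a task.
setTimeout(() => order.push('task (setTimeout)'), 0);

// Promise reactions and queueMicrotask callbacks are queued as microtasks.
Promise.resolve().then(() => order.push('microtask (promise)'));
queueMicrotask(() => order.push('microtask (queueMicrotask)'));

order.push('sync 2');

// A later task: by the time it runs, everything queued above has finished.
setTimeout(() => console.log(order.join(' -> ')), 10);
```

Both synchronous entries are logged first, then the two microtasks in the order they were queued, and only then the `setTimeout` callback, even though its delay was 0.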
Call Stack
- JavaScript is a single-threaded programming language, which means it has a single Call Stack. Therefore, it can do one thing at a time.
- The Call Stack is a data structure that tracks where we are in the program.
- When we enter a function, we push it onto the top of the stack. When we return from a function, we pop it off the top of the stack. That is all the stack can do. Each element in the Call Stack is called a Stack Frame.
- Running code on a single thread is simpler because you don’t have to deal with the complex problems that arise in multi-threaded settings, such as deadlocks.
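To make the push and pop behavior concrete, here is a small sketch that mirrors the call stack by hand. The helpers `enter` and `leave` and the three functions are hypothetical names used only for illustration; the engine's real stack frames are not accessible this way.

```javascript
const frames = [];    // hand-rolled mirror of the call stack
const snapshots = []; // what the stack looked like each time a function was entered

function enter(name) {
  frames.push(name);                  // entering a function pushes a frame
  snapshots.push(frames.join(' > '));
}

function leave() {
  frames.pop();                       // returning pops the top frame
}

function multiply(a, b) { enter('multiply'); const r = a * b; leave(); return r; }
function square(n) { enter('square'); const r = multiply(n, n); leave(); return r; }
function printSquare(n) { enter('printSquare'); const r = square(n); leave(); return r; }

const result = printSquare(4);
console.log(result);    // 16
console.log(snapshots); // deepest entry is 'printSquare > square > multiply'
console.log(frames);    // [] once every call has returned
```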
Web APIs
- JavaScript in the browser has access to a variety of Web APIs, including setTimeout(), XMLHttpRequest, and DOM event listeners. These APIs enable JavaScript to offload operations that might otherwise stall the main thread, such as making network requests or waiting for a timeout.
Callback Queue (Task Queue)
- When an asynchronous job (such as an API response or a timeout) is ready to be handled, the related callback is added to the Callback Queue.
Microtasks
- These are tasks that have a greater priority than other tasks in the queue. Callbacks for MutationObserver and Promises are two examples. Microtasks are completed before the next task in the callback queue.
Read more about the Event Loop
- Jake Archibald - In The Loop
- Philip Roberts: What is an event loop?
- The Node.js Event Loop
- Using microtasks in JavaScript with queueMicrotask()
- Further Adventures of the Event Loop - Erin Zimmer
Memory Management
Memory Layout
V8’s heap is divided into multiple spaces.
- Read-only space
- New space
- Old space
- Code space
- Map space
- Large object space
- Code large object space
- New large object space
New space:
- Most new objects end up here.
- Memory given out linearly.
- Divided into two regions.
Old space:
- Objects that survived two GC runs.
- Less likely to be moved around.
Garbage Collection
The memory heap is the location where objects and data are dynamically allocated during execution. JavaScript engines use garbage collection to discover and reclaim memory held by objects that are no longer required. The garbage collector is exercised by virtually every script that runs, so it has probably been run more than any other component in V8. On top of that, it introduces a great deal of uncertainty in the heap layout. When the garbage collector runs, it can move objects around and change the heap layout substantially, and it is difficult to predict when it is going to run. There is also concurrent garbage collection, which adds even more uncertainty to the mix.
V8’s heap is split into regions called generations, each collected with a different strategy:
- Young generation using semi-spaces and Copying Collection with parallel Scavenger (marks, copies and updates pointers at the same time)
- Old generation using Mark-Sweep/Mark-Sweep-Compact with incremental marking (tri-color marking and write barriers)
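The idea behind tri-color marking followed by a sweep can be sketched on a toy heap. This is a simplified model and not V8's implementation: the `heap` map, the object ids, and `markAndSweep` are all hypothetical, and there is no incremental marking, write barrier, or compaction here.

```javascript
// Toy heap: id -> { refs: [ids this object points to] }.
function markAndSweep(heap, rootIds) {
  const WHITE = 0; // not yet seen (garbage if still white after marking)
  const GREY = 1;  // seen, but its references are not scanned yet
  const BLACK = 2; // seen and fully scanned

  const color = new Map([...heap.keys()].map((id) => [id, WHITE]));

  // Mark phase: roots start grey and go on the worklist.
  const worklist = [];
  for (const id of rootIds) {
    color.set(id, GREY);
    worklist.push(id);
  }
  while (worklist.length > 0) {
    const id = worklist.pop();
    for (const child of heap.get(id).refs) {
      if (color.get(child) === WHITE) {
        color.set(child, GREY);
        worklist.push(child);
      }
    }
    color.set(id, BLACK);
  }

  // Sweep phase: everything still white is unreachable and gets freed.
  const freed = [];
  for (const id of [...heap.keys()]) {
    if (color.get(id) === WHITE) {
      heap.delete(id);
      freed.push(id);
    }
  }
  return freed;
}

const heap = new Map([
  ['a', { refs: ['b'] }],
  ['b', { refs: ['c'] }],
  ['c', { refs: [] }],
  ['d', { refs: ['e'] }], // d and e reference each other but no root reaches them
  ['e', { refs: ['d'] }],
]);

const freed = markAndSweep(heap, ['a']);
console.log(freed);            // [ 'd', 'e' ]
console.log([...heap.keys()]); // [ 'a', 'b', 'c' ]
```

The unreachable cycle between `d` and `e` is collected, which is the main reason tracing collectors are preferred over plain reference counting.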
Read more about GC:
- Garbage Collection: V8’s Orinoco
- Trash talk: the Orinoco garbage collector
- High-Performance Garbage Collection For C++
- A tour of V8: Garbage Collection
- Concurrent marking in V8
- Getting garbage collection for free
- Jank Busters Part One
- Jank Busters Part Two: Orinoco
JavaScript Primitives and Objects
JavaScript Primitives
Symbol
- Unique and immutable, often used as object property keys to avoid property name collisions.
String
- A sequence of characters used to represent and manipulate text.
Boolean
- Represents logical true/false values, crucial for control flow.
Number
- Int: Integer values.
- BigInt: For working with arbitrarily large integers beyond the safe limit of Number. At the language level, BigInt is a separate primitive type rather than a kind of Number.
- Double: Represents floating-point numbers, which are more common in JavaScript.
Undefined
- A variable that has been declared but not yet assigned a value.
Null
- Represents the intentional absence of any object value.
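A quick way to see the primitive types listed above is to ask `typeof` about one sample of each. This is a small sketch; the variable names are illustrative.

```javascript
const samples = [Symbol('id'), 'text', true, 42, 3.14, 10n, undefined, null];
const types = samples.map((value) => typeof value);
console.log(types);
// [ 'symbol', 'string', 'boolean', 'number', 'number', 'bigint', 'undefined', 'object' ]
// Note: typeof null is 'object' for historical reasons, although null is a primitive.

// Symbols are unique even when their descriptions match.
const symbolsDiffer = Symbol('id') !== Symbol('id'); // true

// Past Number.MAX_SAFE_INTEGER, Number loses integer precision; BigInt does not.
const numberCollides = Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2; // true
const bigIntExact = 2n ** 53n + 1n; // 9007199254740993n
console.log(symbolsDiffer, numberCollides, bigIntExact);
```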
JavaScript Native Objects
Prototype-based Object Model
- Class
- JavaScript supports classes built on prototypes, allowing object-oriented design patterns like inheritance.
- Inheritance
- this.__proto__: Points to the object’s prototype, showing inheritance.
- super: Calls functions on an object’s parent.
- this in Arrow Functions
- Arrow functions don’t have their own this context, so they bind this lexically.
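The points about classes, the prototype chain, `super`, and lexical `this` can be seen together in a short sketch. The class names `Animal`, `Dog`, and `Counter` are made up for illustration.

```javascript
class Animal {
  constructor(name) { this.name = name; }
  speak() { return `${this.name} makes a sound`; }
}

class Dog extends Animal {
  // super.speak() calls the method found on the parent's prototype.
  speak() { return `${super.speak()} (woof)`; }
}

const rex = new Dog('Rex');

// Classes are built on prototypes: rex -> Dog.prototype -> Animal.prototype.
const chain = [
  Object.getPrototypeOf(rex) === Dog.prototype,
  Object.getPrototypeOf(Dog.prototype) === Animal.prototype,
];

class Counter {
  constructor() { this.count = 0; }
  // The arrow function has no `this` of its own; it captures the method's `this`.
  makeIncrement() { return () => { this.count += 1; }; }
}

const counter = new Counter();
const increment = counter.makeIncrement();
increment(); // called "detached", yet `this` still refers to `counter`

console.log(rex.speak());   // Rex makes a sound (woof)
console.log(chain);         // [ true, true ]
console.log(counter.count); // 1
```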
Object Properties Access
- Attributes:
- value: Holds the data.
- writable: Defines if the property can be modified.
- enumerable: Determines if the property shows up during enumeration, such as in Object.keys() or a for...in loop.
- configurable: Controls if the property can be deleted or its attributes changed, such as redefining it with a setter or getter.
- Computed Properties:
- Setter/Getter: Enable functions to be called when getting or setting property values, making them dynamic.
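The attributes and accessors above map directly onto `Object.defineProperty`. Here is a small sketch; the `user` object and its fields are hypothetical. `Reflect.set` and `Reflect.deleteProperty` are used so the result is visible as a boolean in both strict and sloppy mode.

```javascript
const user = { first: 'Ada', last: 'Lovelace' };

// A data property that is read-only, hidden from enumeration, and locked.
Object.defineProperty(user, 'id', {
  value: 1,
  writable: false,
  enumerable: false,
  configurable: false,
});

const writeSucceeded = Reflect.set(user, 'id', 2);         // false: not writable
const deleteSucceeded = Reflect.deleteProperty(user, 'id'); // false: not configurable
const visibleKeys = Object.keys(user);                     // [ 'first', 'last' ]
const stillThere = 'id' in user;                           // true: `in` ignores enumerability

// An accessor property: functions run on every read and write.
Object.defineProperty(user, 'fullName', {
  get() { return `${this.first} ${this.last}`; },
  set(value) { [this.first, this.last] = value.split(' '); },
  enumerable: true,
  configurable: true,
});

user.fullName = 'Grace Hopper';
console.log(user.first, user.last, user.fullName); // Grace Hopper Grace Hopper
console.log(writeSucceeded, deleteSucceeded, visibleKeys, stillThere);
```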
Object Representation
Map
- The Map in V8 represents an internal pointer to the object’s HiddenClass, which contains metadata describing the structure (or “shape”) of the object. This shape provides a blueprint for the engine to understand where properties are stored, the object’s prototype, and other relevant characteristics.
- HiddenClasses work like a blueprint for object structure, allowing properties to be stored at fixed offsets in memory. This makes property access highly efficient because V8 can simply compute the offset instead of performing a costly lookup.
- New hidden classes are only created when named properties are added or deleted. Adding an array-indexed property (e.g., arr[0]) does not trigger a new hidden class because arrays are handled differently in V8.
- Contains
- The type of the object, whether it’s a regular object, array, or function.
- The size of the object in memory, critical for understanding memory allocation.
- Type of Array Elements: For arrays, the type of elements (e.g., integers, floats, or objects) is stored.
- The object’s prototype, which is part of the prototype chain lookup for inherited properties and methods.
- Instance Descriptor: Points to a DescriptorArray that holds information about the properties of the object.
- Back_Pointer: Helps V8 keep track of previous states when transitioning between hidden classes.
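Hidden classes are not observable from ordinary JavaScript, so the sketch below only shows the coding pattern that follows from the description above: objects built by adding the same named properties in the same order are expected to share a hidden class. The function and variable names are hypothetical, and the `%HaveSameMap` check in the comment assumes a V8 shell or Node.js started with `--allow-natives-syntax`.

```javascript
// Same properties, same order: a and b are expected to follow the same
// chain of hidden class transitions ({} -> {x} -> {x, y}).
function makePoint(x, y) {
  const point = {};
  point.x = x;
  point.y = y;
  return point;
}

const a = makePoint(1, 2);
const b = makePoint(3, 4);

// Same properties, different order: c is expected to take a different
// transition path ({} -> {y} -> {y, x}) and end up with a different hidden class.
const c = {};
c.y = 2;
c.x = 1;

// The only hint visible from plain JavaScript is property order.
console.log(Object.keys(a), Object.keys(b), Object.keys(c));

// With natives syntax enabled, the assumption can be checked directly:
//   %HaveSameMap(a, b)
//   %HaveSameMap(a, c)
```

Initializing all properties in one place, such as a constructor, is the usual way to keep objects on a shared shape.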
Properties
- A pointer to an object containing named properties.
- Fast Properties: When the object has a small, fixed set of properties, V8 uses a fast path to store properties in a linear structure. Each property can be accessed with a fixed offset based on the object’s hidden class (map).
- Slow Properties: When objects undergo significant changes, such as properties being added and removed frequently, V8 switches to a slow properties path, using a dictionary structure to manage properties.
- Shape transitions only occur for fast properties. If an object moves to slow properties, it no longer shares its hidden class structure with other objects. Instead, it has a unique dictionary representation that allows for arbitrary property changes.
Element
- Represent the array-like properties (numbered properties) of an object. JavaScript arrays and objects can have properties with numeric keys, and V8 handles these differently from named properties.
- Arrays and objects can have numeric properties, which are stored in a dedicated element store in V8.
- A pointer to an object containing numbered properties.
- V8 makes a clear distinction between different kinds of elements, which allows the engine to optimize access patterns:
- SMI_ELEMENTS: Small integers.
- DOUBLE_ELEMENTS: Floating-point numbers.
- ELEMENTS: Generic objects
- V8 optimizes operations on packed arrays (arrays without holes). Arrays with contiguous elements are much faster to access than those with missing values (holey arrays). Operations on packed arrays are more efficient because the engine can access elements sequentially in memory without checks.
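The difference between packed and holey arrays can be illustrated as follows. The elements kind itself is internal to V8, so the comments describe the expected kind rather than something the code can verify; `%DebugPrint` is mentioned on the assumption that natives syntax is enabled.

```javascript
// Starts as packed small integers.
const packed = [1, 2, 3];
packed.push(4.5);    // now holds doubles; expected to move to a double elements kind
packed.push('five'); // now holds mixed values; expected to move to generic elements

// Writing far past the end leaves holes at indices 3..8.
const holey = [1, 2, 3];
holey[9] = 10;

const lengthAfterGap = holey.length; // 10
const hasIndexFive = 5 in holey;     // false: index 5 is a hole, not a stored undefined
const valueAtHole = holey[5];        // undefined, found only after the hole check

// Arrays created with a length but no values start out holey as well.
const preallocated = new Array(3);
const preallocatedHasZero = 0 in preallocated; // false

console.log(lengthAfterGap, hasIndexFive, valueAtHole, preallocatedHasZero);

// With `--allow-natives-syntax`, %DebugPrint(holey) shows the elements kind.
```

Building arrays with literals or `push` rather than preallocating and filling out of order keeps them packed.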
In-Object Properties
- Pointers to named properties that were defined at object initialization. They are stored directly in the object itself, rather than in an external property storage structure.
- If an object’s structure changes (e.g., more properties are added than initially defined), V8 may need to store some properties in an external property store (e.g., the properties dictionary for slow properties).
Special JavaScript Objects
Array
- Typed: Typed arrays allow for working with raw binary data in buffers.
- Length: Automatically updates as items are added or removed.
RegExp
- Regular expressions for pattern matching within strings.
ArrayBuffer
- A low-level binary data buffer, essential for handling binary data, often used with typed arrays.
Function
- Passed as object: Functions are first-class objects and can be passed around like other objects.
- Hoisted: Function declarations are hoisted, allowing them to be called before their definition.
- Arrow Function:
- Compact: Shorter syntax for anonymous functions.
- Constructor: Functions can also serve as constructors to create new objects.
- Store as Variable: Functions can be assigned to variables, passed as arguments, or returned from other functions.
- Variable:
- Weakly Typed: JavaScript variables don’t require a specific type.
- Coerced: Variables are often coerced into different types implicitly.
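Hoisting, weak typing, and implicit coercion from the list above can be seen in a few lines. This is a small sketch with made-up names.

```javascript
// Function declarations are hoisted, so this call works before the definition.
const early = hoisted();
function hoisted() { return 'declared below, callable above'; }

// Arrow functions stored in const are not usable before their definition.
let arrowError = null;
try {
  notHoisted();
} catch (err) {
  arrowError = err.name; // 'ReferenceError'
}
const notHoisted = () => 'arrow';

// Weakly typed: the same variable can hold values of different types.
let value = 42;
value = 'forty-two';

// Implicit coercion: the operator decides how operands are converted.
const product = '5' * 2; // 10   ('*' converts the string to a number)
const joined = '5' + 2;  // '52' ('+' with a string concatenates)

console.log(early, arrowError, typeof value, product, joined);
```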
JavaScript Engines: V8 Internals
JavaScript uses prototype-based inheritance, where each object has a reference to a prototype object whose properties it incorporates. Separately, V8 describes each object’s layout with a hidden class, or “shape”.
Ignition (V8’s Interpreter)
When JavaScript is first loaded, it is translated to bytecode and executed by Ignition, V8’s interpreter. Ignition is quick to load code, but slow to execute it repeatedly since interpreting bytecode is less efficient than running compiled machine code.
Ignition Bytecode
- Ignition is a register machine
- Implicit accumulator register
- Argument registers a0, a1, …
- General purpose registers r0, r1, …
- Ldar/Star = load/store to accumulator
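To show how an accumulator-based register machine executes, here is a toy interpreter loosely modelled on the bytecodes named above. It is a sketch and not Ignition itself: the tuple encoding, the `interpret` function, and the hand-written `program` are assumptions, and real Ignition output for the same function would differ (for example, it carries feedback slot operands).

```javascript
// Executes a list of [opcode, operand] pairs.
function interpret(bytecode, args) {
  const regs = {};
  args.forEach((value, i) => { regs[`a${i}`] = value; }); // argument registers a0, a1, ...
  let acc;                                                 // the implicit accumulator

  for (const [op, operand] of bytecode) {
    switch (op) {
      case 'LdaSmi': acc = operand; break;             // load a small integer into acc
      case 'Ldar':   acc = regs[operand]; break;       // load a register into acc
      case 'Star':   regs[operand] = acc; break;       // store acc into a register
      case 'Add':    acc = regs[operand] + acc; break; // acc = register + acc
      case 'Return': return acc;                       // acc holds the return value
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  return undefined;
}

// Hand-written bytecode for: function f(a, b) { return a + b + 1; }
const program = [
  ['Ldar', 'a1'], // acc = b
  ['Add', 'a0'],  // acc = a + b
  ['Star', 'r0'], // r0 = a + b
  ['LdaSmi', 1],  // acc = 1
  ['Add', 'r0'],  // acc = (a + b) + 1
  ['Return'],
];

console.log(interpret(program, [2, 3])); // 6
```

Because most instructions read or write the accumulator implicitly, they need fewer explicit operands, which keeps the bytecode compact.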
Type Feedback
Type feedback is crucial for optimizations:
- Generic addition in JavaScript is very complex
- Addition of integers is very simple
- Integer feedback allows the compiler to lower the instruction to integer addition
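The way feedback for an addition can only widen over time may be sketched as a tiny lattice. This is a simplified model with hypothetical names (`LATTICE`, `classify`, `makeAddSite`); V8's real feedback states are more detailed and distinguish small integers (Smis) rather than all integers.

```javascript
// Feedback only moves to the right: none -> integer -> number -> any.
const LATTICE = ['none', 'integer', 'number', 'any'];

function classify(a, b) {
  if (Number.isInteger(a) && Number.isInteger(b)) return 'integer';
  if (typeof a === 'number' && typeof b === 'number') return 'number';
  return 'any';
}

// One "call site" with its own feedback slot.
function makeAddSite() {
  let feedback = 'none';
  return {
    add(a, b) {
      const seen = classify(a, b);
      if (LATTICE.indexOf(seen) > LATTICE.indexOf(feedback)) feedback = seen;
      return a + b; // the generic operation still produces the result
    },
    get feedback() { return feedback; },
  };
}

const site = makeAddSite();
const history = [];
site.add(1, 2);     history.push(site.feedback); // 'integer'
site.add(1.5, 2);   history.push(site.feedback); // 'number'
site.add('a', 1);   history.push(site.feedback); // 'any'
site.add(3, 4);     history.push(site.feedback); // stays 'any'
console.log(history);
```

An optimizing compiler that sees only `integer` feedback can emit a plain integer add guarded by a check; once the slot reaches `any`, it has to fall back to the generic operation.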
Value Edges
Value edges show the flow of values between operations in JavaScript code. They demonstrate how the output of one operation is utilized as an input for another. These edges represent dependencies in which the result of a calculation or the value of an object is required as input for later actions.
Inputs to Functions/Comparisons/Operations
- Value edges capture how values travel across distinct sections of code. For example, if a function call requires specific arguments or a comparison operation requires two operands, the inputs are connected together by value edges.
Relaxed Execution Order
- In many circumstances, the execution order of actions connected by value edges can be changed. This means that the optimizer can reorder the sequence in which values are computed to increase speed, as long as the relationships between values are preserved.
Control Edges
Control edges define the program’s control flow, indicating which activities are dependent on the execution of others. These edges guarantee that the execution sequence conforms to the program’s logical structure, such as managing conditional branching, loops, and jumps in the control flow.
Represents the Control Flow Graph
- The control flow graph (CFG) represents all potential pathways through the program during execution. Control edges guarantee that the order of actions corresponds to the planned control flow.
Solid Lines in Turbolizer
- The Turbolizer tool represents both control and value edges as solid lines, making it simple to see how the optimizer manages both data and control flow in a unified manner.
Effect Edges
Effect Edges control dependencies between stateful actions, which are activities that have side effects or change the state of the application. They guarantee that side effects are executed in the right sequence, preventing assumptions about the program’s state from being invalidated.
Ordering of Stateful Operations
- Effect edges ensure the right sequence of actions that change the program’s state. For example, if one operation writes to a variable while another reads from it, the effect edge guarantees that the write comes first.
Side Effects Impact Later Operations
- Certain activities, such as updating an object, writing to the console, or interacting with the DOM, might have side effects that influence subsequent operations. Effect edges handle these dependencies to ensure program correctness.
Not Invalidating Assumptions
- When the optimizer reorders stateful actions, the program’s logic may fail if assumptions about the state are violated. Effect edges guarantee that such assumptions are upheld, preserving the right order of actions.
Effect Chain
- Even actions with minimal side effects may require wiring into the effect chain to preserve appropriate order. Certain mathematical operations, for example, might be re-ordered unless they are part of an effect chain that is triggered by earlier state changes.
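The three kinds of edges can be pointed out on an ordinary function. The comments below are an informal reading of which dependencies an optimizer would have to respect, assuming `x` is known to be a number; they are not actual Turbolizer output, and `update` is a made-up example.

```javascript
function update(counter, x) {
  // Pure arithmetic: each result depends only on x (value edges).
  // Neither depends on the other, so they may be computed in either order.
  const doubled = x * 2;
  const bumped = x + 1;

  // Store and load touch the same state, so they sit on the effect chain:
  // the load must not be moved before the store.
  counter.total = doubled;
  const seen = counter.total;

  // The branch decides which return runs (control edges).
  if (seen > 10) {
    return seen + bumped;
  }
  return bumped;
}

console.log(update({ total: 0 }, 3)); // 4   (6 is not > 10)
console.log(update({ total: 0 }, 6)); // 19  (12 > 10, so 12 + 7)
```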
Read more about Ignition
- Firing up the Ignition interpreter
- Ignition: Jump-starting an Interpreter for V8
- Ignition: An Interpreter for V8
- Ignition Design Document
- Ignition: Register Equivalence Optimization
- Understanding V8’s Bytecode
- Blazingly Fast Parsing, Part 2
- A guided tour through Chrome’s javascript compiler
Sparkplug (V8’s non optimizing compiler)
- Sparkplug converts the bytecode directly to machine code without doing extensive optimization. Its goal is to create machine code faster than the optimizing compiler (Turbofan), allowing execution to proceed more quickly, particularly for smaller or short-lived routines that may not benefit from heavy optimization.
Read more about Sparkplug
- Sparkplug — a non-optimizing JavaScript compiler
- Sparkplug
- Sparkplug, the new lightning-fast V8 baseline JavaScript compiler
Maglev (V8’s Mid Tier Compiler)
- Maglev creates less optimized code than the top-tier JIT compiler, TurboFan, but it compiles quicker. JIT compilers are prevalent in JavaScript engines, with the expectation that multiple compiler tiers will provide a better compromise between compilation time and runtime optimization.
- Maglev converts bytecodes into SSA (Static Single-Assignment) nodes, which are defined in the file maglev-ir.h.
- Maglev’s compilation process consists of two optimization phases: building a graph from SSA nodes and optimizing Phi value representations.
Read more about Maglev
Turbofan (V8’s JIT Compiler)
- V8’s Turbofan Compiler converts Ignition bytecode into assembly.
- Turbofan translates the bytecode into “Sea of Nodes” and subsequently to assembly.
- Turbofan is an optimizing compiler that uses interpreter feedback to do “speculative” optimizations.
- The compiler does not compile everything up front. Instead, a “Profiler” monitors running code to find code that is worth optimizing. If there is a “hot function,” the compiler converts it into efficient machine code for execution. If an assumption baked into that optimized code later turns out to be wrong (for example, a value of an unexpected type shows up), V8 will “deoptimize” the function and return it to bytecode.
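A typical way to provoke this behavior is to call a function many times with one type and then once with another. The sketch below assumes Node.js; whether and when optimization or deoptimization actually happens depends on the V8 version and its heuristics, so the script only sets up the conditions. Running it as `node --trace-opt --trace-deopt script.js` (where `script.js` is a placeholder file name) prints V8's own trace of what it decided.

```javascript
function add(a, b) {
  return a + b;
}

// Warm-up: many calls with small integers make `add` a candidate "hot function"
// and give it integer-only type feedback.
let sum = 0;
for (let i = 0; i < 100000; i++) {
  sum = add(sum, 1);
}

// A call with strings contradicts the earlier feedback. If `add` was optimized
// under the integer assumption, this is the kind of call that can trigger a deopt.
const text = add('de', 'opt');

console.log(sum, text); // 100000 deopt
```

The result is the same either way; speculation only changes how fast the code runs, never what it computes.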
Turbofan Graph Building
- Read more: bytecode-graph-builder.cc
Turbofan Optimize Graph
- Almost all optimization takes place on the sea of nodes.
- Top-down and bottom-up graph transformations.
- Separates transformations from error-prone computations.
- Local reasoning results in gradual transformations.
Turbofan Optimization Pipeline
Speculations
- Assumptions are made about the object’s type
- $Speculative.*
- Example: SpeculativeNumberBitwiseAnd
Reductions
- Strategies used during optimizations to reduce Nodes
- Nodes might be optimized away
- $Reduce.*
- Example: ReduceWordNAnd
Multiple Phases
- Optimizations pipeline contains different phases
- Example : Typer Phase, Type Lowering Phase, Effect Linearization Phase
Read more about Turbofan
- An Introduction to Speculative Optimization in V8
- Digging into the TurboFan JIT
- Deoptimize me not, v8
- How to start JIT-ting
- Sea of Nodes
- Turbofan Docs
- Hooking up the Ignition to the Turbofan
- Tale of Turbofan
- Ignition+TurboFan and ES2015
- CodeStubAssembler Redux
- Overview of the Turbofan Compiler
- Turbofan IR
- Turbofan’s JIT Design
- Fast Arithmetic for Dynamic Languages
- Deoptimization in V8
- Turbofan a new code generation architecture for V8
- An Internship on Lazyness Slides
- An internship on laziness: lazy unlinking of deoptimized functions
- Turbofan: Function Context Specification
- Turbofan: Rest Parameters and Arguments Exotic Objects optimization plan
- Turbofan Developer Tools Integration
- Turbofan Inlining
- Turbofan Inlining Heuristics
- TurboFan redundant bounds and overflow check elimination
- Turbofan Lazy deoptimization without code patching
- Turbofan Register Allocator
- Projection nodes in TurboFan
- Builtin optimization guards in TurboFan
- Investigation of (transpiled) class performance in V8
- In-place field representation changes
- ES2015 and beyond performance plan
- Fast string concatenation in JavaScript
- Context-sensitive JavaScript operators in TurboFan
- Fast frozen & sealed elements in V8
- Faster calls with arguments mismatch
- Faster Runtime API Calls
- Adventures in JIT compilation: Part 1 - an interpreter
- Adventures in JIT compilation: Part 2 - an x64 JIT
- Escape Analysis in Turbofan
- A crash course in just-in-time (JIT) compilers
Memory and Buffer Management in JavaScript
When dealing with binary data or interfacing with Web APIs, memory management becomes more explicit. This is where buffers, typed arrays, and array buffers come into play.
Understanding Buffers
Buffers are typically used to handle raw binary data, which is not natively supported by JavaScript’s conventional data types such as strings and arrays. Buffers allow you to process binary streams of data more effectively.
Key Characteristics of Buffers
- Fixed Size: Buffers have a predefined size, which means once created, the amount of memory they consume cannot change.
- Raw Binary Data: Buffers store raw bytes, unlike JavaScript strings that are encoded in UTF-16.
- Useful in Network and File Operations: Buffers are particularly useful in low-level operations like interacting with network protocols, reading or writing binary files, or working with images, videos, and other media streams.
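The difference between a string and its raw bytes can be shown with `TextEncoder`, which is available in modern browsers and Node.js. This is a small sketch; the sample text is arbitrary.

```javascript
const text = 'h\u00e9llo'; // "héllo": five UTF-16 code units

// Encode to raw UTF-8 bytes; the result is a fixed-size Uint8Array.
const bytes = new TextEncoder().encode(text);

const codeUnits = text.length;  // 5
const byteCount = bytes.length; // 6, because "é" takes two bytes in UTF-8
const hex = [...bytes].map((b) => b.toString(16).padStart(2, '0')).join(' ');

// Decoding the bytes gives back the original string.
const roundTrip = new TextDecoder().decode(bytes);

console.log(codeUnits, byteCount, hex, roundTrip === text);
// 5 6 '68 c3 a9 6c 6c 6f' true
```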
Array Buffers
- An ArrayBuffer is a generic, fixed-length block of raw memory.
- An ArrayBuffer doesn’t have any methods to manipulate this data; you need to use a Typed Array or DataView to access the memory.
- Use Cases:
- WebGL: Typed arrays are essential in WebGL for handling 3D graphics, where binary data representing vertices, textures, and shaders needs to be passed directly to the GPU.
- Networking: When working with low-level protocols or binary streaming over WebSockets or other network interfaces.
- File I/O: Reading and writing binary files (e.g., images, audio files) using File APIs or interacting with file system streams in Node.js.
- Multimedia: Manipulating images, audio, and video data for encoding/decoding purposes.
- WebAssembly: Typed arrays are used in conjunction with WebAssembly to provide memory access for high-performance applications.
Typed Arrays
- Typed Arrays are array-like objects that provide a view over an ArrayBuffer, allowing you to read and write binary data directly.
- Common Typed Array Types:
- Int8Array: 8-bit signed integer
- Uint8Array: 8-bit unsigned integer (no negative values)
- Int16Array: 16-bit signed integer
- Uint16Array: 16-bit unsigned integer
- Float32Array: 32-bit floating-point number
- Float64Array: 64-bit floating-point number
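Because typed arrays are only views, several of them can look at the same ArrayBuffer and interpret its bytes differently. A minimal sketch, with illustrative variable names:

```javascript
const buffer = new ArrayBuffer(8);      // 8 raw bytes, all zero

const bytes = new Uint8Array(buffer);   // 8 elements of 1 byte each
const words = new Uint16Array(buffer);  // 4 elements of 2 bytes each
const signed = new Int8Array(buffer);   // same bytes, read as signed

bytes[0] = 255;
bytes[1] = 255;

const viewLengths = [bytes.length, words.length]; // [ 8, 4 ]
const firstWord = words[0];   // 65535: both bytes are 0xff, so byte order does not matter
const firstSigned = signed[0]; // -1: the same 0xff byte read as a signed value

// Values that do not fit are wrapped, not rejected.
bytes[2] = 256;
const wrapped = bytes[2]; // 0

console.log(viewLengths, firstWord, firstSigned, wrapped);
```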
DataView
- Another way to manipulate ArrayBuffers, offering more flexibility than typed arrays.
- Lets you access any area of the buffer using a specific byte offset and value type. This is important when dealing with complex binary formats in which data types may not be properly aligned.
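The alignment point is easiest to see with a made-up binary record: a 1-byte tag followed by a 32-bit value and a 16-bit value. The layout is hypothetical; the DataView methods are standard.

```javascript
const buffer = new ArrayBuffer(7);
const view = new DataView(buffer);

view.setUint8(0, 0x01);               // 1-byte tag at offset 0
view.setUint32(1, 0xdeadbeef, false); // big-endian 32-bit value at unaligned offset 1
view.setUint16(5, 0x1234, true);      // little-endian 16-bit value at offset 5

const raw = [...new Uint8Array(buffer)].map((b) => b.toString(16).padStart(2, '0'));
const readBack = view.getUint32(1, false);

// A typed array cannot start at offset 1: its offset must be a multiple of
// the element size, so this throws a RangeError.
let typedArrayError = null;
try {
  new Uint32Array(buffer, 1, 1);
} catch (err) {
  typedArrayError = err.name;
}

console.log(raw.join(' '));                        // 01 de ad be ef 34 12
console.log(readBack === 0xdeadbeef, typedArrayError); // true RangeError
```

DataView also makes byte order explicit on every call, whereas typed arrays always use the platform's native byte order.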
Asynchronous
Promises
- A JavaScript object that indicates the eventual success (or failure) of an asynchronous action and its value. It serves as a placeholder for the outcome of an operation that hasn’t yet finished but will at some time in the future.
- Key Concepts
- Pending: The initial state. The operation is ongoing, and its result isn’t available yet.
- Fulfilled: The operation has completed successfully, and the promise now holds the resulting value.
- Rejected: The operation failed, and the promise holds the reason for failure (typically an error object).
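The three states can be watched with a small helper that settles after a delay. The `delay` function and its parameters are hypothetical stand-ins for a real asynchronous operation.

```javascript
// Returns a promise that is pending for `ms` milliseconds,
// then either fulfills with `value` or rejects with an Error.
function delay(ms, value, shouldFail = false) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (shouldFail) reject(new Error(value));
      else resolve(value);
    }, ms);
  });
}

const outcomes = [];

// allSettled waits for every promise, whether it fulfills or rejects.
const settled = Promise.allSettled([
  delay(5, 'data loaded'),
  delay(5, 'network down', true),
]).then((results) => {
  for (const result of results) {
    outcomes.push(
      result.status === 'fulfilled'
        ? `fulfilled: ${result.value}`
        : `rejected: ${result.reason.message}`
    );
  }
  console.log(outcomes);
  return outcomes;
});
```

Once a promise has settled, its state and value never change again, which is why results can be read safely from multiple `.then()` calls.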
Async/Await
- Async/await is a more modern and syntactically clearer approach to working with promises. It was introduced in ES2017 (ES8) and allows asynchronous code to be expressed in a more synchronous, linear way, which improves readability and maintainability.
- Key Concepts
- async function: Declares an asynchronous function that implicitly returns a promise.
- await: Pauses the execution of an async function until a promise settles (fulfills or rejects). It allows you to retrieve the resolved value without chaining .then() calls.
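Here is a short sketch of the same ideas with `async` and `await`. `fetchUser` is a hypothetical stand-in for a real asynchronous call such as a network request.

```javascript
// Hypothetical async operation: fulfills for positive ids, rejects otherwise.
function fetchUser(id) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (id > 0) resolve({ id, name: `user-${id}` });
      else reject(new Error('invalid id'));
    }, 5);
  });
}

// An async function always returns a promise.
async function describeUser(id) {
  try {
    const user = await fetchUser(id); // pauses here until the promise settles
    return `found ${user.name}`;
  } catch (err) {
    return `failed: ${err.message}`;  // a rejection surfaces as a thrown error
  }
}

const descriptions = Promise.all([describeUser(7), describeUser(-1)]);
descriptions.then((lines) => console.log(lines));
// [ 'found user-7', 'failed: invalid id' ]
```

Only the async function is paused by `await`; the call stack is released, so other tasks keep running in the meantime.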
Callbacks
- Function Arguments: JavaScript’s standard built-in callback system often passes functions as parameters.
- Object Converters:
- valueOf: Converts an object to a primitive.
- toString: Converts an object to a string representation.
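These converters are callbacks in the sense that the engine calls them implicitly during coercion. A small sketch with a made-up `price` object:

```javascript
const calls = [];

const price = {
  amount: 42,
  valueOf() { calls.push('valueOf'); return this.amount; },
  toString() { calls.push('toString'); return `$${this.amount}`; },
};

const sum = price + 1;       // 43: arithmetic asks for a primitive, valueOf runs first
const label = `${price}`;    // '$42': a template literal asks for a string, toString runs
const bigger = price > 40;   // true: comparison uses valueOf

console.log(sum, label, bigger);
console.log(calls); // [ 'valueOf', 'toString', 'valueOf' ]
```

Since these methods run user code in the middle of what looks like a simple operator, they are a common source of surprising side effects.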
- Property Getters and Setters:
- These trigger automatically when properties are accessed or modified, allowing for more controlled property behavior.
- Proxy: Proxies are often referred to as “meta-programming” tools because they let you modify the behavior of language constructs.
- Property Lookup: Custom behavior for property access.
- Assignment: Intercepts property assignment.
- Function Invocation: Controls function calls within a proxy object.
- Structure:
- Target Object: The object being proxied. It can be any type of object (including arrays or functions).
- Handler Object: This object contains traps (intercepting methods) that define custom behavior for operations performed on the proxy.
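The target and handler structure, along with the three interception points listed above, can be sketched as follows. The `log` array and the object contents are illustrative.

```javascript
const log = [];

// Target object plus a handler whose traps intercept lookup and assignment.
const target = { name: 'V8' };
const handler = {
  get(obj, prop, receiver) {
    log.push(`get ${String(prop)}`);
    return Reflect.get(obj, prop, receiver);        // fall through to default behavior
  },
  set(obj, prop, value, receiver) {
    log.push(`set ${String(prop)}`);
    return Reflect.set(obj, prop, value, receiver);
  },
};
const proxy = new Proxy(target, handler);

const name = proxy.name; // triggers the get trap
proxy.version = 12;      // triggers the set trap

// Functions can be proxied too; the apply trap intercepts invocation.
const loggedAdd = new Proxy((a, b) => a + b, {
  apply(fn, thisArg, args) {
    log.push(`call with ${args.join(',')}`);
    return Reflect.apply(fn, thisArg, args);
  },
});
const total = loggedAdd(2, 3);

console.log(name, target.version, total); // V8 12 5
console.log(log); // [ 'get name', 'set version', 'call with 2,3' ]
```

Forwarding through `Reflect` keeps the default semantics intact, so each trap only adds behavior instead of replacing it.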
- Object Overrides
- Extending the default behavior of built-in objects such as Object, Array, Function, and User-defined objects.
- Overriding Prototype Methods: In JavaScript, you may override methods on the built-in object prototype chain.
- Overriding Constructors: You may override object constructors to change how objects are formed.
- List of symbols