Finance Notes


Ch 6. HFT Optimization – Architecture and Operating System

The most important question to ask is what we are trying to achieve – what level of performance is good enough for HFT strategies?

  • Context switches (target: <= 20 µs)
  • Building lock-free data structures (target: <= 20 µs)

Context Switches

  • Definition:
    • Operation by which the state of a running process/thread is saved and the state of a different process/thread is restored.
    • Allows resumption of execution where it left off.
    • Foundation for multitasking in modern operating systems (OSs), creating the illusion of running more processes than there are CPU cores.

Types of Context Switches

  1. Hardware or Software Context Switches:

    • Hardware Context Switching:
      • Uses special hardware features (e.g., Task State Segments (TSSs)).
      • Saves register and processor state for the current process, then switches to a different process.
      • Generally faster thanks to dedicated registers and instructions, but can be slower in some cases because all registers must be saved unconditionally.
    • Software Context Switching:
      • Saves the current stack pointer and loads the new stack pointer to execute new code.
      • Registers, flags, data segments, and other relevant registers are pushed onto the old stack and popped off the new stack.
      • Preferred in modern OSs for better fault tolerance and customization of saved/restored registers.
  2. Context Switches Between Threads or Processes:

    • Process Switching Latency:
      • Latency associated with switching between processes.
      • More time-consuming because the new process's code and data must be fetched into the cache, evicting the old process's.
      • Requires flushing virtual-memory translation structures (e.g., the Translation Lookaside Buffer (TLB)).
    • Thread Switching Latency:
      • Latency associated with switching between threads.
      • Generally faster since threads share the same address space, reducing the need for flushing/cleaning memory structures.
      • Less overhead in virtual memory management compared to process switching.

Why Context Switches Are Beneficial

  • Multitasking:

    • Task schedulers in modern OSs switch processes in and out of the CPU.
    • Reasons for switching:
      • Process completion.
      • Waiting on I/O or synchronization.
      • Preventing CPU-intensive processes from starving other processes of CPU time.
  • Interrupt Handling:

    • Common in modern architectures.
    • Processes initiate I/O operations and are blocked until completion.
    • Scheduler switches out blocked processes, resuming others.
    • OS installs interrupt handlers to manage resource access (e.g., disk, NICs).
    • Upon I/O completion, interrupt handlers wake up the initiating process.
  • User and Kernel Mode Switching:

    • Example: Disk or packet read completion.
    • Part of the operation occurs in kernel space (e.g., invoking interrupt handler).
    • Data processing usually occurs in user space.
    • Some user space instructions force transitions to kernel mode.
    • Context switches may occur during these transitions on some systems.

Steps and Operations Involved in a Context Switch

  1. Saving the State of the Current Process:

    • Save the state in a Process Control Block (PCB), which includes:
      • Registers
      • Stack Pointer (SP)
      • Program Counter (PC)
      • Memory maps
      • Various tables and lists related to the current thread or process.
  2. Cache and TLB Management:

    • Flush and/or invalidate the cache.
    • Flush the Translation Lookaside Buffer (TLB), which handles virtual to physical memory address translations.
  3. Restoring the State for the Next Process:

    • Restore the state by loading the registers and data from the PCB of the next thread or process to be run.

Why Context Switches Are Bad for HFT

Default CPU Task Scheduler Behavior:

  • Default algorithms aim for:
    • Fairness in CPU resource allocation.
    • Energy conservation and improved efficiency.
    • Maximizing CPU throughput, e.g., by scheduling either the shortest or the longest jobs first.

HFT Application Requirements:

  • Energy Efficiency:

    • Prefer not to conserve energy.
    • Support for overclocked servers, which are not energy-efficient.
    • Measures to prevent server overheating are secondary.
  • Scheduling and Priority Control:

    • Critical to prioritize HFT processes over low-priority tasks.
    • Avoid CPU starvation for HFT applications by ensuring they get maximum CPU time.
    • Prevent preemption of HFT threads/processes, regardless of their CPU consumption.

Strategies for Optimizing HFT Performance:

  • Kernel and OS Parameter Adjustments:

    • Modify kernel and OS settings to prioritize HFT requirements.
  • Core Pinning:

    • Pin critical HFT processes to specific, isolated, and dedicated CPU cores.
    • Ensure the scheduler never preempts HFT processes running on these cores.
  • Non-HFT Process Management:

    • Move non-HFT processes to a small subset of cores to isolate them from HFT operations.

Expensive Tasks in Context Switching

  1. Task Scheduling:

    • Overhead: Determining which process/thread to run next can be time-consuming and adds overhead to the context switch.
  2. Flushing the Translation Lookaside Buffer (TLB):

    • Expensive: The TLB must be flushed to clear stale virtual-to-physical address translations; repopulating it through TLB misses afterwards is costly.
  3. Cache Invalidation:

    • Expensive: Similar to TLB invalidation, cache invalidation involves:
      • Writing edited data from the cache to memory.
      • Fetching new code from memory to replace old code in the cache (cache miss).
      • Initial cache misses slow down the newly scheduled process as it resumes execution after the context switch.

Techniques to Avoid or Minimize Context Switches

  1. Pinning Threads to CPU Cores:

    • CPU Isolation: Implement CPU isolation by pinning critical or CPU-intensive threads (hot or spinning threads) to specific cores.
    • Benefits: Ensures minimal to no context switches for these threads, optimizing performance.
  2. Avoiding System Calls That Lead to Pre-emption:

    • Minimize Blocking System Calls:
      • System calls that perform blocking disk or network I/O put the calling thread to sleep, resulting in a context switch.
      • Reduce the use of blocking system calls to minimize context switches.
    • Use Kernel Bypass:
      • Bypass system calls for network I/O operations, which are common in HFT applications.
      • Kernel Bypass Overview: Avoids system-call overhead by performing I/O directly from user space (typically by polling the NIC from a dedicated core), eliminating the associated context switches.

Building Lock-Free Data Structures

Why Locks Are Needed (Non-HFT Applications)

  • Concurrent Access: Ensuring multiple threads/processes can access shared resources safely.
  • Synchronization Primitives: Using mechanisms like mutexes, semaphores, and critical sections to prevent data corruption in thread-unsafe code sections.
    • Mutexes: Ensure mutual exclusion, allowing only one thread to access a resource at a time.
    • Semaphores: Control access to a resource by multiple threads.
    • Critical Sections: Protect portions of code that access shared resources.

Problems and Inefficiencies with Using Locks

  • Blocking: When a thread attempts to acquire a lock already held by another thread, it blocks until the lock is released, leading to:
    • Increased latency.
    • Reduced throughput.
    • Potential deadlocks.