In a paged memory system, every memory access by a program requires translating a virtual address to a physical address by looking up the Page Table. But the Page Table itself resides in main memory. Without extra hardware, every memory access therefore costs two memory accesses: one to read the Page Table entry, and one to access the actual data. This doubles the effective memory access time!
The TLB solves this by caching recently used Page Table entries in a tiny, ultra-fast hardware cache inside the CPU.
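To make the saving concrete, here is a minimal Python sketch of the lookup path. Everything in it is an illustrative assumption rather than how any real MMU is built: the `Translator` class, the 4096-byte `PAGE_SIZE`, the dict-based page table, and an unbounded TLB with no eviction. It only counts how many simulated main-memory accesses each load costs.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class Translator:
    """Toy MMU: a dict-based page table fronted by a dict-based TLB."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # cached subset of page_table (unbounded in this sketch)
        self.memory_accesses = 0      # simulated main-memory accesses so far

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            frame = self.tlb[vpn]          # TLB hit: no Page Table read needed
        else:
            self.memory_accesses += 1      # TLB miss: read the Page Table entry from memory
            frame = self.page_table[vpn]
            self.tlb[vpn] = frame          # cache the mapping for later accesses
        return frame * PAGE_SIZE + offset

    def load(self, vaddr):
        paddr = self.translate(vaddr)
        self.memory_accesses += 1          # the data access itself
        return paddr


mmu = Translator({0: 7, 1: 3})
first = mmu.load(0x0010)    # miss: Page Table read + data read (2 accesses)
second = mmu.load(0x0020)   # hit on the same page: data read only (1 access)
```

After these two loads, `mmu.memory_accesses` is 3 rather than 4, because the second load reuses the cached mapping for page 0.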
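The TLB Hit Ratio is the fraction of memory accesses that find their page mapping in the TLB, usually quoted as a percentage. Modern TLBs achieve hit ratios of 99%+ despite holding only 64-1024 entries, because programs exhibit strong Locality of Reference: a single entry covers an entire page, so consecutive accesses to nearby addresses reuse the same mapping.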
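$$\text{EAT} = h \times (t_{\text{TLB}} + t_{\text{mem}}) + (1 - h) \times (t_{\text{TLB}} + 2 \times t_{\text{mem}})$$
Where $h$ is the hit ratio as a fraction between 0 and 1, $t_{\text{TLB}}$ is the TLB lookup time, and $t_{\text{mem}}$ is the memory access time. The first term covers a hit (TLB lookup plus the data access); the second covers a miss (TLB lookup, Page Table read, then the data access).
With a 99% hit ratio, $t_{\text{TLB}} = 1$ ns, $t_{\text{mem}} = 100$ ns: EAT = 0.99 × 101 + 0.01 × 201 = 99.99 + 2.01 = 102.0 ns. Nearly as fast as a single memory access, and about half the 200 ns that every access would cost without a TLB!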
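The same arithmetic can be checked for other hit ratios with a few lines of Python. The function name `effective_access_time` and its parameter names are made up for this sketch; the formula is the one given above, with a single-level Page Table assumed.

```python
def effective_access_time(hit_ratio, t_tlb, t_mem):
    """EAT for a single-level Page Table; hit_ratio is a fraction in [0, 1]."""
    hit_cost = t_tlb + t_mem        # TLB lookup + data access
    miss_cost = t_tlb + 2 * t_mem   # TLB lookup + Page Table read + data access
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


# Same timings as the worked example: t_TLB = 1 ns, t_mem = 100 ns.
for h in (0.80, 0.90, 0.99):
    print(f"h = {h:.2f}: EAT = {effective_access_time(h, 1, 100):.1f} ns")
```

With these timings, every percentage point of hit ratio lost adds 1 ns to the average access, which is why even small drops in the hit ratio are noticeable.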
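When the OS switches from Process A to Process B, Process B has a completely different address space (different Page Table). The TLB entries cached for Process A are now stale: they would translate Process B's virtual addresses to Process A's physical frames. The OS must either flush the entire TLB (expensive, because Process B then starts with a burst of TLB misses) or rely on Address Space Identifiers (ASIDs), small hardware tags that the OS assigns to each address space. Each TLB entry carries the ASID it was loaded under and only matches lookups made with the same ASID, so entries from multiple processes can coexist.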
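A rough way to picture ASID tagging is a TLB keyed by the pair (ASID, virtual page number) instead of the page number alone. The sketch below is a simplified model under that assumption; the class `TaggedTLB` and its method names are hypothetical, and real hardware also has to handle ASID exhaustion and reuse, which is omitted here.

```python
class TaggedTLB:
    """Toy TLB whose entries are tagged with an address-space identifier."""

    def __init__(self):
        self.entries = {}       # (asid, vpn) -> physical frame number
        self.current_asid = 0   # ASID of the currently running address space

    def context_switch(self, asid):
        # No flush: entries of other address spaces stay cached but cannot match.
        self.current_asid = asid

    def insert(self, vpn, frame):
        self.entries[(self.current_asid, vpn)] = frame

    def lookup(self, vpn):
        # Returns None on a miss, including when only another ASID maps this vpn.
        return self.entries.get((self.current_asid, vpn))


tlb = TaggedTLB()
tlb.context_switch(1)           # Process A runs with ASID 1
tlb.insert(5, 42)
tlb.context_switch(2)           # Process B runs with ASID 2
b_before = tlb.lookup(5)        # miss: A's entry is not visible to B
tlb.insert(5, 99)
tlb.context_switch(1)           # back to Process A
a_after = tlb.lookup(5)         # hit: A's entry survived the switches
```

Here `b_before` is `None` and `a_after` is 42, showing that both processes can map virtual page 5 without flushing and without seeing each other's translations.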
When the TLB is full and a new entry must be loaded, an old entry is evicted. Common policies are LRU (Least Recently Used), which hardware often approximates because exact recency tracking is costly, and Random replacement.
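As an illustration of the LRU policy, the following sketch models a fixed-capacity TLB using Python's `collections.OrderedDict`. The class name `LruTLB` and the capacity of 2 are arbitrary choices for the example, and this is exact LRU in software, not a description of how any particular CPU implements replacement.

```python
from collections import OrderedDict


class LruTLB:
    """Toy fixed-capacity TLB with exact LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> frame, least recently used first

    def lookup(self, vpn):
        if vpn not in self.entries:
            return None                     # TLB miss
        self.entries.move_to_end(vpn)       # mark as most recently used
        return self.entries[vpn]

    def insert(self, vpn, frame):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)   # refreshing an existing entry
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[vpn] = frame


lru = LruTLB(capacity=2)
lru.insert(1, 10)
lru.insert(2, 20)
lru.lookup(1)        # page 1 becomes most recently used
lru.insert(3, 30)    # TLB is full, so page 2 (least recently used) is evicted
```

After this sequence the TLB holds pages 1 and 3. A Random policy would replace the `popitem(last=False)` call with the removal of an arbitrarily chosen key, trading some hit ratio for much simpler bookkeeping.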