Aiur – ZelluX's Technical Blog

Security, Kernel, Virtualization, Programming Languages

x86_64 from Wikipedia

x86-64 is a superset of the x86 instruction set architecture. x86-64 processors can run existing 32-bit or 16-bit x86 programs at full speed, but also support new programs written with a 64-bit address space and other additional capabilities.

The x86-64 specification was designed by Advanced Micro Devices (AMD), which has since renamed it AMD64.[1] Intel has implemented it under the name Intel 64 (formerly EM64T or IA-32e) in its own x86 processors.[2] VIA Technologies has also included x86-64 instructions in its VIA Isaiah architecture. The names x86-64 and x64 are often used as vendor-neutral terms for x86-64 processors from any company.

x86-64 should not be confused with the Intel Itanium (formerly IA-64) architecture, which is not compatible on the native instruction set level with the x86 or x86-64 architecture.

AMD64 Architectural features

The primary defining characteristic of AMD64 is the availability of 64-bit general-purpose registers, 64-bit integer arithmetic and logical operations, and 64-bit virtual addresses. The designers also took the opportunity to make other improvements. The most significant changes include:

  • 64-bit integer capability: All general-purpose registers (GPRs) are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, etc. can now operate directly on 64-bit integers. Pushes and pops on the stack default to 8-byte strides, and pointers are 8 bytes wide.
  • Additional registers: In addition to increasing the size of the general-purpose registers, the number of named general-purpose registers is increased from eight in x86-32 (eax, ebx, ecx, edx, ebp, esp, esi, edi) to 16 (the 64-bit rax through rdi plus the new r8 through r15). It is therefore possible to keep more local variables in registers rather than on the stack, and to let registers hold frequently accessed constants; arguments for small and fast subroutines may also be passed in registers to a greater extent. However, AMD64 still has fewer registers than many common RISC processors (which typically have 32–64 registers) or VLIW-like machines such as the IA-64 (which has 128 registers).
  • Additional XMM (SSE) registers: Similarly, the number of 128-bit XMM registers (used by the Streaming SIMD Extensions instructions) is increased from 8 to 16.
  • Larger virtual address space: Current processor models implementing the AMD64 architecture can address up to 256 TB (281,474,976,710,656 bytes)[4] of virtual address space. This limit can be raised in future implementations to 16 EB (18,446,744,073,709,551,616 bytes), compared with just 4 GB (4,294,967,296 bytes) for 32-bit x86. This means that very large files can be operated on by mapping the entire file into the process's address space (which is sometimes faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space.
  • Larger physical address space: Current implementations of the AMD64 architecture can address up to 1 TB (1,099,511,627,776 bytes) of RAM; the architecture permits extending this to 4 PB (4,503,599,627,370,496 bytes) in the future (limited by the page table entry format). In legacy mode, Physical Address Extension (PAE) is included, as it is on most current 32-bit x86 processors, allowing access to a maximum of 64 GB (68,719,476,736 bytes).
  • Instruction-pointer-relative data access: Instructions can now reference data relative to the instruction pointer (RIP register). This makes position-independent code, as is often used in shared libraries and code loaded at run time, more efficient.
  • SSE instructions: The original AMD64 architecture adopted Intel's SSE and SSE2 as core instructions. SSE3 instructions were added in April 2005. SSE2 replaces the x87 instruction set's 80-bit extended precision with a choice of either IEEE 32-bit or 64-bit floating-point arithmetic. This provides floating-point operations compatible with many other modern CPUs. The SSE and SSE2 instructions have also been extended to operate on the eight new XMM registers. SSE and SSE2 are available in 32-bit mode in modern x86 processors; however, 32-bit programs that use them will only run on processors that have the feature. This is not an issue for 64-bit programs: all AMD64 processors have SSE and SSE2, so using SSE and SSE2 instructions instead of x87 instructions does not reduce the set of machines on which x64 programs can run. SSE and SSE2 are generally faster than, and duplicate most of the features of, the traditional x87 instructions, MMX, and 3DNow!.
  • No-Execute bit: The “NX” bit (bit 63 of the page table entry) allows the operating system to specify which pages of virtual address space can contain executable code and which cannot. An attempt to execute code from a page tagged “no execute” will result in a memory access violation, similar to an attempt to write to a read-only page. This should make it more difficult for malicious code to take control of the system via “buffer overrun” or “unchecked buffer” attacks. A similar feature has been available on x86 processors since the 80286 as an attribute of segment descriptors; however, this works only on an entire segment at a time. Segmented addressing has long been considered an obsolete mode of operation, and all current PC operating systems in effect bypass it, setting all segments to a base address of 0 and a size of 4 GB (4,294,967,296 bytes). AMD was the first x86-family vendor to implement no-execute in linear addressing mode. The feature is also available in legacy mode on AMD64 processors, and recent Intel x86 processors, when PAE is used.
  • Removal of older features: A number of "system programming" features of the x86 architecture are not used in modern operating systems and are not available on AMD64 in long (64-bit and compatibility) mode. These include segmented addressing (although the FS and GS segments were retained in vestigial form for compatibility with Windows code)[5], the task state switch mechanism, and Virtual-8086 mode. These features remain fully implemented in "legacy mode," permitting these processors to run 32-bit and 16-bit operating systems without modification.
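The No-Execute and physical-address bullets both come down to the layout of a 64-bit page table entry. A minimal sketch, in Python for readability, of how software might decode one, assuming the AMD64 4 KiB page format: present in bit 0, writable in bit 1, NX in bit 63, and the physical frame address in bits 12–51 (the 52-bit field behind the 4 PB limit). The helper names and sample entries are hypothetical; in practice the hardware walks these entries, and NX is only honoured once the OS enables it.

```python
# Bit positions in a 64-bit AMD64 page table entry for a 4 KiB page.
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_NX = 1 << 63  # "no execute"; only honoured when the OS has enabled NX

# Physical frame address field: bits 12..51, i.e. at most 52 physical address bits.
PTE_ADDR_MASK = 0x000F_FFFF_FFFF_F000


def pte_frame(pte: int) -> int:
    """Physical address of the 4 KiB frame this (hypothetical) entry maps."""
    return pte & PTE_ADDR_MASK


def pte_is_executable(pte: int) -> bool:
    """Code may run from the page only if it is present and NX is clear."""
    return bool(pte & PTE_PRESENT) and not (pte & PTE_NX)


def max_physical_bytes(addr_bits: int) -> int:
    """Bytes addressable with the given number of physical address bits."""
    return 1 << addr_bits


if __name__ == "__main__":
    data_pte = 0x12345000 | PTE_PRESENT | PTE_WRITABLE | PTE_NX  # writable data, no exec
    code_pte = 0x67890000 | PTE_PRESENT                          # read-only code
    print(hex(pte_frame(data_pte)), pte_is_executable(data_pte))  # 0x12345000 False
    print(hex(pte_frame(code_pte)), pte_is_executable(code_pte))  # 0x67890000 True
    for bits in (36, 40, 52):  # PAE, current AMD64 parts, architectural limit
        print(bits, f"{max_physical_bytes(bits):,}")
```

An attempt to fetch instructions from `data_pte`'s page is what would trigger the memory access violation described above; the 36-, 40- and 52-bit sizes reproduce the 64 GB, 1 TB and 4 PB figures.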

Virtual address space details

Although virtual addresses are 64 bits wide in 64-bit mode, current implementations (and any chips known to be in the planning stages) do not allow the entire virtual address space of 16 EB (18,446,744,073,709,551,616 bytes) to be used. Most operating systems and applications will not need such a large address space for the foreseeable future (for example, Windows implementations for AMD64 populate only 16 TB (17,592,186,044,416 bytes), or 44 bits' worth), so implementing such wide virtual addresses would simply increase the complexity and cost of address translation with no real benefit. AMD therefore decided that, in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup). However, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form." Canonical form addresses run from 0 through 00007FFF`FFFFFFFF, and from FFFF8000`00000000 through FFFFFFFF`FFFFFFFF, for a total of 256 TB (281,474,976,710,656 bytes) of usable virtual address space.
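The canonical-form rule is exactly a sign extension from bit 47, so one way to test it is to sign-extend the low 48 bits and check whether the address is unchanged. A minimal sketch, assuming 48 implemented virtual address bits by default; the function names are hypothetical and the check only mirrors the rule stated above, not any particular OS's code.

```python
MASK64 = (1 << 64) - 1


def canonicalize(addr: int, va_bits: int = 48) -> int:
    """Sign-extend the low va_bits bits of addr to a full 64-bit value."""
    low_mask = (1 << va_bits) - 1
    low = addr & low_mask
    if low >> (va_bits - 1):       # top implemented bit (bit 47) is set...
        low |= MASK64 ^ low_mask   # ...so copy 1s into bits 48..63
    return low


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits va_bits..63 are all copies of bit va_bits-1."""
    if not 0 <= addr <= MASK64:
        raise ValueError("not a 64-bit address")
    return canonicalize(addr, va_bits) == addr
```

For example, `is_canonical(0x00007FFF_FFFFFFFF)` and `is_canonical(0xFFFF8000_00000000)` are both true, while `0x00008000_00000000`, the first address past the lower half, is rejected.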

This "quirk" allows an important feature for later scalability to true 64-bit addressing: many operating systems (including, but not limited to, the Windows NT family) take the higher-addressed half of the address space (named kernel space) for themselves and leave the lower-addressed half (user space) for application code, user mode stacks, heaps, and other data regions. The "canonical address" design ensures that every AMD64-compliant implementation has, in effect, two memory halves: the lower half starts at 00000000`00000000 and "grows upwards" as more virtual address bits become available, while the higher half is "docked" to the top of the address space and grows downwards. Also, fixing the contents of the unused address bits prevents the operating system from using them as flags, privilege markers, etc., which could become problematic when the architecture is extended to 52, 56, 60 and 64 bits.
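To make the "two halves" picture concrete, the boundaries can be computed for any number of implemented virtual address bits. A minimal sketch, assuming only the canonical rule described above; `canonical_halves` and `fmt_addr` are hypothetical helpers, and the 52 to 64-bit widths are the hypothetical future extensions the paragraph mentions.

```python
TOP = (1 << 64) - 1  # highest 64-bit address


def canonical_halves(va_bits: int):
    """((lower_start, lower_end), (upper_start, upper_end)) for va_bits implemented bits."""
    if not 1 <= va_bits <= 64:
        raise ValueError("va_bits must be in 1..64")
    half = 1 << (va_bits - 1)          # size of each half
    lower = (0, half - 1)              # anchored at 0, grows upwards
    upper = (TOP + 1 - half, TOP)      # docked to the top, grows downwards
    return lower, upper


def fmt_addr(addr: int) -> str:
    """Format like 00007FFF`FFFFFFFF, the notation used in the text."""
    s = f"{addr:016X}"
    return s[:8] + "`" + s[8:]


if __name__ == "__main__":
    for bits in (48, 52, 56, 60, 64):
        (lo_start, lo_end), (hi_start, hi_end) = canonical_halves(bits)
        print(f"{bits} bits: lower {fmt_addr(lo_start)}..{fmt_addr(lo_end)}, "
              f"upper {fmt_addr(hi_start)}..{fmt_addr(hi_end)}")
    # 48 bits: lower 00000000`00000000..00007FFF`FFFFFFFF, upper FFFF8000`00000000..FFFFFFFF`FFFFFFFF
```

As the width grows, the lower half's end moves up and the upper half's start moves down, but neither half's fixed end ever changes; at 64 bits the two ranges meet and cover the whole 16 EB space.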

Written by zellux

February 9th, 2009 at 10:18 am

Posted in Computer System
