Aiur – ZelluX's Tech Blog

Security, Kernel, Virtualization, Programming Languages

Archive for February, 2009

2009-02-19 Notes

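This afternoon was genuinely scary. I wanted to delete the WinXP partition on my machine-room box to make room on the FTP server for some American TV shows, but that WinXP partition turned out to be the extended partition, and deleting it made several of my Linux partitions vanish as well (in an MBR partition table the logical partitions live inside the extended partition, so removing it takes them along). I quickly pulled the hard disk, hooked it up to a lab machine and repaired it with Disk Genius. The data came through essentially intact, although the partition table still has some problems. This blog very nearly disappeared for good =_=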
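
A minimal C sketch of why deleting that one partition took the Linux ones with it: in the classic MBR layout the four primary entries start at byte 446, and type IDs 0x05, 0x0F and 0x85 mark an extended partition whose space holds the logical partitions. The helper names (find_extended, entry_contains) are hypothetical and only illustrate the on-disk layout; this is not how Disk Genius works.

```c
#include <stdint.h>

#define MBR_SIZE       512
#define MBR_TABLE_OFF  446   /* four 16-byte primary entries start here */
#define MBR_ENTRY_SIZE 16

/* Read a little-endian 32-bit field without alignment assumptions. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Partition type IDs that denote an extended (container) partition. */
static int is_extended_type(uint8_t type)
{
    return type == 0x05 || type == 0x0F || type == 0x85;
}

/* Index (0-3) of the first extended primary entry, or -1 if there is
 * none or the 0x55AA boot signature is missing. */
int find_extended(const uint8_t mbr[MBR_SIZE])
{
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = mbr + MBR_TABLE_OFF + i * MBR_ENTRY_SIZE;
        if (is_extended_type(e[4]))          /* byte 4 = partition type */
            return i;
    }
    return -1;
}

/* Nonzero if sector `lba` lies inside primary entry i. Logical partitions
 * (and the EBR chain describing them) always fall inside the extended
 * entry, which is why deleting that entry makes them disappear. */
int entry_contains(const uint8_t mbr[MBR_SIZE], int i, uint32_t lba)
{
    const uint8_t *e = mbr + MBR_TABLE_OFF + i * MBR_ENTRY_SIZE;
    uint32_t first = le32(e + 8);            /* starting LBA */
    uint32_t count = le32(e + 12);           /* length in sectors */
    return lba >= first && lba - first < count;
}
```

Given the first sector of a disk, find_extended reports which primary slot is the extended container, and the starting sector of every logical partition should satisfy entry_contains for that slot.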

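Then I finally finished the Debian 4 installation that had defeated me all of last night. The key is that the version of netinst.iso must exactly match that of hd-media: 4.0r7.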

1. Packages needed to build Xen/Linux
apt-get install gettext zlib1g-dev python-dev libncurses-dev libssl-dev libx11-dev bridge-utils iproute gawk

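In addition, generating the initrd needs one more package: initramfs-tools provides the mkinitramfs command used in step 3, while the older initrd-tools provides mkinitrd.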

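2. The kernel's barrier() macro is very simple; it expands to
asm volatile("" : : : "memory")
This is a compiler barrier, not a CPU instruction. The "memory" clobber tells GCC that the asm may read or write any memory, so the compiler cannot keep memory values cached in registers across barrier() (it must reload them afterwards) and cannot move loads or stores across it. It does not stop the CPU itself from reordering memory accesses; that is the job of mb()/rmb()/wmb() (mfence/lfence/sfence on x86-64).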
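
To make the compiler-barrier versus CPU-barrier distinction concrete, here is a minimal sketch of a publish/consume pair. barrier() is the kernel's definition quoted above; mb() follows the x86-64 mfence form with a GCC-builtin fallback; publish and try_consume are hypothetical names. The shared variables are plain ints to keep the focus on the barriers; real cross-CPU code would also use volatile accessors or C11 atomics.

```c
/* Compiler-only barrier: emits no instruction, but the "memory" clobber
 * makes GCC drop memory values it holds in registers and forbids moving
 * loads/stores across this point. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Full hardware barrier: mfence on x86-64, GCC's builtin elsewhere. */
#if defined(__x86_64__)
#define mb() __asm__ __volatile__("mfence" : : : "memory")
#else
#define mb() __sync_synchronize()
#endif

static int shared_data;
static int ready;

/* Producer: write the payload, then publish the flag. */
void publish(int value)
{
    shared_data = value;
    mb();              /* payload store ordered before the flag store */
    ready = 1;
}

/* Consumer: poll the flag at most max_spins times. Without barrier() the
 * compiler could load `ready` once and spin on the register copy. */
int try_consume(int max_spins, int *out)
{
    for (int i = 0; i < max_spins; i++) {
        if (ready) {
            mb();      /* flag load ordered before the payload load */
            *out = shared_data;
            return 1;
        }
        barrier();     /* force `ready` to be re-read from memory */
    }
    return 0;
}
```

Calling try_consume before publish returns 0 after the bounded spin; after publish(42) it returns 1 and stores 42 through out.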

3. Generating the initrd
mkinitramfs -o /boot/initrd-2.6.18.8-xen.img 2.6.18.8-xen

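4. Performance comparison of syscall and m2_fastcall
The benchmark measured getpid(). To make sure m2_fastcall was not disadvantaged by doing extra work, its handler simply returns current->pid. The first run showed syscall clearly outperforming m2_fastcall. Song, one of the lab's experts, pointed out that glibc was very likely caching the result, and sure enough, once I issued the software interrupt myself in assembly, the numbers came out as expected.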
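
A minimal sketch of the pitfall Song spotted, timing getpid() through glibc against a call that always enters the kernel. It uses the generic syscall(SYS_getpid) wrapper instead of the hand-written software-interrupt assembly from the original test, and m2_fastcall (the author's own mechanism) is not reproduced; via_glibc, via_raw_syscall and ns_per_call are hypothetical names.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* glibc wrapper: releases before 2.25 cached the PID in user space, so
 * after the first call this may never enter the kernel. */
long via_glibc(void) { return (long)getpid(); }

/* syscall(2) always traps into the kernel, bypassing any cache. */
long via_raw_syscall(void) { return syscall(SYS_getpid); }

/* Average nanoseconds per call over `iters` calls. */
double ns_per_call(long (*fn)(void), long iters)
{
    volatile long sink = 0;   /* keep the calls from being optimized away */
    uint64_t t0 = now_ns();
    for (long i = 0; i < iters; i++)
        sink += fn();
    uint64_t t1 = now_ns();
    (void)sink;
    return (double)(t1 - t0) / (double)iters;
}
```

On an older glibc, ns_per_call(via_glibc, N) comes out implausibly cheap compared with ns_per_call(via_raw_syscall, N); on glibc 2.25 and later both paths go to the kernel.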

Written by zellux

February 19th, 2009 at 8:52 pm

Posted in Computer System

2009-02-10 Notes

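1. Segmentation is essentially unavailable on x86_64 (64-bit mode does not enforce segment limits), so Xen controls access rights through page tables instead. Xen and its data reside at 0xffff800000000000 – 0xffff87ffffffffff, i.e., at the low end of what was the kernel half of the address space, whereas on x86_32 Xen sits at the very top.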

[include/asm-x86/config.h]
/*
 * Memory layout:
 *  0x0000000000000000 - 0x00007fffffffffff [128TB, 2^47 bytes, PML4:0-255]
 *    Guest-defined use (see below for compatibility mode guests).
 *  0x0000800000000000 - 0xffff7fffffffffff [16EB]
 *    Inaccessible: current arch only supports 48-bit sign-extended VAs.
 *  0xffff800000000000 - 0xffff803fffffffff [256GB, 2^38 bytes, PML4:256]
 *    Read-only machine-to-phys translation table (GUEST ACCESSIBLE).
 *  0xffff804000000000 - 0xffff807fffffffff [256GB, 2^38 bytes, PML4:256]
 *    Reserved for future shared info with the guest OS (GUEST ACCESSIBLE).
 *  0xffff808000000000 - 0xffff80ffffffffff [512GB, 2^39 bytes, PML4:257]
 *    Reserved for future use.
 *  0xffff810000000000 - 0xffff817fffffffff [512GB, 2^39 bytes, PML4:258]
 *    Guest linear page table.
 *  0xffff818000000000 - 0xffff81ffffffffff [512GB, 2^39 bytes, PML4:259]
 *    Shadow linear page table.
 *  0xffff820000000000 - 0xffff827fffffffff [512GB, 2^39 bytes, PML4:260]
 *    Per-domain mappings (e.g., GDT, LDT).
 *  0xffff828000000000 - 0xffff8283ffffffff [16GB,  2^34 bytes, PML4:261]
 *    Machine-to-phys translation table.
 *  0xffff828400000000 - 0xffff8287ffffffff [16GB,  2^34 bytes, PML4:261]
 *    Page-frame information array.
 *  0xffff828800000000 - 0xffff828bffffffff [16GB,  2^34 bytes, PML4:261]
 *    ioremap()/fixmap area.
 *  0xffff828c00000000 - 0xffff828c3fffffff [1GB,   2^30 bytes, PML4:261]
 *    Compatibility machine-to-phys translation table.
 *  0xffff828c40000000 - 0xffff828c7fffffff [1GB,   2^30 bytes, PML4:261]
 *    High read-only compatibility machine-to-phys translation table.
 *  0xffff828c80000000 - 0xffff828cbfffffff [1GB,   2^30 bytes, PML4:261]
 *    Xen text, static data, bss.
 *  0xffff828cc0000000 - 0xffff82ffffffffff [461GB,             PML4:261]
 *    Reserved for future use.
 *  0xffff830000000000 - 0xffff83ffffffffff [1TB,   2^40 bytes, PML4:262-263]
 *    1:1 direct mapping of all physical memory.
 *  0xffff840000000000 - 0xffff87ffffffffff [4TB,   2^42 bytes, PML4:264-271]
 *    Reserved for future use.
 *  0xffff880000000000 - 0xffffffffffffffff [120TB, PML4:272-511]
 *    Guest-defined use.
 */
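
The PML4 column in the layout above follows from each top-level (PML4) entry covering 2^39 bytes. The small C sketch below, with hypothetical helper names, computes the slot for an address, checks 48-bit canonical form, and tests membership in the 0xffff800000000000 – 0xffff87ffffffffff range reserved for Xen.

```c
#include <stdbool.h>
#include <stdint.h>

#define PML4_SHIFT   39     /* each PML4 slot maps 2^39 bytes (512GB) */
#define PML4_ENTRIES 512

/* Which of the 512 top-level slots an address falls into. */
static inline unsigned pml4_index(uint64_t va)
{
    return (unsigned)((va >> PML4_SHIFT) & (PML4_ENTRIES - 1));
}

/* 48-bit VAs must be sign-extended: bits 63..47 all 0 or all 1. */
static inline bool is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;
    return top == 0 || top == 0x1ffff;
}

/* True for the Xen-reserved range, PML4 slots 256-271. */
static inline bool in_xen_range(uint64_t va)
{
    return va >= 0xffff800000000000ULL && va <= 0xffff87ffffffffffULL;
}
```

For example, the Xen text area at 0xffff828c80000000 lands in slot 261 and the 1:1 direct map at 0xffff830000000000 in slot 262, matching the comment.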

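2. Shadow page tables are used in two main places. The first is page-table maintenance under full virtualization, which is expensive; hardware support (VT-x/AMD-V, and especially their nested-paging extensions EPT/NPT) reduces this cost. The second is live migration of a guest OS, where shadow page tables are used to track which pages the guest modifies after they have already been copied.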
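
For the live-migration case, the tracking boils down to a log of dirtied pages that each pre-copy round drains and re-sends. A minimal sketch of such a log as a bitmap, assuming a hypothetical 1024-page guest; mark_dirty and collect_and_clear are illustrative names, not Xen's log-dirty interface.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PFNS      1024          /* hypothetical guest size in pages */
#define BITS_PER_WORD 64

static uint64_t dirty[MAX_PFNS / BITS_PER_WORD];

/* Called when a write to guest page `pfn` is intercepted (during
 * migration, the shadow page tables are what make that interception
 * possible). */
void mark_dirty(size_t pfn)
{
    if (pfn >= MAX_PFNS)
        return;
    dirty[pfn / BITS_PER_WORD] |= (uint64_t)1 << (pfn % BITS_PER_WORD);
}

/* One pre-copy round: copy the pages dirtied since the previous round
 * into `out` (to be re-sent), clear the log, return how many there were. */
size_t collect_and_clear(size_t out[MAX_PFNS])
{
    size_t n = 0;
    for (size_t w = 0; w < MAX_PFNS / BITS_PER_WORD; w++) {
        uint64_t bits = dirty[w];
        while (bits) {
            int b = __builtin_ctzll(bits);   /* lowest set bit */
            out[n++] = w * BITS_PER_WORD + (size_t)b;
            bits &= bits - 1;                /* clear that bit */
        }
        dirty[w] = 0;
    }
    return n;
}
```

Migration would repeat rounds until the returned count is small enough, then pause the guest and send the remainder.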

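Today I also cleared up a concept I had long been fuzzy on. I had skimmed parts of The Definitive Guide to the Xen Hypervisor, which mentions writable page tables, and I had mixed them up with the shadow page tables from a paper I read later. They are completely different things. As described above, shadow page tables belong to full virtualization (and live migration): the hardware walks the shadow page table maintained by Xen rather than the guest's page table. Writable page tables are used under para-virtualization, where the hardware walks the guest's own page tables directly; those pages are mapped read-only in the guest, so a direct PTE write by the guest faults into Xen, which validates the new entry and performs the update on the guest's behalf, as traced in item 3 below.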

3. arch/x86/traps.c::do_page_fault()->fixup_page_fault()

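When all of the following hold, Xen calls ptwr_do_page_fault() to handle the guest OS updating its page tables:
(1) not in IRQ context, and interrupts were enabled (EFLAGS.IF set)
(2) the faulting address is not in the hypervisor's reserved range
(3) the guest OS was in kernel mode
(4) the write bit of the error code is set and the reserved bit is not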
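
The four preconditions can be written as one predicate. This is a simplified sketch rather than Xen's fixup_page_fault(): the PFEC_write_access and PFEC_reserved_bit masks follow the x86 page-fault error-code layout (bits 1 and 3), while struct fault_ctx and in_hypervisor_range() are hypothetical stand-ins for Xen's per-vCPU state and its reserved-range check.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error-code bits (Xen uses these PFEC_* names). */
#define PFEC_write_access (1u << 1)
#define PFEC_reserved_bit (1u << 3)

#define X86_EFLAGS_IF     (1u << 9)

/* Hypothetical snapshot of what the fixup path inspects. */
struct fault_ctx {
    bool     in_irq;        /* currently handling an interrupt */
    uint64_t eflags;        /* saved EFLAGS of the faulting context */
    uint64_t addr;          /* faulting linear address (CR2) */
    bool     guest_kernel;  /* guest was in its kernel mode */
    uint32_t error_code;
};

/* Simplified reserved-range check: PML4 slots 256-271. */
static bool in_hypervisor_range(uint64_t addr)
{
    return addr >= 0xffff800000000000ULL && addr <= 0xffff87ffffffffffULL;
}

/* True when conditions (1)-(4) above all hold. */
bool should_try_ptwr(const struct fault_ctx *c)
{
    if (c->in_irq || !(c->eflags & X86_EFLAGS_IF))
        return false;                               /* (1) */
    if (in_hypervisor_range(c->addr))
        return false;                               /* (2) */
    if (!c->guest_kernel)
        return false;                               /* (3) */
    return (c->error_code & (PFEC_write_access | PFEC_reserved_bit))
           == PFEC_write_access;                    /* (4) */
}
```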

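Now for the key function, ptwr_do_page_fault(). It uses guest_get_eff_l1e() to fetch the PTE that maps the faulting virtual address, finds the page that PTE points to, and then confirms that the guest OS really is trying to modify a PTE, which requires all of the following:
(1) the present bit is set and the RW bit is clear
(2) the mfn (machine frame number) is valid, i.e., below the maximum. The fault is rejected if !mfn_valid(l1e_get_pfn(pte)), which works because in PV mode the frame number stored in a guest PTE is already a machine frame number (mfn = pfn here).
(3) the page's type is PGT_l1_page_table, i.e., a lowest-level page table
(4) the page's reference count is non-zero
(5) the page's owner is the current domain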

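Once all these checks pass, it calls x86_emulate() to emulate the faulting instruction, with ptwr_emulate_ops supplying the memory-access callbacks.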
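
Checks (1) to (5) translate into a short C function. Everything here is a hypothetical simplification: struct page_info is reduced to three fields, an enum replaces Xen's type bits, and a plain array stands in for the frame table. Only if it returns true would Xen go on to x86_emulate() with ptwr_emulate_ops.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT    (1ULL << 0)
#define PTE_RW         (1ULL << 1)
#define PTE_FRAME_MASK 0x000ffffffffff000ULL   /* bits 51..12 */

/* Hypothetical, heavily simplified page descriptor. */
enum page_type { PGT_none, PGT_l1_page_table, PGT_l2_page_table };
struct page_info {
    enum page_type type;
    unsigned long  count;    /* reference count */
    int            owner;    /* owning domain id */
};

static inline uint64_t pte_frame(uint64_t pte)
{
    return (pte & PTE_FRAME_MASK) >> 12;
}

/* Does this fault look like a PV guest writing to one of its own L1 page
 * tables? `frames` stands in for the frame table and `max_mfn` for the
 * mfn_valid() bound. */
bool is_guest_pte_write(uint64_t pte, const struct page_info *frames,
                        uint64_t max_mfn, int current_domain)
{
    if ((pte & (PTE_PRESENT | PTE_RW)) != PTE_PRESENT)
        return false;                        /* (1) present, read-only */
    uint64_t mfn = pte_frame(pte);
    if (mfn >= max_mfn)
        return false;                        /* (2) frame number valid */
    const struct page_info *pg = &frames[mfn];
    if (pg->type != PGT_l1_page_table)
        return false;                        /* (3) lowest-level table */
    if (pg->count == 0)
        return false;                        /* (4) still referenced */
    return pg->owner == current_domain;      /* (5) owned by this domain */
}
```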

Xen 3.3.1 also seems to make use of reserved bits here. According to the Intel manual, "When the PSE and PAE flags in control register CR4 are set, the processor generates a page fault if reserved bits are not set to 0." and "The RSVD flag indicates that the processor detected 1s in reserved bits of the page directory, when the PSE or PAE flags in control register CR4 are set to 1." So after Xen intercepts the guest's first attempt to modify a PTE, it can set one of these reserved bits. The next access through that entry then raises a page fault because of the reserved bit, at which point Xen checks whether the machine address the guest wrote is valid and then clears the reserved bit.
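
A sketch of the reserved-bit idea as described above, under stated assumptions: MAXPHYADDR is taken to be 40 so that PTE bits 51..40 are reserved, bit 51 is an arbitrary choice of marker, and the helper names are hypothetical. It illustrates the mechanism only and is not the code in Xen 3.3.1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: MAXPHYADDR = 40, so PTE bits 51..40 must be zero and any 1
 * there makes the next walk fault with RSVD (error-code bit 3) set. */
#define MAXPHYADDR        40
#define PTE_RSVD_MARK     (1ULL << 51)       /* hypothetical marker bit */
#define PTE_ADDR_MASK     (((1ULL << MAXPHYADDR) - 1) & ~0xfffULL)
#define PFEC_reserved_bit (1u << 3)

_Static_assert(MAXPHYADDR <= 51, "marker must lie in the reserved range");

/* After intercepting the guest's first PTE write: poison the entry. */
uint64_t pte_poison(uint64_t pte) { return pte | PTE_RSVD_MARK; }

/* Was this fault caused by a reserved bit being set? */
bool fault_is_rsvd(uint32_t error_code)
{
    return (error_code & PFEC_reserved_bit) != 0;
}

/* On the follow-up fault: check the machine frame the guest wrote and
 * clear the marker only if it is acceptable. */
bool pte_revalidate(uint64_t *pte, uint64_t max_mfn)
{
    uint64_t mfn = (*pte & PTE_ADDR_MASK) >> 12;
    if (mfn >= max_mfn)
        return false;                        /* leave the entry poisoned */
    *pte &= ~PTE_RSVD_MARK;
    return true;
}
```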

Written by zellux

February 11th, 2009 at 9:53 am

Posted in Computer System

2009-02-09 Notes

1. Finished reading "Xen 内存管理综述" (an overview of Xen memory management) and took a first, superficial look at the Xen source code.

2. Built and installed Xen 3.3.0 on Debian x86-64. Required packages: zlib1g-dev python-dev libncurses-dev libssl-dev libx11-dev bridge-utils iproute gawk gettext

3. Registering interrupt gates

[arch/x86/traps.c]
    set_swint_gate(TRAP_int3,&int3);         /* usable from all privileges */
    set_swint_gate(TRAP_overflow,&overflow); /* usable from all privileges */
    set_intr_gate(TRAP_bounds,&bounds);
    set_intr_gate(TRAP_invalid_op,&invalid_op);
    set_intr_gate(TRAP_no_device,&device_not_available);
    set_intr_gate(TRAP_copro_seg,&coprocessor_segment_overrun);
    set_intr_gate(TRAP_invalid_tss,&invalid_TSS);
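
The difference between set_swint_gate and set_intr_gate comes down to the gate's DPL. Below is a C sketch of the 16-byte x86-64 IDT gate descriptor, assuming (as I read Xen's helpers) that both build interrupt gates (type 0xE) and differ only in DPL: with DPL 3, ring-3 code may reach int3 through a software INT; with DPL 0, a software INT from ring 3 raises #GP instead. Hardware-raised exceptions ignore the DPL either way. swint_gate and intr_gate here are hypothetical analogues, not Xen's functions.

```c
#include <stdint.h>

/* 16-byte x86-64 IDT gate descriptor, field by field. */
struct idt_gate {
    uint16_t offset_lo;
    uint16_t selector;
    uint8_t  ist;         /* bits 2..0: IST index, rest zero */
    uint8_t  type_attr;   /* P | DPL(2 bits) | 0 | type(4 bits) */
    uint16_t offset_mid;
    uint32_t offset_hi;
    uint32_t reserved;
};
_Static_assert(sizeof(struct idt_gate) == 16, "gate must be 16 bytes");

#define GATE_TYPE_INTR 0xE    /* interrupt gate: clears IF on entry */

static struct idt_gate make_gate(uint64_t handler, uint16_t sel, unsigned dpl)
{
    struct idt_gate g = {0};
    g.offset_lo  = (uint16_t)handler;
    g.selector   = sel;
    g.type_attr  = (uint8_t)(0x80 | ((dpl & 3) << 5) | GATE_TYPE_INTR);
    g.offset_mid = (uint16_t)(handler >> 16);
    g.offset_hi  = (uint32_t)(handler >> 32);
    return g;
}

/* DPL 3: usable from all privileges (int3, overflow). */
struct idt_gate swint_gate(uint64_t h, uint16_t sel) { return make_gate(h, sel, 3); }
/* DPL 0: software INT from lower privilege faults with #GP. */
struct idt_gate intr_gate(uint64_t h, uint16_t sel)  { return make_gate(h, sel, 0); }

unsigned gate_dpl(const struct idt_gate *g) { return (g->type_attr >> 5) & 3; }

uint64_t gate_offset(const struct idt_gate *g)
{
    return (uint64_t)g->offset_lo | ((uint64_t)g->offset_mid << 16) |
           ((uint64_t)g->offset_hi << 32);
}
```

The resulting attribute bytes are the familiar 0xEE (DPL 3) and 0x8E (DPL 0).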

Taking page_fault as an example:

[arch/x86/x86_64/entry.S]
ENTRY(handle_exception)
        SAVE_ALL
handle_exception_saved:
        testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
        jz    exception_with_ints_disabled
        sti
1:      movq  %rsp,%rdi
        movl  UREGS_entry_vector(%rsp),%eax
        leaq  exception_table(%rip),%rdx
        GET_CURRENT(%rbx)
        PERFC_INCR(PERFC_exceptions, %rax, %rbx)
        callq *(%rdx,%rax,8)
        testb $3,UREGS_cs(%rsp)
        jz    restore_all_xen
        leaq  VCPU_trap_bounce(%rbx),%rdx
        movq  VCPU_domain(%rbx),%rax
        testb $1,DOMAIN_is_32bit_pv(%rax)
        jnz   compat_post_handle_exception
        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
        jz    test_all_events
        call  create_bounce_frame
        movb  $0,TRAPBOUNCE_flags(%rdx)
        jmp   test_all_events

ENTRY(page_fault)
        movl  $TRAP_page_fault,4(%rsp)
        jmp   handle_exception

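SAVE_ALL pushes the 15 general-purpose registers other than rsp onto the stack. handle_exception then re-enables interrupts with sti if they were enabled when the exception occurred, uses the vector that the stub stored in the entry_vector slot (ENTRY(page_fault) stores TRAP_page_fault there) to look up the handler in exception_table, and calls it (do_page_fault here). On the way out, if the exception came from a guest (the low two bits of the saved CS are non-zero), it checks whether the handler asked for the exception to be bounced to the guest and, if so, builds the guest's exception frame with create_bounce_frame.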
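
The dispatch in handle_exception, rendered as C for readability. This is a hypothetical sketch: struct cpu_user_regs is cut down to two fields, the handlers are stubs, the bounds check is an addition not present in the assembly, and the return value simply reports whether the exit path would return to a guest (the case where create_bounce_frame may run).

```c
#include <stddef.h>

#define TRAP_page_fault 14    /* x86 #PF vector */
#define NR_VECTORS      32

/* Hypothetical saved-register frame: only the fields used here. */
struct cpu_user_regs {
    unsigned int  entry_vector;  /* written by the stub, e.g. ENTRY(page_fault) */
    unsigned long cs;            /* low two bits = privilege of faulting code */
};

typedef void (*exception_fn)(struct cpu_user_regs *);

static int pf_calls;
static void do_page_fault_stub(struct cpu_user_regs *regs) { (void)regs; pf_calls++; }
static void do_unhandled(struct cpu_user_regs *regs) { (void)regs; }

/* Counterpart of exception_table: one handler per vector. */
static exception_fn exception_table[NR_VECTORS];

void init_exception_table(void)
{
    for (size_t i = 0; i < NR_VECTORS; i++)
        exception_table[i] = do_unhandled;
    exception_table[TRAP_page_fault] = do_page_fault_stub;
}

/* Core of handle_exception: index the table by the stored vector, call
 * the handler, then report whether we return to a guest (CS RPL != 0). */
int handle_exception(struct cpu_user_regs *regs)
{
    if (regs->entry_vector >= NR_VECTORS)
        return -1;                       /* defensive check, sketch only */
    exception_table[regs->entry_vector](regs);
    return (regs->cs & 3) != 0;
}
```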

4. At the first group meeting of the new year we discussed two vintage papers presented at SOSP 1967.

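Dijkstra's "The Structure of the 'THE'-Multiprogramming System" was the first to describe an operating system built as a hierarchy of layers, in this case one that runs multiple programs concurrently. It also presents semaphores and related mechanisms for solving concurrency problems, as well as an idea similar to virtual addresses. If both of these concepts were first proposed in this paper, all I can say is that Dijkstra is a god @.@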

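The second was Peter Denning's "The Working Set Model for Program Behavior", which uses a lot of mathematics (mostly probability and statistics) to derive the relationship between the window size τ and the traverse time T. Here τ is a parameter: a page that has not been referenced during the last τ time units is considered replaceable, so there is no need to spend time searching for a victim page with an algorithm such as LRU. The later part of the paper also presents scheduling and resource-allocation algorithms.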
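
Denning's replacement rule needs only a per-page timestamp. A minimal sketch in discrete time, where t simply counts references (Denning works in process virtual time) and the tracker names are hypothetical: page p is in the working set W(t, τ) if its last reference falls in (t − τ, t], and any page outside it may be replaced without maintaining an LRU order.

```c
#include <stddef.h>

#define NPAGES 16                 /* hypothetical number of tracked pages */

struct ws_tracker {
    long last_ref[NPAGES];        /* time of last reference, -1 = never */
};

void ws_init(struct ws_tracker *w)
{
    for (int p = 0; p < NPAGES; p++)
        w->last_ref[p] = -1;
}

/* Record a reference to `page` at time t. */
void ws_reference(struct ws_tracker *w, int page, long t)
{
    w->last_ref[page] = t;
}

/* Is `page` in W(t, tau), i.e. referenced during (t - tau, t]? */
int ws_contains(const struct ws_tracker *w, int page, long t, long tau)
{
    long r = w->last_ref[page];
    return r >= 0 && r > t - tau && r <= t;
}

/* |W(t, tau)|: number of distinct pages referenced in the window. */
size_t ws_size(const struct ws_tracker *w, long t, long tau)
{
    size_t n = 0;
    for (int p = 0; p < NPAGES; p++)
        n += (size_t)ws_contains(w, p, t, tau);
    return n;
}

/* First page referenced at least once (assumed resident) that lies
 * outside W(t, tau), or -1 if every such page is in the working set. */
int ws_find_replaceable(const struct ws_tracker *w, long t, long tau)
{
    for (int p = 0; p < NPAGES; p++)
        if (w->last_ref[p] >= 0 && !ws_contains(w, p, t, tau))
            return p;
    return -1;
}
```

Choosing τ is the trade-off the paper analyzes: a larger window keeps more pages resident and faults less, a smaller one frees memory sooner.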

Written by zellux

February 11th, 2009 at 9:53 am

Posted in Computer System

Emacs tramp

Very handy: it makes it easy to open files that need root privileges or that live on a remote server.

The unified URL-style format is /method://user@machine:port/path/to.file. Using it requires setting tramp-syntax before loading tramp:

(setq tramp-syntax 'url)
(require 'tramp)

You can also set a default access method with (setq tramp-default-method "scp"), after which the /method:// prefix is no longer needed.

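Take my current org-mode agenda setup as an example: my personal agenda file lives on machine No. 73 in the machine room, and my computers in the lab and in the dorm simply open that org file remotely through tramp: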

(setq org-agenda-files (list "/scp:wyx@10.132.140.73:/home/wyx/notes/lab.org"))

Written by zellux

February 9th, 2009 at 1:54 pm

Posted in Tools

x86_64 from wikipedia

x86-64 is a superset of the x86 instruction set architecture. x86-64 processors can run existing 32-bit or 16-bit x86 programs at full speed, but also support new programs written with a 64-bit address space and other additional capabilities.

The x86-64 specification was designed by Advanced Micro Devices (AMD), who have since renamed it AMD64.[1] Intel has implemented it under the name Intel 64 (formerly EM64T or IA-32e) in its own x86 processors.[2] VIA Technologies has also included x86-64 instructions in their VIA Isaiah architecture. The names x86-64 or x64 are often used as vendor-neutral terms to collectively refer to x86-64 processors from any company.

x86-64 should not be confused with the Intel Itanium (formerly IA-64) architecture, which is not compatible on the native instruction set level with the x86 or x86-64 architecture.

Written by zellux

February 9th, 2009 at 10:18 am

Posted in Computer System
