Why was Intel a no-show on No Execute?

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

This has been discussed at quite some length in these newsgroups, but now it
looks like the mainstream press are starting to hear about it too. Intel had
to be embarrassed into including NX in its AMD64 implementation.

http://story.news.yahoo.com/news?tmpl=story&cid=1738&ncid=1209&e=7&u=/zd/20040525/tc_zd/127930

There are a few things that this article's writer got wrong, but a few
things he got right.

One thing he got partially wrong was his statement about Intel having no
execute protection in the 16-bit segments. The feature was still there in
the 32-bit segments, Intel never got rid of them. It was stupid OS designers
who decided to ignore the feature that caused this problem.

Yousuf Khan

--
Humans: contact me at ykhan at rogers dot com
Spambots: just reply to this email address ;-)
  1.

    Yousuf Khan wrote:

    > One thing he got partially wrong was his statement about Intel
    > having no execute protection in the 16-bit segments. The feature
    > was still there in the 32-bit segments, Intel never got rid of
    > them. It was stupid OS designers who decided to ignore the
    > feature that caused this problem.

    Are you calling them "stupid" because they opted for paging
    instead of segmentation, in an effort to write a portable OS?

    Do you think there should be an x86-specific Linux branch,
    using segmentation instead of paging?
  2.

    Grumble wrote:

    > Yousuf Khan wrote:

    >> One thing he got partially wrong was his statement about Intel
    >> having no execute protection in the 16-bit segments. The feature
    >> was still there in the 32-bit segments, Intel never got rid of
    >> them. It was stupid OS designers who decided to ignore the
    >> feature that caused this problem.

    > Are you calling them "stupid" because they opted for paging
    > instead of segmentation, in an effort to write a portable OS?

    > Do you think there should be an x86-specific Linux branch,
    > using segmentation instead of paging?


    I don't think it would be so hard to put all the data in a
    data segment, and the code in a code segment, without overlapping
    them. It requires the CS: prefix on any loads from the code
    segment. Self-modifying code is out of style these days,
    so that shouldn't be much of a problem.
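    A minimal C sketch of the checks this layout relies on, assuming a simplified 386 descriptor model (byte-granular limits, expand-up segments only, privilege levels ignored). The access-byte bits are the architectural ones; the names `struct segment`, `seg_op` and `seg_check` are hypothetical.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Access-byte bits of a 386 code/data segment descriptor. */
    #define DESC_PRESENT  0x80u  /* P: segment is present */
    #define DESC_CODEDATA 0x10u  /* S: code/data, not a system descriptor */
    #define DESC_EXEC     0x08u  /* type bit 3: 1 = code, 0 = data */
    #define DESC_RW       0x02u  /* type bit 1: code = readable, data = writable */

    enum seg_op { SEG_FETCH, SEG_READ, SEG_WRITE };

    /* Simplified descriptor (assumption: byte-granular, expand-up only). */
    struct segment {
        uint32_t base;
        uint32_t limit;   /* highest valid offset */
        uint8_t  access;  /* access byte as stored in the descriptor */
    };

    /* Returns true and sets *linear if the access is allowed; returns
       false where the CPU would raise a fault instead. */
    bool seg_check(const struct segment *s, uint32_t offset, uint32_t size,
                   enum seg_op op, uint32_t *linear)
    {
        assert(size > 0);
        if (!(s->access & DESC_PRESENT) || !(s->access & DESC_CODEDATA))
            return false;
        /* offset + size - 1 <= limit, written so it cannot overflow */
        if (offset > s->limit || size - 1 > s->limit - offset)
            return false;

        bool is_code = (s->access & DESC_EXEC) != 0;
        bool rw_bit  = (s->access & DESC_RW) != 0;
        switch (op) {
        case SEG_FETCH: if (!is_code) return false; break;           /* data never executes */
        case SEG_READ:  if (is_code && !rw_bit) return false; break; /* execute-only code */
        case SEG_WRITE: if (is_code || !rw_bit) return false; break; /* code never writable */
        }
        *linear = s->base + offset;  /* paging then translates this address */
        return true;
    }
    ```

    With these checks a `CS:`-prefixed load from an execute/read code segment (access byte 0x9A) is allowed, a write through that segment fails, and an instruction fetch from a writable data segment (0x92) fails.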

    Now, for things like JIT, where code is constantly being
    written while running, some arrangement would need to be made.

    -- glen
  3.

    In comp.arch Grumble <a@b.c> wrote:
    > Yousuf Khan wrote:
    >
    > > One thing he got partially wrong was his statement about Intel
    > > having no execute protection in the 16-bit segments. The feature
    > > was still there in the 32-bit segments, Intel never got rid of
    > > them. It was stupid OS designers who decided to ignore the
    > > feature that caused this problem.
    >
    > Are you calling them "stupid" because they opted for paging
    > instead of segmentation, in an effort to write a portable OS?
    >
    > Do you think there should be an x86-specific Linux branch,
    > using segmentation instead of paging?
    >

    There was one for quite a while for pre-386 modes/machines.

    --
    Sander

    +++ Out of cheese error +++
  4.

    In comp.sys.ibm.pc.hardware.chips glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
    > I don't think it would be so hard to put all the data in a
    > data segment, and the code in a code segment, without overlapping
    > them. It requires the CS: prefix on any loads from the code
    > segment. Self-modifying code is out of style these days,
    > so that shouldn't be much of a problem.

    That _still_ won't help (never mind interpreted or JIT).

    If an attacker can redirect execution by modifying the
    return address on the stack, s/he doesn't need their own
    executable code. Just point to data like "/bin/sh" and
    return to an `exec` syscall.

    -- Robert
  5.

    In comp.sys.ibm.pc.hardware.chips Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    > In comp.sys.ibm.pc.hardware.chips glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
    >> I don't think it would be so hard to put all the data in a
    >> data segment, and the code in a code segment, without overlapping
    >> them. It requires the CS: prefix on any loads from the code
    >> segment. Self-modifying code is out of style these days,
    >> so that shouldn't be much of a problem.
    >
    > That _still_ won't help (never mind interpreted or JIT).
    >
    > If an attacker can redirect execution by modifying the
    > return address on the stack, s/he doesn't need their own
    > executable code. Just point to data like "/bin/sh" and
    > return to an `exec` syscall.

    Ah, but you make me think -- all current CPUs have an internal
    hardware call/return stack to speed up branch [mis]prediction.

    It would be relatively simple to check this hw stack against
    the memory stack and generate a fault if return addresses
    don't match.

    This could be enabled by a bit in the MSR if the OS has support
    to handle/log "return addr faults". Most pgms should never
    generate a return fault, but a mechanism could be made to
    except those few that do.

    A slightly bigger problem is the hw stacks are of limited
    depth (6?) and it might be possible to flood them out.
    But variable stack entry pointers would become more effective.
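    A rough C model of the proposed check, assuming a 6-entry circular hardware stack (the guessed depth above) and assuming that a RET whose entry has already been flooded out is simply left unchecked rather than faulted. All names are hypothetical; no real CPU exposes this interface.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define HW_DEPTH 6   /* guessed depth; real parts differ */

    /* Small circular stack of return addresses pushed by CALL. */
    struct hw_rstack {
        uintptr_t slot[HW_DEPTH];
        size_t    top;    /* live entries, at most HW_DEPTH */
        size_t    head;   /* index of the next push */
    };

    enum ret_result { RET_OK, RET_UNCHECKED, RET_FAULT };

    void hw_call(struct hw_rstack *s, uintptr_t return_addr)
    {
        s->slot[s->head] = return_addr;      /* overwrites the oldest when full */
        s->head = (s->head + 1) % HW_DEPTH;
        if (s->top < HW_DEPTH)
            s->top++;
    }

    /* mem_addr is the return address RET actually loads from memory. */
    enum ret_result hw_ret(struct hw_rstack *s, uintptr_t mem_addr)
    {
        if (s->top == 0)
            return RET_UNCHECKED;            /* flooded out: nothing to compare */
        s->head = (s->head + HW_DEPTH - 1) % HW_DEPTH;
        s->top--;
        return s->slot[s->head] == mem_addr ? RET_OK : RET_FAULT;
    }
    ```

    In this model a smashed return address is caught only while its entry is still held in the hardware stack; once a deep call chain has pushed it out, the RET goes unchecked, which is the flooding weakness mentioned above.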

    -- Robert
  6.

    > It would be relatively simple to check this hw stack against
    > the memory stack and generate a fault if return addresses
    > don't match.

    Look up "call-with-current-continuation" to see why this is not a good idea.
    Or maybe just think of how to implement exception handling.


    Stefan
  7.

    Grumble <a@b.c> wrote:
    > Yousuf Khan wrote:
    >
    >> One thing he got partially wrong was his statement about Intel
    >> having no execute protection in the 16-bit segments. The feature
    >> was still there in the 32-bit segments, Intel never got rid of
    >> them. It was stupid OS designers who decided to ignore the
    >> feature that caused this problem.
    >
    > Are you calling them "stupid" because they opted for paging
    > instead of segmentation, in an effort to write a portable OS?

    No, for not opting to use both. There was no mutual exclusivity between
    paging and segmentation. Both could be used and complement each other.

    I think the original OS designers, in their haste to port Unix to the new
    32-bit Intel chip, did a simple cross-compile and then didn't bother to make
    use of any of the Intel-specific features of the architecture. They just
    left it at "good enough". Of course, using Intel features would've made them
    non-portable, but a lot of stuff gets non-portable at the lowest levels of
    the kernel anyway.

    > Do you think there should be an x86-specific Linux branch,
    > using segmentation instead of paging?

    There already was. The original pre-1.0 Linux kernels were using segments
    *and* paging. I think with the addition of new people to the development team,
    Linux's original purpose got changed from being the ultimate Intel OS (Unix
    or otherwise), to being a free version of portable Unix.

    Yousuf Khan
  8.

    Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    > That _still_ won't help (never mind interpreted or JIT).
    >
    > If an attacker can redirect execution by modifying the
    > return address on the stack, s/he doesn't need their own
    > executable code. Just point to data like "/bin/sh" and
    > return to an `exec` syscall.

    How's an attacker to do that, when the code, the stack and the heap
    don't even share the same memory addresses?

    Yousuf Khan
  9.

    In comp.sys.ibm.pc.hardware.chips Yousuf Khan <news.tally.bbbl67@spamgourmet.com> wrote:
    > How's an attacker to do that, when the code, the stack and the heap
    > don't even share the same memory addresses?

    Easy. Overwrite the stack with crafted input to an unrestricted
    input call (getch() is a frequent culprit). This is the basic
    buffer overflow.

    In the location for the return address (where EBP is usually
    pointing), put in a return address that points to a suitably
    dangerous part of the existing code. Like an `exec` syscall.
    Above this return address, put in data to make that syscall
    nefarious.
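    A toy C simulation of that description. A one-byte index into a table of existing functions stands in for the real return address, and every name here (`vulnerable_call`, `code_image`, `exec_shell`) is hypothetical; the only point is that overwriting the return slot redirects control into code that is already there.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Toy frame: BUF_SIZE bytes of local array, then one byte standing in
       for the saved return address (an index into the existing code). */
    #define BUF_SIZE   16
    #define RET_SLOT   BUF_SIZE
    #define FRAME_SIZE (BUF_SIZE + 1)

    /* Code that already exists in the program; the attacker injects none. */
    static const char *back_to_caller(void) { return "back in caller"; }
    static const char *exec_shell(void)     { return "exec(\"/bin/sh\")"; }

    typedef const char *(*routine)(void);
    static const routine code_image[] = { back_to_caller, exec_shell };

    /* Models a gets()-style copy: the input length is never compared with
       BUF_SIZE. It is clamped to FRAME_SIZE only so that the simulation
       itself stays within defined C behavior. */
    const char *vulnerable_call(const unsigned char *input, size_t len)
    {
        unsigned char frame[FRAME_SIZE];
        frame[RET_SLOT] = 0;                      /* legitimate return */
        memcpy(frame, input, len < FRAME_SIZE ? len : FRAME_SIZE);

        unsigned char ret = frame[RET_SLOT];      /* what "RET" pops */
        if (ret >= sizeof code_image / sizeof code_image[0])
            return "crash";                       /* garbage return address */
        return code_image[ret]();
    }
    ```

    Sixteen filler bytes followed by the value 1 make the "return" land in `exec_shell` without a single injected instruction, while a wrong guess just crashes.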

    -- Robert
  10.

    In comp.sys.ibm.pc.hardware.chips Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    >> It would be relatively simple to check this hw stack against
    >> the memory stack and generate a fault if return addresses
    >> don't match.
    >
    > Look up "call-with-current-continuation" to see why this is not a good idea.
    > Or maybe just think of how to implement exception handling.

    Exception handling is easy -- mismatch produces a MC interrupt.
    The kernelspace ISR checks the MSRs which tell it that a return
    addr mismatch occurred. Kernel decides what to do -- abort proc,
    log, or proceed.

    Sure it'll be slow, but how often are calls not paired with
    returns? `call jtable[eax*4]` is the standard syntax for a
    jump table, not `push eax/ret`.

    -- Robert
  11.

    Sander Vesik <sander@haldjas.folklore.ee> wrote:
    >> Do you think there should be an x86-specific Linux branch,
    >> using segmentation instead of paging?
    >>
    >
    > There was one for quite a while for pre-386 modes/machines.

    That was Minix. Linux has always been for 386 and later machines only.

    Yousuf Khan
  12.

    Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    > In comp.sys.ibm.pc.hardware.chips Yousuf Khan
    > <news.tally.bbbl67@spamgourmet.com> wrote:
    >> How's an attacker to do that, when the code, the stack and the
    >> heap don't even share the same memory addresses?
    >
    > Easy. Overwrite the stack with crafted input to an unrestricted
    > input call (getch() is a frequent culprit). This is the basic
    > buffer overflow.
    >
    > In the location for the return address (where EBP is usually
    > pointing), put in a return address that points to a suitably
    > dangerous part of the existing code. Like an `exec` syscall.
    > Above this return address, put in data to make that syscall
    > nefarious.

    Nope, won't work. Segmentation would protect it completely. There is no way
    for data written to the heap to touch the data in the stack. Stack segment
    and data segment are separate. It's like as if the stack had its own
    container, the code has its own, and the data heap its own. What happens in
    one container won't even reach the other containers.

    Face it, segments were the perfect security mechanism, and systems
    developers completely ignored it!

    Yousuf Khan
  13.

    Robert Redelmeier wrote:

    > Overwrite the stack with crafted input to an unrestricted
    > input call (getch() is a frequent culprit).

    There is no getch() in ISO C.

    fgetc(), getc(), and getchar() return a single character.

    Perhaps you meant gets().
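    gets() has no way to know the size of the buffer it fills, which is why it was removed from ISO C in C11. A minimal sketch of a bounded replacement built on fgets(); the helper name `read_line` and its return convention are assumptions, not a standard API.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>

    /* Reads one line into buf (at most size-1 characters), strips the
       newline, and discards the rest of an overlong line.
       Returns 0 on success, 1 if the line was truncated, -1 on EOF/error. */
    int read_line(char *buf, size_t size, FILE *in)
    {
        assert(size > 1 && size <= (size_t)INT_MAX);
        if (fgets(buf, (int)size, in) == NULL)
            return -1;
        char *nl = strchr(buf, '\n');
        if (nl != NULL) {
            *nl = '\0';
            return 0;
        }
        int c;
        int truncated = 0;
        while ((c = fgetc(in)) != EOF && c != '\n')
            truncated = 1;            /* excess input is dropped, never stored */
        return truncated;
    }
    ```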
  14.

    Robert Redelmeier wrote:

    > Ah, but you make me think -- all current CPUs have an internal
    > hardware call/return stack to speed up branch [mis]prediction.

    E.g., the Athlon implements a 12-entry return address stack to
    predict return addresses from a near or far call. As CALLs are
    fetched, the next EIP is pushed onto the return stack. Subsequent
    RETs pop a predicted return address off the top of the stack.

    > It would be relatively simple to check this hw stack against
    > the memory stack and generate a fault if return addresses
    > don't match.

    I think you've just killed the performance of recursive functions.

    > This could be enabled by a bit in the MSR if the OS has support
    > to handle/log "return addr faults". Most pgms should never
    > generate a return fault

    This is where I think you are wrong.

    The K8 has counters to measure these events:

    88h IC Return stack hit
    89h IC Return stack overflow

    It would be interesting to take, say, SPEC CPU2000, and count
    the number of overflows for each benchmark. I might try.

    --
    Regards, Grumble
  15.

    Grumble <a@b.c> writes:

    >> It would be relatively simple to check this hw stack against
    >> the memory stack and generate a fault if return addresses
    >> don't match.

    >I think you've just killed the performance of recursive functions.

    And possibly longjmp()/setcontext() and the like; quite a bit of
    additional work is needed to fix all such things (and if you want to
    throw in binary compatibility, it's going to be harder still).

    Casper
  16.

    >>>>> "YK" == Yousuf Khan <news.tally.bbbl67@spamgourmet.com> writes:

    YK> That was Minix. Linux has always been for 386 and later machines
    YK> only.

    I think the ELKS people will be saddened to hear that.


    /Benny
  17.

    In comp.sys.ibm.pc.hardware.chips Grumble <a@b.c> wrote:
    > I think you've just killed the performance of recursive functions.

    I don't think so. For a recursive function there are many
    calls, possibly flooding out the hw return stack. But every
    call has a return, and that address _is_ correct on both the
    hw and memory stacks.

    > 88h IC Return stack hit
    > 89h IC Return stack overflow
    >
    > It would be interesting to take, say, SPEC CPU2000, and count
    > the number of overflows for each benchmark. I might try.

    Excellent! I do not suggest trapping out overflows.
    They're expected to occur on deep recursion, which should not contain
    evil getch() calls. Just trap misses.

    -- Robert

  18.

    In comp.sys.ibm.pc.hardware.chips Grumble <a@b.c> wrote:
    > There is no getch() in ISO C.
    > Perhaps you meant gets().

    Thank you for the correction. I do mean gets().
    I apologize for any confusion.

    -- Robert

  19.

    In comp.sys.ibm.pc.hardware.chips Yousuf Khan <news.tally.bbbl67@spamgourmet.com> wrote:
    > Nope, won't work. Segmentation would protect it completely. There is no way
    > for data written to the heap to touch the data in the stack. Stack segment
    > and data segment are separate. It's like as if the stack had its own
    > container, the code has its own, and the data heap its own. What happens in
    > one container won't even reach the other containers.

    True in a literal sense.

    But C compilers have this habit of allocating local variable
    space on the stack. So when `char input[80];` is coded in a
    routine, ESP gets decreased by 80 and that array is sitting
    just below the return address!

    I don't think it's _required_ by any standard that local vars are
    allocated on the stack, but it sure makes memory management easy.

    AFAIK, only global vars and large malloc()s are put on the heap.

    -- Robert
  20.

    Benny Amorsen <amorsen@vega.amorsen.dk> wrote:
    >>>>>> "YK" == Yousuf Khan <news.tally.bbbl67@spamgourmet.com> writes:
    >
    >> That was Minix. Linux has always been for 386 and later machines
    >> only.
    >
    > I think the ELKS people will be saddened to hear that.

    So, it never surprises me to find Linux being ported to do something or
    another at some point in time. I guess the question these days to ask is
    whether there is something Linux hasn't been ported to? Commodore 64? Apple
    II?

    Yousuf Khan
  21.

    In comp.arch Yousuf Khan <news.tally.bbbl67@spamgourmet.com> wrote:
    > Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    > > In comp.sys.ibm.pc.hardware.chips Yousuf Khan
    > > <news.tally.bbbl67@spamgourmet.com> wrote:
    > >> How's an attacker to do that, when the code, the stack and the
    > >> heap don't even share the same memory addresses?
    > >
    > > Easy. Overwrite the stack with crafted input to an unrestricted
    > > input call (getch() is a frequent culprit). This is the basic
    > > buffer overflow.
    > >
    > > In the location for the return address (where EBP is usually
    > > pointing), put in a return address that points to a suitably
    > > dangerous part of the existing code. Like an `exec` syscall.
    > > Above this return address, put in data to make that syscall
    > > nefarious.
    >
    > Nope, won't work. Segmentation would protect it completely. There is no way
    > for data written to the heap to touch the data in the stack. Stack segment

    But procedure local variables (including arrays) don't live in the heap,
    they live on the stack.

    > and data segment are separate. It's like as if the stack had its own
    > container, the code has its own, and the data heap its own. What happens in
    > one container won't even reach the other containers.

    Doesn't matter. All you need for an exploit is to be able to make *one*
    system call. And for that, you don't need to write to the code segment
    at all. The stack is enough.

    >
    > Face it, segments were the perfect security mechanism, and systems
    > developers completely ignored it!

    --
    Sander

    +++ Out of cheese error +++
  22.

    Robert Redelmeier wrote:

    > In comp.sys.ibm.pc.hardware.chips Grumble wrote:
    >
    >> I think you've just killed the performance of recursive functions.
    >
    > I don't think so. For a recursive function there are many
    > calls, possibly flooding out the hw return stack. But every
    > call has a return, and that address _is_ correct on both the
    > hw and memory stacks.

    You don't call any other function in your recursive functions? :-)

    >> 88h IC Return stack hit
    >> 89h IC Return stack overflow
    >>
    >> It would be interesting to take, say, SPEC CPU2000, and count
    >> the number of overflows for each benchmark. I might try.
    >
    > Excellent! I do not suggest trapping out overflows.
    > They're expected to occur on deep recursion, which should not contain
    > evil getch() calls. Just trap misses.

    As far as I can tell, and with the exception of recursive
    functions which call no other function, RAS overflow will
    cause a RET misprediction.
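    A rough C model of a 12-entry circular return address stack (the Athlon figure above), assuming an overflowing push simply overwrites the oldest entry and ignoring speculative updates and repair. `count_mispredicts` is a hypothetical helper that replays a run of nested CALLs followed by the matching RETs and counts wrong predictions.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define RAS_DEPTH 12   /* Athlon figure quoted above */

    struct ras {
        uintptr_t slot[RAS_DEPTH];
        size_t    next;    /* slot the next CALL writes; wraps around */
    };

    static void ras_push(struct ras *r, uintptr_t addr)
    {
        r->slot[r->next] = addr;             /* overwrites the oldest when full */
        r->next = (r->next + 1) % RAS_DEPTH;
    }

    static uintptr_t ras_pop(struct ras *r)
    {
        r->next = (r->next + RAS_DEPTH - 1) % RAS_DEPTH;
        return r->slot[r->next];             /* predicted return address */
    }

    /* ret_addr[k] is where the k-th nested CALL really returns
       (k = 0 is the outermost). Returns the number of mispredicted RETs. */
    unsigned count_mispredicts(const uintptr_t *ret_addr, unsigned depth)
    {
        assert(depth == 0 || ret_addr != NULL);
        struct ras r = {{0}, 0};
        for (unsigned k = 0; k < depth; k++)
            ras_push(&r, ret_addr[k]);       /* CALLs, outermost first */
        unsigned miss = 0;
        for (unsigned k = depth; k-- > 0; )  /* RETs, innermost first */
            if (ras_pop(&r) != ret_addr[k])
                miss++;
        return miss;
    }
    ```

    Under this model a purely self-recursive function loses only its outermost return address, so it mispredicts once however deep it goes, while 20 nested calls with distinct return sites mispredict 8 times, once per entry that no longer fits.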
  23.

    Sander Vesik <sander@haldjas.folklore.ee> wrote:
    >> and data segment are separate. It's like as if the stack had its own
    >> container, the code has its own, and the data heap its own. What
    >> happens in one container won't even reach the other containers.
    >
    > Doesn't matter. All you need for an exploit is to be able to make
    > *one* system call. And for that, you don't need to write to the code
    > segment at all. The stack is enough.

    The only place you can run code is from the code segment. If you insert code
    into the stack segment, none of it will be executable. At best it might end
    up causing the return address to go to the wrong part of the code segment
    and therefore run the program from the wrong point, but more likely the
    program will just end up locking up and be shut down by the OS.

    Yousuf Khan
  24.

    On Thu, 27 May 2004, Yousuf Khan wrote:

    > The only place you can run code is from the code segment. If you insert code
    > into the stack segment, none of it will be executable. At best it might end
    > up causing the return address to go to the wrong part of the code segment
    > and therefore run the program from the wrong point, but more likely the
    > program will just end up locking up and be shut down by the OS.

    Changing the branch address and the stack values that get loaded into
    argument registers (or just plain stack values on a stack machine)
    is enough.

    An object dump of a binary with a stack overflow reveals the address
    of a "system call" instruction, which is enough to know what return
    address is needed.

    I.e., you don't need new code to execute; you just need to get to
    existing insns in the binary with the appropriate state, and that
    appropriate state can be set up by overwriting the stack alone.

    Period.

    Peter
  25.

    Yousuf Khan <bbbl67@ezrs.com> wrote:
    > Sander Vesik <sander@haldjas.folklore.ee> wrote:
    >>> and data segment are separate. It's like as if the stack had its own
    >>> container, the code has its own, and the data heap its own. What
    >>> happens in one container won't even reach the other containers.
    >>
    >> Doesn't matter. All you need for an exploit is to be able to make
    >> *one* system call. And for that, you don't need to write to the code
    >> segment at all. The stack is enough.
    >
    > The only place you can run code is from the code segment. If you insert code
    > into the stack segment, none of it will be executable. At best it might end
    > up causing the return address to go to the wrong part of the code segment
    > and therefore run the program from the wrong point, but more likely the
    > program will just end up locking up and be shut down by the OS.
    >
    > Yousuf Khan

    Yousuf,

    Check out the following link:

    http://packetstormsecurity.nl/groups/horizon/stack.txt

    which explains how you can mount an overflow attack
    when the stack is not executable.
    Although this is illustrated on Solaris/SPARC,
    it applies equally to x86.

    Seongbae
  26.

    > Segments would've fully protected everything.

    Your assurance is endearing. But re-read the thread for a counter example
    where the only code executed (in this process anyway) already exists (it
    just forks off a /bin/sh shell).

    Segments protect just as "fully" as separate address spaces do.
    It's better than nothing, but unless you're extremely careful, it's not
    sufficient for real security. Better make sure buffer overflows *can't*
    happen, so you can actually reason about properties of your code.


    Stefan
  27.

    In comp.sys.ibm.pc.hardware.chips Grumble <a@b.c> wrote:
    > You don't call any other function in your recursive functions? :-)

    Hey, I avoid recursion. But if you called another fn,
    it too would return.

    > As far as I can tell, and with the exception of recursive
    > functions which call no other function, RAS overflow will
    > cause a RET misprediction.

    It should cause a RET misprediction even then, unless it duplicates
    TOS when it pops. For use as a security mechanism, it'd be
    better if TOS was tagged empty or missing. Then no MCE.

    -- Robert

  28.

    > Exception handling is easy -- mismatch produces a MC interrupt.
    > The kernelspace ISR checks the MSRs which tell it that a return
    > addr mismatch occurred. Kernel decides what to do -- abort proc,
    > log, or proceed.

    And how does the kernel "decide what to do"?
    It's so simple to prevent buffer overflows, there's really no reason to go
    to the trouble of some special hardware mechanism to catch some "odd"
    behavior which may sometimes catch some forms of buffer-overflow-exploits.

    > Sure it'll be slow, but how often are calls not paired with returns?

    Can be pretty frequent with some languages/compilers, although admittedly
    the cost of the misprediction you get with current CPUs is a strong
    incentive to try and avoid such situations.


    Stefan
  29.

    >> You don't call any other function in your recursive functions? :-)
    > Hey, I avoid recursion.

    Too bad. Usually makes for clean and simple code, whose security is
    simpler to verify.


    Stefan
  30.

    In comp.sys.ibm.pc.hardware.chips Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    > And how does the kernel "decide what to do"?

    Whatever it's been programmed to do, likely on a per-process basis.
    Likely it'd start APatchy with something like
    # /usr/sbin/nooverflow httpd &

    > It's so simple to prevent buffer overflows, there's really no reason to go

    Simple? Writing good code is simple? Wading through millions
    of lines of cruft is simple?

    > to the trouble of some special hardware mechanism to catch some "odd"
    > behavior which may sometimes catch some forms of buffer-overflow-exploits.

    I don't think there are that many forms of buffer overflows.
    All result from an open ended IO call like gets().
    Do you know any other kinds?

    > Can be pretty frequent with some languages/compilers, although admittedly

    Which ones? Mispairing call/ret is a fast way to overflow the stack.

    -- Robert
  31.

    >> And how does the kernel "decide what to do"?

    > Whatever it's been programmed to do, likely on a per-process basis.
    > Likely it'd start APatchy with something like
    > # /usr/sbin/nooverflow httpd &

    Say a mismatch (it's not just overflows) happens in a program that uses
    exceptions (and where mismatches are hence not necessarily a sign of
    a buffer-overflow-exploit): how is the kernel to determine if a given
    mismatch is harmless?

    >> It's so simple to prevent buffer overflows, there's really no reason to go
    > Simple?

    Trivial: use a language where it's automatically enforced.
    I.e. basically any language other than C. Or use a C compiler that goes
    through the extra trouble of trying to prevent overflow exploits
    (i.e. by allocating stack variables on a separate stack, or by using fat
    pointers, or ...).
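    The fat-pointer option amounts to carrying the object's size alongside its address and checking every access against it. A hand-written C sketch with hypothetical names (`struct fat_ptr`, `fat_write`); a compiler that uses fat pointers would insert equivalent checks automatically rather than relying on the programmer to call them.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* The address travels together with the size of the object it
       points into, so every access can be bounds-checked. */
    struct fat_ptr {
        unsigned char *base;
        size_t         len;
    };

    /* Copies n bytes to p.base[off .. off+n) only if they fit. */
    bool fat_write(struct fat_ptr p, size_t off, const void *src, size_t n)
    {
        if (off > p.len || n > p.len - off)
            return false;            /* would run past the object: refuse */
        memcpy(p.base + off, src, n);
        return true;
    }
    ```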

    >> to the trouble of some special hardware mechanism to catch some "odd"
    >> behavior which may sometimes catch some forms of buffer-overflow-exploits.
    > I don't think there are that many forms of buffer overflows.

    Maybe not, but they can happen in many different kinds of code and there can
    be many forms of exploits. So it can be between very difficult and
    impossible for a low-level system to determine if a given behavior is part
    of the normal execution or is the sign of an exploit.


    Stefan
  32.

    In comp.sys.ibm.pc.hardware.chips Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    > Say a mismatch (it's not just overflows) happens in a program that uses
    > exceptions (and where mismatches are hence not necessarily a sign of
    > a buffer-overflow-exploit): how is the kernel to determine if a given
    > mismatch is harmless?

    Well if the pgm has designed-in mismatches, the kernel can't
    determine it, and the pgm would have to be run with that
    protection disabled. But how many languages (other than asm)
    even _allow_ mismatched call/ret?

    > Maybe not, but they can happen in many different kinds of code and there can
    > be many forms of exploits. So it can be between very difficult and
    > impossible for a low-level system to determine if a given behavior is part
    > of the normal execution or is the sign of an exploit.

    Well, actually there is another way. The OS could monitor
    events like return address mismatches and take defensive
    actions when an increase is noted.
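    One way such monitoring might look, sketched in C: a per-process counter of mismatch events over fixed time windows that flags the process once a window exceeds a threshold. The structure, field names and fixed-window policy are all assumptions for illustration, not an existing kernel interface.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-process monitor state. */
    struct mismatch_monitor {
        uint64_t window_len;     /* window length, e.g. in milliseconds */
        uint64_t window_start;   /* start time of the current window */
        unsigned count;          /* mismatches seen in the current window */
        unsigned threshold;      /* tolerated mismatches per window */
    };

    /* Records one mismatch at time `now`; returns true when the kernel
       should take defensive action (kill, log, throttle, ...). */
    bool monitor_event(struct mismatch_monitor *m, uint64_t now)
    {
        assert(now >= m->window_start);
        if (now - m->window_start >= m->window_len) {
            m->window_start = now;   /* start a fresh window */
            m->count = 0;
        }
        m->count++;
        return m->count > m->threshold;
    }
    ```

    A fixed window is the simplest choice; a sliding window or decaying average would react more smoothly to bursts that straddle a window boundary.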

    -- Robert
  33.

    In comp.arch Yousuf Khan <bbbl67@ezrs.com> wrote:
    > Sander Vesik <sander@haldjas.folklore.ee> wrote:
    > >> and data segment are separate. It's like as if the stack had its own
    > >> container, the code has its own, and the data heap its own. What
    > >> happens in one container won't even reach the other containers.
    > >
    > > Doesn't matter. All you need for an exploit is to be able to make
    > > *one* system call. And for that, you don't need to write to the code
    > > segment at all. The stack is enough.
    >
    > The only place you can run code is from the code segment. If you insert code

    Only superficially true. As you have control of the stack, you can cause any
    number of function calls to happen with the parameters of your choice. This
    is essentially the same as running code.

    > into the stack segment, none of it will be executable. At best it might end
    > up causing the return address to go to the wrong part of the code segment
    > and therefore run the program from the wrong point, but more likely the
    > program will just end up locking up and be shut down by the OS.

    Only if you don't know the addresses of functions and system calls.
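    A toy C model of that point: each overwritten stack slot holds a return target plus the argument the callee will find, so the attacker strings together calls into code that is already present. A table index stands in for a real function address, and `fn_open`, `fn_exec`, `struct fake_frame` and `run_chain` are hypothetical names.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    static char trace[128];   /* records what the existing code was made to do */

    static void note(const char *s)
    {
        strncat(trace, s, sizeof trace - strlen(trace) - 1);
    }

    /* Stand-ins for functions already linked into the program. */
    static void fn_open(const char *arg) { note("open:"); note(arg); note(";"); }
    static void fn_exec(const char *arg) { note("exec:"); note(arg); note(";"); }

    typedef void (*routine)(const char *);
    static const routine code_image[] = { fn_open, fn_exec };

    /* One overwritten slot: where the next RET goes, and the data it sees. */
    struct fake_frame {
        size_t      ret;
        const char *arg;
    };

    /* Each simulated RET enters existing code with attacker-chosen data. */
    const char *run_chain(const struct fake_frame *stack, size_t n)
    {
        trace[0] = '\0';
        for (size_t i = 0; i < n; i++) {
            assert(stack[i].ret < sizeof code_image / sizeof code_image[0]);
            code_image[stack[i].ret](stack[i].arg);
        }
        return trace;
    }
    ```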

    --
    Sander

    +++ Out of cheese error +++
  34.

    In comp.arch Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    > In comp.sys.ibm.pc.hardware.chips Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    > > And how does the kernel "decide what to do"?
    >
    > Whatever it's been programmed to do, likely on a per-process basis.
    > Likely it'd start APatchy with something like
    > # /usr/sbin/nooverflow httpd &
    >
    > > It's so simple to prevent buffer overflows, there's really no reason to go
    >
    > Simple? Writing good code is simple? Wading through millions
    > of lines of cruft is simple?
    >
    > > to the trouble of some special hardware mechanism to catch some "odd"
    > > behavior which may sometimes catch some forms of buffer-overflow-exploits.
    >
    > I don't think there are that many forms of buffer overflows.
    > All result from an open-ended I/O call like gets().
    > Do you know any other kinds?

    Yes. Accepting user provided content length and not checking it against
    your buffer size.
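
    A minimal sketch of the check Sander describes, assuming a hypothetical
    message format in which the sender supplies a length field ahead of the
    payload: the receiver trusts its own buffer size, never the claimed
    length. The names `copy_payload` and `claimed_len` are illustrative only.

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Copy a length-prefixed payload into a fixed buffer.
       claimed_len comes from the (untrusted) sender; dst_size is what we own.
       src is assumed to really hold claimed_len bytes; a real parser would
       also check that against the number of bytes actually received.
       Returns 0 on success, -1 if the claimed length would overflow dst. */
    int copy_payload(char *dst, size_t dst_size,
                     const char *src, size_t claimed_len)
    {
        if (claimed_len > dst_size)
            return -1;              /* reject instead of writing past dst */
        memcpy(dst, src, claimed_len);
        return 0;
    }
    ```

    With an 8-byte buffer, a 3-byte payload is copied, while a payload
    claiming 10 bytes is refused before anything is written.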

    >
    > > Can be pretty frequent with some languages/compilers, although admittedly
    >
    > Which ones? mispairing call/ret is a fast way to overflow the stack.
    >
    > -- Robert
    >

    --
    Sander

    +++ Out of cheese error +++
  35. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    In comp.arch Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    > In comp.sys.ibm.pc.hardware.chips Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    > > Say a mismatch (it's not just overflows) happens in a program that uses
    > > exceptions (and where mismatches are hence not necessarily a sign of
    > > a buffer-overflow-exploit): how is the kernel to determine if a given
    > > mismatch is harmless?
    >
    > Well if the pgm has designed-in mismatches, the kernel can't
    > determine it, and the pgm would have to be run with that
    > protection disabled. But how many languages (other than asm)
    > even _allow_ mismatched call/ret?

    Consider a user mode threads package that uses get/setcontext()
    or setjmp / longjmp and so on.
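
    A minimal C sketch of that case (function names are hypothetical):
    `longjmp()` abandons several nested call frames that never execute a
    matching return, which is exactly the kind of call/ret mismatch a
    hardware return-address check would see.

    ```c
    #include <setjmp.h>

    static jmp_buf escape;       /* context saved by setjmp() */
    /* File-scope statics, so their values stay well defined after longjmp();
       only non-volatile automatic locals changed after setjmp() would not. */
    static int calls, returns;

    /* Recurse `depth` levels, then longjmp() straight back to setjmp(). */
    static void descend(int depth)
    {
        calls++;
        if (depth == 0)
            longjmp(escape, 1);  /* discards every intermediate frame at once */
        descend(depth - 1);
        returns++;               /* never reached on the longjmp() path */
    }

    /* Returns how many call frames were left without a matching return. */
    int unmatched_frames(int depth)
    {
        calls = returns = 0;
        if (setjmp(escape) == 0)
            descend(depth);
        return calls - returns;
    }
    ```

    `unmatched_frames(3)` returns 4: four nested calls, zero returns.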

    >
    > -- Robert
    >

    --
    Sander

    +++ Out of cheese error +++
  36. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    In comp.sys.ibm.pc.hardware.chips Sander Vesik <sander@haldjas.folklore.ee> wrote:
    > Consider a user mode threads package that uses
    > get/setcontext() or setjmp / longjmp and so on.

    Well, I'm not entirely sure how these constructs are
    implemented by the compilers, but I would expect a
    simple `jmp` instruction. This does NOT disturb the
    hw call/ret stack, nor pose any buffer-overflow danger.

    -- Robert
  37. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    >> Say a mismatch (it's not just overflows) happens in a program that uses
    >> exceptions (and where mismatches are hence not necessarily a sign of
    >> a buffer-overflow-exploit): how is the kernel to determine if a given
    >> mismatch is harmless?

    > Well if the pgm has designed-in mismatches, the kernel can't
    > determine it, and the pgm would have to be run with that
    > protection disabled. But how many languages (other than asm)
    > even _allow_ mismatched call/ret?

    Any language with exceptions: C++, Java, C (with setjmp/longjmp), ...

    >> Maybe not, but they can happen in many different kinds of code and there can
    >> be many forms of exploits. So it can be between very difficult and
    >> impossible for a low-level system to determine if a given behavior is part
    >> of the normal execution or is the sign of an exploit.

    > Well, actually there is another way. The OS could monitor
    > events like return address mismatches and take defensive
    > actions when an increase is noted.

    A buffer-overflow exploit might only need one mismatch.
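
    A toy software model of the monitoring scheme being debated (all names
    are hypothetical; this is a simulation, not any real kernel or CPU
    feature): a shadow stack records the expected return site on each call
    and counts mismatches on return. A threshold-based "increase" detector
    stays silent after a single forged return, which is Stefan's objection.

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    #define SHADOW_MAX 64

    /* Software shadow stack: expected return sites plus a mismatch counter. */
    struct shadow_stack {
        unsigned long site[SHADOW_MAX];
        size_t depth;
        unsigned mismatches;
    };

    /* On call: remember where the callee is expected to return. */
    void shadow_call(struct shadow_stack *s, unsigned long ret_site)
    {
        if (s->depth < SHADOW_MAX)
            s->site[s->depth] = ret_site;
        s->depth++;
    }

    /* On return: compare the actual target with the recorded one. */
    void shadow_ret(struct shadow_stack *s, unsigned long actual_site)
    {
        if (s->depth == 0) {          /* return with no matching call */
            s->mismatches++;
            return;
        }
        s->depth--;
        if (s->depth < SHADOW_MAX && s->site[s->depth] != actual_site)
            s->mismatches++;
    }

    /* Rate-style policy: only react once mismatches reach a threshold. */
    bool shadow_alarm(const struct shadow_stack *s, unsigned threshold)
    {
        return s->mismatches >= threshold;
    }
    ```

    With a threshold of 3, one forged return goes unflagged; a threshold of
    1 catches it, but would also fire on every legitimate longjmp().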


    Stefan
  38. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    In comp.sys.ibm.pc.hardware.chips Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    > Any language with exceptions: C++, Java, C (with setjmp/longjmp), ...

    Why should exceptions change anything? AFAIK, all exceptions are
    kernel events / interrupts wherein the previous context is fully
    saved and restored. Userspace exception handlers are supposed
    to be isolated code with their own returns.

    -- Robert
  39. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    Robert Redelmeier <redelm@ev1.net.invalid> wrote in message news:<0zltc.804$Fg5.576@newssvr23.news.prodigy.com>...
    > ...
    >
    > But `c` compilers have this habit of allocating local variable
    > space on the stack. So when `char input[80];` is coded in a
    > routine, ESP gets decreased by 80 and that array is sitting
    > just below the return address!
    >
    > I don't think it's _required_ by any standard that local vars are
    > allocated on the stack, but it sure makes memory management easy.

    It also facilitates recursion and re-entrancy. But it needn't be the
    same stack as the return linkage pointer.
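
    For the `char input[80]` case Robert quotes, the usual input-side fix is
    to replace gets() with a bounded read. A minimal sketch built on standard
    fgets(); the wrapper name `read_line` is hypothetical, and discarding the
    rest of an over-long line is one policy choice among several (a caller
    might prefer to reject the input instead).

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Bounded replacement for gets(): stores at most size-1 characters plus
       a NUL in buf, strips the newline, and throws away the remainder of an
       over-long line instead of writing past the end of buf.
       Returns buf, or NULL on end-of-file or read error. */
    char *read_line(char *buf, size_t size, FILE *in)
    {
        if (fgets(buf, (int)size, in) == NULL)
            return NULL;
        size_t len = strcspn(buf, "\n");
        if (buf[len] == '\n') {
            buf[len] = '\0';            /* whole line fit: drop the newline */
        } else {
            int c;                      /* line was too long: skip the rest */
            while ((c = getc(in)) != EOF && c != '\n')
                ;
        }
        return buf;
    }
    ```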

    > AFAIK, only global vars and large malloc()s are put on the heap.

    Only malloc()s.

    Toby

    >
    > -- Robert
  40. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    In article <srLtc.1051$pF5.486@newssvr23.news.prodigy.com>,
    Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    >In comp.sys.ibm.pc.hardware.chips Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    >> Any language with exceptions: C++, Java, C (with setjmp/longjmp), ...
    >
    >Why should exceptions change anything? AFAIK, all exceptions are
    >kernel events / interrupts wherein the previous context is fully
    >saved and restored. Userspace exception handlers are supposed
    >to be isolated code with their own returns.

    Boggle.

    It ain't what you don't know that causes the trouble; it's what you
    know that ain't so.


    Regards,
    Nick Maclaren.
  41. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    Sander Vesik <sander@haldjas.folklore.ee> wrote:
    >> The only place you can run code is from the code segment. If you
    >> insert code
    >
    > Only superficially true. As you have control of the stack, you can
    > cause any number of function calls to happen with the parameters of
    > your choice. This is essentially the same as running code.

    I see, so how long has C been passing command-line parameters through the
    stack? How many other languages do this?

    Yousuf Khan
  42. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    In article <d6ce4a6c.0405290255.5f77d483@posting.google.com>,
    Toby Thain <toby@telegraphics.com.au> wrote:
    >Robert Redelmeier <redelm@ev1.net.invalid> wrote in message news:<0zltc.804$Fg5.576@newssvr23.news.prodigy.com>...
    >>
    >> But `c` compilers have this habit of allocating local variable
    >> space on the stack. So when `char input[80];` is coded in a
    >> routine, ESP gets decreased by 80 and that array is sitting
    >> just below the return address!
    >>
    >> I don't think it's _required_ by any standard that local vars are
    >> allocated on the stack, but it sure makes memory management easy.
    >
    >It also facilitates recursion and re-entrancy. But it needn't be the
    >same stack as the return linkage pointer.

    That is true.

    >> AFAIK, only global vars and large malloc()s are put on the heap.
    >
    >Only malloc()s.

    That isn't. It depends on the implementation where variably sized
    arrays are put, for example.


    Regards,
    Nick Maclaren.
  43. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    In article <oJ1uc.7840$JmE.5318@news04.bloor.is.net.cable.rogers.com>,
    Yousuf Khan <bbbl67@ezrs.com> wrote:
    >Sander Vesik <sander@haldjas.folklore.ee> wrote:
    >>> The only place you can run code is from the code segment. If you
    >>> insert code
    >>
    >> Only superficially true. As you have control of the stack, you can
    >> cause any number of function calls to happen with the parameters of
    >> your choice. This is essentially the same as running code.
    >
    >I see, so how long has C been passing command-line parameters through the
    >stack? How many other languages do this?

    Since the beginning. In pretty well all stack-based languages,
    you can emulate such a call with no hassle. In some, it is more
    difficult.


    Regards,
    Nick Maclaren.
  44. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    On Wed, 26 May 2004, Yousuf Khan wrote:

    > No, for not opting to use both. There was no mutual exclusivity between
    > paging and segmentation. Both could be used and complement each other.

    Google for ingo molnar, execshield, ascii armou?r.

    -Peter
  45. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    Toby Thain <toby@telegraphics.com.au> wrote:
    +---------------
    | Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    | > I don't think it's _required_ by any standard that local vars are
    | > allocated on the stack, but it sure makes memory management easy.
    |
    | It also facilitates recursion and re-entrancy. But it needn't be the
    | same stack as the return linkage pointer.
    +---------------

    But if you *don't* do it, then you have trouble with stack fragmentation
    and/or collisions with your "argument stack" expanding at a different rate
    than your "linkage stack", resulting in one or the other bumping into
    arbitrary limits at inconvenient times. As a result, one or the other
    of the stacks gets pushed off into the heap (usually the argument stack)
    as a linked list of stack-allocated "malloc()" blocks [optimized by
    allocating a bunch at a time], which puts a lot of stress on "malloc()",
    or gets pushed into a separately-managed segment of address space, which
    puts pressure on memory allocation in general and the dynamic loader in
    particular.

    We had some of these issues with the Am29000 Subroutine Calling Standard
    (circa 1987), which had both a "register cache" stack for linkage
    information and "small" arguments (which were passed in registers)
    and a "memory" stack for "large" arguments (as well as *any* argument,
    regardless of size, that the called subroutine referenced by address).[1]
    Had the 29k CPU family ever made it into the 32-bit Unix[2] workstation
    market, where as we know address space layout has become an issue
    (especially with an ever-larger number of DLLs or DSOs competing for space),
    the two-stack calling sequence could have become quite problematic.
    [As it was, in the embedded-processor space it was pretty much a non-issue.]


    -Rob

    [1] Actually, the rule was that the first 16 *words* of arguments got
    passed in registers and any further words of arguments got passed
    on the memory stack, except that if the called routine referenced
    any of the first 16 words by address (e.g., "&foo") then that word
    and all subsequent words of the register args would get copied into
    the memory stack at subroutine entry. Yes, this meant that whenever
    the memory stack got used at all there was a 64-byte area at the
    front reserved in case the first 16 words needed to be manifested
    in memory. (*Ugh*)
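
    One reading of footnote [1] as arithmetic, sketched in C under stated
    assumptions: 32-bit words, a 64-byte home area whenever the memory stack
    is used at all, and spilling that starts at the lowest address-taken
    register word. The function names are hypothetical; this models the rule
    as described above and is not code from any 29k toolchain.

    ```c
    #include <stdbool.h>

    #define REG_ARG_WORDS 16u  /* first 16 argument words go in registers */
    #define WORD_BYTES     4u  /* 32-bit words on the Am29000 */

    /* Bytes of memory-stack argument area: none if everything fits in
       registers and nothing is address-taken; otherwise the 64-byte home
       area for the register words plus one slot per overflow word. */
    unsigned am29k_arg_area_bytes(unsigned nwords, bool any_addr_taken)
    {
        unsigned overflow = nwords > REG_ARG_WORDS ? nwords - REG_ARG_WORDS : 0;
        if (overflow == 0 && !any_addr_taken)
            return 0;
        return (REG_ARG_WORDS + overflow) * WORD_BYTES;
    }

    /* Register words copied to memory at entry: the lowest address-taken
       word and every register word after it. first_addr_taken < 0 means
       no register-passed word has its address taken. */
    unsigned am29k_words_spilled(unsigned nwords, int first_addr_taken)
    {
        unsigned in_regs = nwords < REG_ARG_WORDS ? nwords : REG_ARG_WORDS;
        if (first_addr_taken < 0 || (unsigned)first_addr_taken >= in_regs)
            return 0;
        return in_regs - (unsigned)first_addr_taken;
    }
    ```

    Under these assumptions a 20-word call needs 80 bytes of memory stack
    (the 64-byte home area plus four overflow words), and taking the address
    of word 10 forces words 10 through 15 to be copied at entry.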

    [2] Both BSD and System-V ports were done to the Am29000 -- both were
    quite straightforward since the 29k was a friendly target environment --
    but shortly after both were up & running AMD chose not to promote
    the 29k as a Unix engine, and they were abandoned.

    -----
    Rob Warnock <rpw3@rpw3.org>
    627 26th Avenue <URL:http://rpw3.org/>
    San Mateo, CA 94403 (650)572-2607
  46. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    Peter "Firefly" Lund <firefly@diku.dk> wrote:
    > On Wed, 26 May 2004, Yousuf Khan wrote:
    >
    >> No, for not opting to use both. There was no mutual exclusivity
    >> between paging and segmentation. Both could be used and complement
    >> each other.
    >
    > Google for ingo molnar, execshield, ascii armou?r.

    Looks like he was using the segment limits to protect against stack
    overflows.
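
    As exec-shield is generally described, on 32-bit x86 without NX it
    approximated non-executable memory by lowering the code-segment limit to
    the top of the highest executable mapping, so instruction fetches from
    the stack (higher up) fault, while "ASCII armor" maps shared libraries
    below 0x01000000 so their addresses contain a zero byte. A minimal C
    model of those two checks; the names and the example memory map are
    hypothetical.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* One region of a 32-bit address space, covering [start, end). */
    struct mapping {
        uint32_t start, end;
        bool exec;
    };

    /* Code-segment limit: the end of the highest executable mapping. */
    uint32_t exec_limit(const struct mapping *maps, size_t n)
    {
        uint32_t limit = 0;
        for (size_t i = 0; i < n; i++)
            if (maps[i].exec && maps[i].end > limit)
                limit = maps[i].end;
        return limit;
    }

    /* A fetch is allowed only below the limit: anything above it (e.g. the
       stack) is effectively non-executable, while non-exec data below it
       (e.g. the heap) stays executable. */
    bool fetch_allowed(uint32_t eip, uint32_t limit)
    {
        return eip < limit;
    }

    /* ASCII-armored addresses have a zero top byte, which a string copy
       cannot carry in the middle of its payload. */
    bool ascii_armored(uint32_t addr)
    {
        return addr < 0x01000000u;
    }
    ```

    The model also shows the scheme's known weakness: a single limit cannot
    protect non-executable data that sits below the highest code mapping.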

    Yousuf Khan
  47. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    On Sat, 29 May 2004 15:10:12 GMT, "Yousuf Khan" <bbbl67@ezrs.com> wrote:

    >Sander Vesik <sander@haldjas.folklore.ee> wrote:
    >>> The only place you can run code is from the code segment. If you
    >>> insert code
    >>
    >> Only superficially true. As you have control of the stack, you can
    >> cause any number of function calls to happen with the parameters of
    >> your choice. This is essentially the same as running code.
    >
    >I see, so how long has C been passing command-line parameters through the
    >stack? How many other languages do this?

    Hmm, not "command line" parameters but "actual argument" passing to
    functions is done using the stack in C and most Fortrans I've come
    across... if the hardware has a stack. But it's not the arguments, which
    are pushed on the stack by the programmer-written caller routine, which are
    important - AIUI it's the return address which gets pushed on the stack
    automatically by the "call" instruction. That's what can be fudged by the
    exploit - all you need is a bugged, err vulnerable, system call address to
    plonk in there which, when entered, will also take *its* argument values
    off the stack.
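
    George's point can be shown without any real exploit. Below is a toy
    model (hypothetical names, not the frame layout of any particular
    compiler) that places a 16-byte local buffer directly below an 8-byte
    saved return address inside one byte array, then copies too much into
    the buffer. Keeping both in one array makes the "overflow" well-defined
    C rather than undefined behaviour.

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Buffer at offset 0, saved return address right above it. */
    enum { BUF_SIZE = 16, RET_SLOT = BUF_SIZE,
           FRAME_SIZE = BUF_SIZE + sizeof(uint64_t) };

    struct toy_frame { unsigned char bytes[FRAME_SIZE]; };

    /* What the `call` instruction does: store the return address. */
    void toy_push_return(struct toy_frame *f, uint64_t ret)
    {
        memcpy(f->bytes + RET_SLOT, &ret, sizeof ret);
    }

    /* What `ret` would jump to. */
    uint64_t toy_saved_return(const struct toy_frame *f)
    {
        uint64_t ret;
        memcpy(&ret, f->bytes + RET_SLOT, sizeof ret);
        return ret;
    }

    /* Unchecked copy, like gets() or a trusted length field: writes len
       bytes from the start of the buffer, clamped only to the frame so
       the model itself stays within bounds. */
    void toy_unchecked_copy(struct toy_frame *f,
                            const unsigned char *src, size_t len)
    {
        if (len > (size_t)FRAME_SIZE)
            len = FRAME_SIZE;
        memcpy(f->bytes, src, len);
    }
    ```

    Copying 16 bytes leaves the saved address intact; copying 24 bytes of
    'A' replaces it with 0x4141414141414141, the value `ret` would then use.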

    Rgds, George Macdonald

    "Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
  48. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    In comp.arch Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    > In comp.sys.ibm.pc.hardware.chips Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    > > Any language with exceptions: C++, Java, C (with setjmp/longjmp), ...
    >
    > Why should exceptions change anything? AFAIK, all exceptions are
    > kernel events / interrupts wherein the previous context is fully
    > saved and restored. Userspace exception handlers are supposed
    > to be isolated code with their own returns.

    So you have no idea at all what an exception is in C++/Java? If so,
    why are you arguing on this topic? You are only going to be completely
    wrong and embarrass yourself.

    >
    > -- Robert
    >

    --
    Sander

    +++ Out of cheese error +++
  49. Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

    In comp.arch Robert Redelmeier <redelm@ev1.net.invalid> wrote:
    > In comp.sys.ibm.pc.hardware.chips Sander Vesik <sander@haldjas.folklore.ee> wrote:
    > > Consider a user mode threads package that uses
    > > get/setcontext() or setjmp / longjmp and so on.
    >
    > Well, I'm not entirely sure how these constructs are
    > implemented by the compilers, but I would expect a
    > simple `jmp` instruction. This does NOT disturb the
    > hw call/ret stack, nor pose any buffer-overflow danger.

    Completely, *utterly* wrong. Even if the essence of the longjmp
    is a 'jmp' instruction, the whole point of it is that it happens
    back over a number of intermediate call frames that thus never
    return. The case of get/setcontext is even more drastic - these
    actively change the userland context, so that after the setcontext
    call the stack is pointing to a completely new place, including a
    new call/return history that has no relation to previous thread's
    call/return history.
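
    A minimal sketch of the get/setcontext case, using the related
    makecontext()/swapcontext() calls from <ucontext.h> (marked obsolescent
    and later dropped from POSIX.1-2008, but still provided by glibc). The
    names are hypothetical. The first swapcontext() made from
    `run_user_thread()` does not return until the user thread switches back,
    and in between the CPU runs on a separate stack with its own return
    addresses.

    ```c
    #define _XOPEN_SOURCE 700   /* some libcs expose ucontext only with this */
    #include <ucontext.h>

    static ucontext_t main_ctx, thread_ctx;
    static char thread_stack[64 * 1024];  /* separate stack for the thread */
    static int steps;                     /* how far the thread has run */

    static void thread_body(void)
    {
        steps = 1;
        swapcontext(&thread_ctx, &main_ctx);  /* yield: resume main's stack */
        steps = 2;
    }                                         /* returning resumes uc_link */

    /* Run a tiny user-mode thread in two slices; returns 2 when it ends. */
    int run_user_thread(void)
    {
        steps = 0;
        if (getcontext(&thread_ctx) == -1)
            return -1;
        thread_ctx.uc_stack.ss_sp = thread_stack;
        thread_ctx.uc_stack.ss_size = sizeof thread_stack;
        thread_ctx.uc_link = &main_ctx;       /* where to go after thread_body */
        makecontext(&thread_ctx, thread_body, 0);

        swapcontext(&main_ctx, &thread_ctx);  /* first slice: steps becomes 1 */
        if (steps != 1)
            return -1;
        swapcontext(&main_ctx, &thread_ctx);  /* second slice: thread finishes */
        return steps;
    }
    ```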

    Quit arguing about things you don't know anything about.

    >
    > -- Robert
    >

    --
    Sander

    +++ Out of cheese error +++