compiling to z-machine code ?

Archived from groups: rec.games.int-fiction (More info?)

Is there a somewhat easy to understand spec of the z-machine? Just for
fun, I am interested in writing a simple compiler that can produce
basic z-machine runnable code (ZIP? ZIL? i forget the terminology) with
commands like variable assignment (string/integer), printing to the
screen, asking the user for input, if/then, etc. Any pointers
appreciated...
45 answers Last reply
More about compiling machine code
  1. Archived from groups: rec.games.int-fiction (More info?)

    On that note: has anyone ever done a program that could take a story file
    and generate a basic program (or any other language) from it?
  2. Archived from groups: rec.games.int-fiction (More info?)

    On or about 2/5/2005 6:58 PM, Mad Scientist Jr did proclaim:
    > Is there a somewhat easy to understand spec of the z-machine? Just for
    > fun, I am interested in writing a simple compiler that can produce
    > basic z-machine runnable code (ZIP? ZIL? i forget the terminology) with
    > commands like variable assignment (string/integer), printing to the
    > screen, asking the user for input, if/then, etc. Any pointers
    > appreciated...

    First of all, there's the Inform compiler, which produces either Z-code
    or Glulx (which is, sort-of, kind-of, a 32-bit version of Z-code,
    although its author would disagree). You can find it here:
    http://www.inform-fiction.org/

    Given that, if you still want to write your own compiler then you need
    "The Z-Machine Standards Document" found at:
    http://www.inform-fiction.org/zmachine/standards/index.html
  3. Archived from groups: rec.games.int-fiction (More info?)

    Amir Karger has written a Perl module that translates Zcode files to
    Perl:

    http://ifarchive.jmac.org/if-archive/infocom/tools/Language-Zcode-0.8.tar.gz

    I believe it's built so that you should be able to translate Zcode into
    languages other than Perl too, if you're willing to do some coding of
    course.
  4. Archived from groups: rec.games.int-fiction (More info?)

    >> On that note: has anyone ever done a program that could take
    >> a story file and generate a basic program (or any other
    >> language) from it?
    > Not off the top of my head. Back in '97-'98, there was a
    > big effort to decypher storyfiles, as part of the project
    > to figure out how storyfiles were put together and exactly
    > what all of the Z-machine opcodes were supposed to do.
    > All of the Ztools were created at that time; you can
    > download them here:

    Inform 1 was released to the public on 10 May 1993, and I recall Graham
    Nelson claiming that the didn't decode the Zcode story file format
    himself - others had already done that before him. Your acclaimed date
    is probably wrong by at least five years.
  5. Archived from groups: rec.games.int-fiction (More info?)

    >> On that note: has anyone ever done a program that could take
    >> a story file and generate a basic program (or any other
    >> language) from it?
    > Not off the top of my head. Back in '97-'98, there was a
    > big effort to decypher storyfiles, as part of the project
    > to figure out how storyfiles were put together and exactly
    > what all of the Z-machine opcodes were supposed to do.
    > All of the Ztools were created at that time; you can
    > download them here:


    Inform 1 was released to the public on 10 May 1993, and I recall Graham
    Nelson claiming that the didn't decode the Zcode story file format
    himself - others had already done that before him.
  6. Archived from groups: rec.games.int-fiction (More info?)

    Not that I am in a great hurry to do this, but I can't imagine it NOT
    being possible to disassemble z-code. Are you saying that due to this
    compressed format, any meaningful variable or function/procedure names
    would be lost, and instead you would be left with A,B,C, functionA,
    functionB, functionC, etc.? Because if the z-machine can run it, a
    z-disassembler ought to be able to render source code from it (albeit
    without friendly var/function names).

    >You are especially out of luck if you want a file that you can modify
    and re-compile, since storyfiles use a compressed format to store
    pointers to objects, strings and procedures.
  7. Archived from groups: rec.games.int-fiction (More info?)

    On or about 2/5/2005 9:25 PM, Terry Olsen did proclaim:

    > On that note: has anyone ever done a program that could take a story file
    > and generate a basic program (or any other language) from it?

    Not off the top of my head. Back in '97-'98, there was a big effort to
    decypher storyfiles, as part of the project to figure out how storyfiles
    were put together and exactly what all of the Z-machine opcodes were
    supposed to do. All of the Ztools were created at that time; you can
    download them here:

    http://www.inform-fiction.org/zmachine/ztools.html

    InfoDump will display all of the strings in a storyfile, while txd will
    give you just about every byte formatted for easy viewing. Nothing will
    look at a procedure and create high-level code; the best you can ever
    hope for is Inform assembler op-codes. You are especially out of luck
    if you want a file that you can modify and re-compile, since storyfiles
    use a compressed format to store pointers to objects, strings and
    procedures. This means that deciding which variables and object
    properties are pointers to objects and which are just numbers would be
    especially uncertain. Any changes to the file that caused things to
    move would invalidate any pointers that had been misidentified as
    numbers (and vice versa), causing interesting errors at run time.
  8. Archived from groups: rec.games.int-fiction (More info?)

    okay i think i understand that explanation. thanks to everyone who has
    posted. i read some articles by the infocom guys that showed some
    Muddle (aka MDL, the LISP like language they used) source code and it's
    ugly! Did they ever make available any of their complete source codes
    anywhere?

    >The interpreter doesn't have to distinguish a value's meaning until it

    actually get used by an instruction. When the interpreter sees the
    opcode to print local variable 1, it can assume that the variable
    contains the address of a string. (And if it doesn't, it can signal a
    run-time error.) But the value may have been stored in that variable
    at
    a completely different time.
  9. Archived from groups: rec.games.int-fiction (More info?)

    Here, Mad Scientist Jr <usenet_daughter@yahoo.com> wrote:
    > Not that I am in a great hurry to do this, but I can't imagine it NOT
    > being possible to disassemble z-code. Are you saying that due to this
    > compressed format, any meaningful variable or function/procedure names
    > would be lost, and instead you would be left with A,B,C, functionA,
    > functionB, functionC, etc.?

    You have that problem, yes.

    A bigger problem is distinguishing addresses from constants. If you
    see a z-code opcode "load $4A8E into local variable 1", you have to
    figure out that $4A8E actually refers to a string at that address,
    rather than a numeric constant. This is important for a human reading
    the code. It's even more important if you plan to modify the code and
    recompile it, because in the new binary that string may be at a
    different address. If you guess wrong about what that instance of
    $4A8E means, you'll either print garbage or change an important
    numeric constant in the program.

    You can use some reasonable heuristics. For example, if local variable
    1 is immediately printed, it's a very good bet that $4A8E is a string
    address. If there *is* a string with address $4A8E, then again it's a
    fair bet. But none of these heuristics is perfectly reliable, so any
    true disassembling effort will require some going-over with human eyes.

    (If that problem seems easy, then consider property identifiers. These
    have the same difficulty, but they're small numbers like 4 or 7, which
    -- unlike $4A8E -- are as likely to be *intended* as numbers as they
    are to be property ids.)

    --Z

    "And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
    *
    I'm still thinking about what to put in this space.
  10. Archived from groups: rec.games.int-fiction (More info?)

    "Andrew Plotkin" <erkyrath@eblong.com> wrote in message
    news:cu5knp$fu$2@reader2.panix.com...
    > Here, Mad Scientist Jr <usenet_daughter@yahoo.com> wrote:
    >> Not that I am in a great hurry to do this, but I can't imagine it NOT
    >> being possible to disassemble z-code. Are you saying that due to this
    >> compressed format, any meaningful variable or function/procedure names
    >> would be lost, and instead you would be left with A,B,C, functionA,
    >> functionB, functionC, etc.?
    >
    > You have that problem, yes.
    >
    > A bigger problem is distinguishing addresses from constants. If you
    > see a z-code opcode "load $4A8E into local variable 1", you have to
    > figure out that $4A8E actually refers to a string at that address,
    > rather than a numeric constant. This is important for a human reading
    > the code. It's even more important if you plan to modify the code and
    > recompile it, because in the new binary that string may be at a
    > different address. If you guess wrong about what that instance of
    > $4A8E means, you'll either print garbage or change an important
    > numeric constant in the program.

    >
    > You can use some reasonable heuristics. For example, if local variable
    > 1 is immediately printed, it's a very good bet that $4A8E is a string
    > address. If there *is* a string with address $4A8E, then again it's a
    > fair bet. But none of these heuristics is perfectly reliable, so any
    > true disassembling effort will require some going-over with human eyes.

    Ok, so a big problem is distinguishing addresses from constants. How then
    does an interpreter distinguish them? A disassembler can see just as much of
    the story file as an interpreter. So why wouldn't it be able to distinguish
    the cases from each other? If an interpreter can do it, so can a
    disassembler.

    > (If that problem seems easy, then consider property identifiers. These
    > have the same difficulty, but they're small numbers like 4 or 7, which
    > -- unlike $4A8E -- are as likely to be *intended* as numbers as they
    > are to be property ids.)

    And again, same reasoning. If an interpreter can distinguish them, then why
    couldn't a disassembler?
  11. Archived from groups: rec.games.int-fiction (More info?)

    On or about 2/6/2005 12:00 PM, Rioshin an'Harthen did proclaim:
    > "Andrew Plotkin" <erkyrath@eblong.com> wrote in message
    > news:cu5knp$fu$2@reader2.panix.com...
    >>You can use some reasonable heuristics. For example, if local variable
    >>1 is immediately printed, it's a very good bet that $4A8E is a string
    >>address. If there *is* a string with address $4A8E, then again it's a
    >>fair bet. But none of these heuristics is perfectly reliable, so any
    >>true disassembling effort will require some going-over with human eyes.
    >
    > Ok, so a big problem is distinguishing addresses from constants. How then
    > does an interpreter distinguish them? A disassembler can see just as much of
    > the story file as an interpreter. So why wouldn't it be able to distinguish
    > the cases from each other? If an interpreter can do it, so can a
    > disassembler.

    The interpreter doesn't have to distinguish a value's meaning until it
    actually get used by an instruction. When the interpreter sees the
    opcode to print local variable 1, it can assume that the variable
    contains the address of a string. (And if it doesn't, it can signal a
    run-time error.) But the value may have been stored in that variable at
    a completely different time. Also note that if the variable was
    originally named 'temp', then it will likely have different types of
    values at different times, which can futher bolix up the disassembler.
    And static analysis of the entire program doesn't help since there might
    be dead code and/or undiscovered bugs. Just because there's an
    instruction somewhere to print local variable 1 doesn't mean that it
    will ever be executed.

    BTW, Infocom used a LISP-based language that is very different from
    Inform. There are storyfiles (Seastalker comes immediately to mind)
    that the Inform compiler would be unable to duplicate, because it won't
    let you put procedures and strings in readable memory.
  12. Archived from groups: rec.games.int-fiction (More info?)

    samwyse wrote:
    > A really interesting (though very short) discussion of ZIL (Zork
    > Implementation Language) can be found here:
    >
    > http://lambda-the-ultimate.org/node/view/429
    >
    > It indicates that Activision *does* have the source code for both
    > ZIL and all of the Infocom games. "A lot of people might not
    > realize it, but as late as 1993, Activision's "Return to Zork"
    > was still written in a (heavily extended) version of ZIL, i.e. a
    > dialect of Lisp."

    Having been a beta-tester for "Return to Zork", I would have to
    disagree with the quote from this page... RtZ was written in a language
    that _looked_ Lispy, but it had zero to do with MDL or ZIL... no common
    code, no common authorship, etc.

    -ethan
  13. Archived from groups: rec.games.int-fiction (More info?)

    On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:
    > okay i think i understand that explanation. thanks to everyone who has
    > posted. i read some articles by the infocom guys that showed some
    > Muddle (aka MDL, the LISP like language they used) source code and it's
    > ugly! Did they ever make available any of their complete source codes
    > anywhere?

    LISP, APL and Forth are all interpreted languages that are incredibly
    powerful, but which have incredibly steep learning curves. LISP-like
    languages are particularly suited for natural language applications,
    which is why they are used so much in Artificial Intellegence and were
    judged well suited for interactive fiction.

    The excerpts that you've seen published are pretty much all that there
    is, unless Activision has some code hidden away in a vault somewhere.
    The circumstantial evidence is pretty strong, however, that all they
    have are the binaries, the same as us. Someone posted last year that
    they were going to try to recreate the Infocom compiler; nothing has
    been heard on the subject since then.
  14. Archived from groups: rec.games.int-fiction (More info?)

    Here, samwyse <dejanews@email.com> wrote:
    > On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:
    > > okay i think i understand that explanation. thanks to everyone who has
    > > posted. i read some articles by the infocom guys that showed some
    > > Muddle (aka MDL, the LISP like language they used) source code and it's
    > > ugly! Did they ever make available any of their complete source codes
    > > anywhere?
    >
    > LISP, APL and Forth are all interpreted languages that are incredibly
    > powerful, but which have incredibly steep learning curves. LISP-like
    > languages are particularly suited for natural language applications,
    > which is why they are used so much in Artificial Intellegence and were
    > judged well suited for interactive fiction.

    The bits of Infocom code we have are not LISP. They're in a compiled
    (not interpreted) language, which is imperative in form (not
    functional or dynamic). It's a lot like early Inform, which is not a
    surprise. Only the syntax is LISP-like.

    --Z

    "And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
    *
    I'm still thinking about what to put in this space.
  15. Archived from groups: rec.games.int-fiction (More info?)

    Here, Rioshin an'Harthen <rioshin@breakthru.com> wrote:
    > "Andrew Plotkin" <erkyrath@eblong.com> wrote in message
    > news:cu5knp$fu$2@reader2.panix.com...
    > >
    > > A bigger problem is distinguishing addresses from constants. If you
    > > see a z-code opcode "load $4A8E into local variable 1", you have to
    > > figure out that $4A8E actually refers to a string at that address,
    > > rather than a numeric constant. This is important for a human reading
    > > the code. It's even more important if you plan to modify the code and
    > > recompile it, because in the new binary that string may be at a
    > > different address. If you guess wrong about what that instance of
    > > $4A8E means, you'll either print garbage or change an important
    > > numeric constant in the program.
    >
    > Ok, so a big problem is distinguishing addresses from constants. How then
    > does an interpreter distinguish them?

    The interpreter doesn't distinguish them. It just passes the value
    through blindly, and assumes that the right thing will happen.

    Note that I specified the tasks of "making human-readable code" and
    "modifying and recompiling code". Those are tasks which an interpreter
    doesn't have to do. They are tasks which the Z-code format is not
    designed to make easy. Unsurprisingly, therefore, they're *not* easy.

    --Z

    "And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
    *
    I'm still thinking about what to put in this space.
  16. Archived from groups: rec.games.int-fiction (More info?)

    In article <cu5knp$fu$2@reader2.panix.com>,
    Andrew Plotkin <erkyrath@eblong.com> wrote:
    >A bigger problem is distinguishing addresses from constants. If you
    >see a z-code opcode "load $4A8E into local variable 1", you have to
    >figure out that $4A8E actually refers to a string at that address,
    >rather than a numeric constant. This is important for a human reading
    >the code. It's even more important if you plan to modify the code and
    >recompile it, because in the new binary that string may be at a
    >different address. If you guess wrong about what that instance of
    >$4A8E means, you'll either print garbage or change an important
    >numeric constant in the program.

    My knowledge of decompilers has deteriorated a lot in the last 25 years, but
    I'm fairly sure this is a "standard problem" solved by some variant of type
    inference or some other form of dynamic code analysis. I think there are
    similar issues with byte-code-to-native compilers. I'm not saying it's easy,
    just that there is literature in the compiler community that addresses similar problems.

    >You can use some reasonable heuristics. For example, if local variable
    >1 is immediately printed, it's a very good bet that $4A8E is a string
    >address. If there *is* a string with address $4A8E, then again it's a
    >fair bet. But none of these heuristics is perfectly reliable, so any
    >true disassembling effort will require some going-over with human eyes.

    IIRC the above-mentioned amount to formalizing these kind of heuristics and
    verifying that the code really does treat the value that way.

    >(If that problem seems easy, then consider property identifiers. These
    >have the same difficulty, but they're small numbers like 4 or 7, which
    >-- unlike $4A8E -- are as likely to be *intended* as numbers as they
    >are to be property ids.)

    The interesting trick is if some clever compiler treats them as property
    identifiers in one place and simple integers in another. I have vague
    memories of some peephole optimizers generating such multi-use situations.
    --
    "Yo' ideas need to be thinked befo' they are say'd" - Ian Lamb, age 3.5
    http://www.cs.queensu.ca/~dalamb/ qucis->cs to reply (it's a long story...)
  17. Archived from groups: rec.games.int-fiction (More info?)

    On or about 2/6/2005 4:06 PM, Andrew Plotkin did proclaim:
    > Here, samwyse <dejanews@email.com> wrote:
    >
    >>On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:
    >>
    >>>okay i think i understand that explanation. thanks to everyone who has
    >>>posted. i read some articles by the infocom guys that showed some
    >>>Muddle (aka MDL, the LISP like language they used) source code and it's
    >>>ugly! Did they ever make available any of their complete source codes
    >>>anywhere?
    >>
    >>LISP, APL and Forth are all interpreted languages that are incredibly
    >>powerful, but which have incredibly steep learning curves. LISP-like
    >>languages are particularly suited for natural language applications,
    >>which is why they are used so much in Artificial Intellegence and were
    >>judged well suited for interactive fiction.
    >
    > The bits of Infocom code we have are not LISP. They're in a compiled
    > (not interpreted) language, which is imperative in form (not
    > functional or dynamic). It's a lot like early Inform, which is not a
    > surprise. Only the syntax is LISP-like.

    Quite so, but I was mostly discussing why a language was chosen that was
    "ugly". I've used a few variants of LISP, including (in reverse order)
    CommonLisp, Scheme and an IBM mainframe version that was probably a
    close decendent of McCarthy's original implementation. A really
    interesting (though very short) discussion of ZIL (Zork Implementation
    Language) can be found here:

    http://lambda-the-ultimate.org/node/view/429

    It indicates that Activision *does* have the source code for both ZIL
    and all of the Infocom games. "A lot of people might not realize it,
    but as late as 1993, Activision's "Return to Zork" was still written in
    a (heavily extended) version of ZIL, i.e. a dialect of Lisp."
  18. Archived from groups: rec.games.int-fiction (More info?)

    On 2005-02-06 09:21:52 -0800, "Mad Scientist Jr"
    <usenet_daughter@yahoo.com> said:

    > Are you saying that due to this
    > compressed format, any meaningful variable or function/procedure names
    > would be lost, and instead you would be left with A,B,C, functionA,
    > functionB, functionC, etc.? Because if the z-machine can run it, a
    > z-disassembler ought to be able to render source code from it (albeit
    > without friendly var/function names).

    No, but many things are referred to as simply numbers, and you can't
    tell what those numbers mean unless they're being used in their
    functions.

    For example, suppose that I have the following code:

    [Foo direction;
    print "That way is ", (name)location.direction;
    ];
    [Bar obj;
    print "That is ", (the)obj;
    ];
    [Baz verb;
    if (verb == ##Jump)
    "You jump.";
    "You don't jump.";
    ];
    [Quux;
    Foo(n_to);
    Bar(platypus);
    Baz(##Climb);
    ];

    Now, in Quux, you're passing a number to Foo, a number to Bar, and a
    number to Baz. The compiler is kind enough to let you use n_to (a
    property name), platypus (an object identifier), and ##Climb (a verb
    identifier) to refer to their numbers, but in the compiled code,
    they're just numbers-- and they overlap. (In fact, the standard Inform
    library relies on the fact that n_to and n_obj have the same number--
    something I found out, much to my dismay, when I violated this just
    before putting my program in beta.) A disassembler examining Quux
    wouldn't know which is which, that is, it wouldn't know if the number
    for n_to was meant to represent a property, object, verb, etc.

    "But can't it look at Foo, Bar, and Baz?" Well, kind of. That would
    take a considerable amount of analysis. (It's not entirely impossible;
    Spidey, a Scheme analyzer, does some such analysis. But I suspect it's
    impossible to always perform such analysis). Also, the routines may
    not clearly define what type of object they're expecting. In the
    program I'm working on now, some subroutines can be passed dictionary
    words or objects, for example.

    There are decompilers for some languages, but they're also unreliable
    for similar reasons. It's more of a problem for Inform, since it
    relies on lots of types. Decompilers are most common for languages
    like C, which is almost a macro assembler anyway.

    Cheers,
    Piquan
  19. Archived from groups: rec.games.int-fiction (More info?)

    In article <1107693861.155522.259590@c13g2000cwb.googlegroups.com>,
    Fredrik Ramsberg <f.r@mail.com> wrote:
    >>> On that note: has anyone ever done a program that could take
    >>> a story file and generate a basic program (or any other
    >>> language) from it?
    >> Not off the top of my head. Back in '97-'98, there was a
    >> big effort to decypher storyfiles, as part of the project
    >> to figure out how storyfiles were put together and exactly
    >> what all of the Z-machine opcodes were supposed to do.
    >> All of the Ztools were created at that time; you can
    >> download them here:
    >
    >Inform 1 was released to the public on 10 May 1993, and I recall Graham
    >Nelson claiming that the didn't decode the Zcode story file format
    >himself - others had already done that before him. Your acclaimed date
    >is probably wrong by at least five years.

    The Ztools date back to 1992, and I believe they were the _second_
    generation of Z-machine story file tools.

    However, there was a later push to decode V6 and fill in
    some of the gaps.
  20. Archived from groups: rec.games.int-fiction (More info?)

    In article <1107710512.185343.281190@l41g2000cwc.googlegroups.com>,
    Mad Scientist Jr <usenet_daughter@yahoo.com> wrote:
    >Not that I am in a great hurry to do this, but I can't imagine it NOT
    >being possible to disassemble z-code. Are you saying that due to this
    >compressed format, any meaningful variable or function/procedure names
    >would be lost, and instead you would be left with A,B,C, functionA,
    >functionB, functionC, etc.? Because if the z-machine can run it, a
    >z-disassembler ought to be able to render source code from it (albeit
    >without friendly var/function names).

    Certainly Z-code can be disassembled. But at present, only to a
    low-level assembler. Nobody has made a totally successful decompiler,
    though there have been partially successful attempts.

    In theory, most Z-code shouldn't be all that hard to decompile. It
    comes down to a matter of bookkeeping; some manual help would be
    needed when information was totally lost and to fill in variable and
    function names, but not as much as you might think. In practice, no
    one has done it. It doesn't help that TXD isn't really suitable as
    the starting point for a decompiling effort, so you'd have to re-write
    the disassembler part before you got to the fun stuff.
  21. Archived from groups: rec.games.int-fiction (More info?)

    In article <cu666g$6jj$1@knot.queensu.ca>,
    David Alex Lamb <dalamb@qucis.queensu.ca> wrote:
    >
    >My knowledge of decompilers has deteriorated a lot in the last 25 years, but
    >I'm fairly sure this is a "standard problem" solved by some variant of type
    >inference or some other form of dynamic code analysis. I think there are
    >similar issues with byte-code-to-native compilers. I'm not saying it's easy,
    >just that there is literature in the compiler community that
    >addresses similar problems.

    Most recent byte-code to native work has been done on type-aware VMs
    like Java. This simplifies the task enormously.

    On the flip side, the Z-machine's paucity of types simplifies that
    side quite a bit.

    >IIRC the above-mentioned amount to formalizing these kind of heuristics and
    >verifying that the code really does treat the value that way.

    Right. This is what I call "bookkeeping". If you can figure out that
    parameter two of routine R_A562 must be a string (because it's used in a
    print_paddr), then you can figure out that anything passed in as the
    actual second parameter to R_A562 is a string also. You can work
    things like that forward and backwards, and come up with the types of
    a lot of things. With Infocom (but not Inform) code, you can also
    assume that global variables have the same type throughout the code,
    and that all properties of a given number have the same type.

    >The interesting trick is if some clever compiler treats them as property
    >identifiers in one place and simple integers in another. I have vague
    >memories of some peephole optimizers generating such multi-use situations.

    Fortunately, the ZIL compiler does not do this.
  22. Archived from groups: rec.games.int-fiction (More info?)

    Alas if I only had a slightly bigger brain. I would think the value in
    dcompiling z-code would mainly be to see how the masters wrote the
    classic Infocom games - more for learning and game design purposes. I'm
    sure a lot of that can be done from newer Inform games whose source is
    freely available, but I still would love to start by learning from the
    original Zork.

    > In theory, most Z-code shouldn't be all that hard to decompile. It
    > comes down to a matter of bookkeeping; some manual help would be
    > needed when information was totally lost and to fill in variable and
    > function names, but not as much as you might think. In practice, no
    > one has done it. It doesn't help that TXD isn't really suitable as
    > the starting point for a decompiling effort, so you'd have to
    re-write
    > the disassembler part before you got to the fun stuff.
  23. Archived from groups: rec.games.int-fiction (More info?)

    If I only had a slightly bigger brain (and more free time) I would try
    to write the decompiler myself. As it is, I looked at the z-machine
    spec and it's beyond my current knowledge and time... I once knew some
    8-bit assembly and my understanding of low level computing ends there.
    It's just a better use of my time to learn Inform and start writing
    games. My original hope was I could write a simple z-code compiler to
    translate my existing text games (which are no Zorks, I assure you) to
    z-code from their current homebrewed language (a sub BASIC scripting
    tongue at best). I think it best to learn Inform, and then if I'm
    *still* intersted in porting these ancient games of mine over for comic
    relief, it would be easier to write something to translate them to
    Inform code. I think it's best to just write new games and attempt
    something good. I begin reading the Inform manual tonight...
  24. Archived from groups: rec.games.int-fiction (More info?)

    On or about 2/6/2005 4:37 PM, David Alex Lamb did proclaim:
    > My knowledge of decompilers has deteriorated a lot in the last 25 years, but
    > I'm fairly sure this is a "standard problem" solved by some variant of type
    > inference or some other form of dynamic code analysis. I think there are
    > similar issues with byte-code-to-native compilers. I'm not saying it's easy,
    > just that there is literature in the compiler community that addresses similar problems.

    Quite so. Java is very strongly typed, and (to support the
    retrospection APIs) the class files keep a ton of data that would
    normally be of interest to only to a debugger. As a result, Java
    decompilers are a dime a dozen, because you can get by with a purely
    static analysis. The Z machine is the exact opposite of Java, looking
    more like an Motorola 6502. The architecture even seems to support
    self-modifying code, although no one seems to have ever tried writing any.

    Another problem is that there are multiple compilers. Turning a Java
    class file back into Java is aided by the fact that there's really only
    been a couple of versions of Java, and those have only very minor
    differences. The major versions of Inform are quite different from each
    other, there is a C compiler (AFAIK only used for "Silicon Castles"),
    and then there's ZIL, of which little is known but was used to create
    "the canon". Turing a storyfile compiled using Inform 4 into Inform 5
    source would be very difficult; turning something that started out as C
    or ZIL would, I suspect, be almost impossible.
  25. Archived from groups: rec.games.int-fiction (More info?)

    On or about 2/7/2005 2:34 AM, Joel Ray Holveck did proclaim:
    > (In fact, the standard Inform library
    > relies on the fact that n_to and n_obj have the same number-- something
    > I found out, much to my dismay, when I violated this just before putting
    > my program in beta.)

    I'm surprised to hear that. Where does that happen?
  26. Archived from groups: rec.games.int-fiction (More info?)

    On Mon, 07 Feb 2005 10:07:11 -0600, russotto@grace.speakeasy.net (Matthew
    Russotto) wrote:

    >Right. This is what I call "bookkeeping". If you can figure out that
    >parameter two of routine R_A562 must be a string (because it's used in a
    >print_paddr), then you can figure out that anything passed in as the
    >actual second parameter to R_A562 is a string also. You can work
    >things like that forward and backwards, and come up with the types of
    >a lot of things. With Infocom (but not Inform) code, you can also
    >assume that global variables have the same type throughout the code,
    >and that all properties of a given number have the same type.

    But presumably, because of Inform/Z-machine's weak typing, there are many
    situations where this could go wrong? Things like PrintOrRun() or others
    where different types are passed in at different points in a game?

    Regards,
    Graham Holden (g-holden AT dircon DOT co DOT uk)
    --
    There are 10 types of people in the world;
    those that understand binary and those that don't.
  27. Archived from groups: rec.games.int-fiction (More info?)

    In article <2n5f01p8hlb2ml5i9qts95bicgcpl6ihs3@4ax.com>,
    Graham Holden <look@bottom.of.post> wrote:
    >
    >But presumably, because of Inform/Z-machine's weak typing, there are many
    >situations where this could go wrong? Things like PrintOrRun() or others
    >where different types are passed in at different points in a game?

    PrintOrRun is an Inform thing; I don't think Infocom used anything
    like it (I could be mistaken, it's been a while since I disassembled
    the story files). However, even if you allow that case you've got it
    down to a packed address; knowing it's either a routine or a string is
    probably enough for most purposes, particularly since any constant
    with such a type can be disambiguated in Inform-generated code.

    Undoubtedly, though, you wouldn't be able to determine every type, and
    you'd have to call for user help.
  28. Archived from groups: rec.games.int-fiction (More info?)

    Matthew Russotto wrote:
    > In article <2n5f01p8hlb2ml5i9qts95bicgcpl6ihs3@4ax.com>,
    > Graham Holden <look@bottom.of.post> wrote:
    > >
    > >But presumably, because of Inform/Z-machine's weak typing, there are
    many
    > >situations where this could go wrong? Things like PrintOrRun() or
    others
    > >where different types are passed in at different points in a game?
    >
    > PrintOrRun is an Inform thing; I don't think Infocom used anything
    > like it (I could be mistaken, it's been a while since I disassembled
    > the story files). However, even if you allow that case you've got it
    > down to a packed address; knowing it's either a routine or a string
    is
    > probably enough for most purposes, particularly since any constant
    > with such a type can be disambiguated in Inform-generated code.

    There are also properties which can hold either an object number,
    or a string or a routine, like n_to. There can't be any clues
    in the code as to what kind of values they may contain.

    /Fredrik
  29. Archived from groups: rec.games.int-fiction (More info?)

    Here, Joel Ray Holveck <joelh@piquan.org> wrote:
    > (In fact, the standard Inform
    > library relies on the fact that n_to and n_obj have the same number--

    No, I don't believe it does. It's unfortunately easy for the *user* to
    write code that relies on this assumption (writing "rm.n_obj" when you
    meant "rm.n_to"), but the library doesn't contain that assumption out
    of the box.

    --Z

    "And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
    *
    I'm still thinking about what to put in this space.
  30. Archived from groups: rec.games.int-fiction (More info?)

    samwyse <dejanews@email.com> wrote:

    > The major versions of Inform are quite different from each
    > other, there is a C compiler (AFAIK only used for "Silicon Castles"),
    > and then there's ZIL, of which little is known but was used to create
    > "the canon". Turing a storyfile compiled using Inform 4 into Inform 5
    ^^^^^^
    _Nice_ typo.

    Richard
  31. Archived from groups: rec.games.int-fiction (More info?)

    samwyse wrote:
    >
    > On or about 2/6/2005 4:37 PM, David Alex Lamb did proclaim:
    > > My knowledge of decompilers has deteriorated a lot in the last 25 years, but
    > > I'm fairly sure this is a "standard problem" solved by some variant of type
    > > inference or some other form of dynamic code analysis. I think there are
    > > similar issues with byte-code-to-native compilers. I'm not saying it's easy,
    > > just that there is literature in the compiler community that addresses similar problems.
    >
    > Quite so. Java is very strongly typed, and (to support the
    > retrospection APIs) the class files keep a ton of data that would
    > normally be of interest to only to a debugger. As a result, Java
    > decompilers are a dime a dozen, because you can get by with a purely
    > static analysis. The Z machine is the exact opposite of Java, looking
    > more like an Motorola 6502. The architecture even seems to support
    > self-modifying code, although no one seems to have ever tried writing any.
    >
    Since the 6502 was *not* made by Motorola, that was some trick alright...

    --
    +--------------------------------------------------------------------+
    | Charles and Francis Richmond The way feminists see oppression |
    | everywhere, men see urinals. |
    | richmond at plano dot net It's a design feature. -- Fred Reed |

    +--------------------------------------------------------------------+
  32. Archived from groups: rec.games.int-fiction (More info?)

    On 2005-02-07 11:13:52 -0800, Andrew Plotkin <erkyrath@eblong.com> said:

    >> (In fact, the standard Inform library relies on the fact that n_to and
    >> n_obj have the same number--
    > No, I don't believe it does. It's unfortunately easy for the *user* to
    > write code that relies on this assumption (writing "rm.n_obj" when you
    > meant "rm.n_to"), but the library doesn't contain that assumption out
    > of the box.

    You are, of course, correct. I had thought (for some silly reason)
    that the DM4 specified door_to should hold n_obj, not n_to, and had
    written my code that way. Later I realized that GoSub would require
    them to have the same number for ENTER DOOR to work.

    Upon checking my assumptions, I discover that I was quite mistaken, and
    apologize to the standard library authors.

    Piquan
  33. Archived from groups: rec.games.int-fiction (More info?)

    >> (In fact, the standard Inform library relies on the fact that n_to and
    >> n_obj have the same number-- something I found out, much to my dismay,
    >> when I violated this just before putting my program in beta.)
    > I'm surprised to hear that. Where does that happen?

    Only in my imagination, as it turns out; see my reply to Zarf.

    Piquan
  34. Archived from groups: rec.games.int-fiction (More info?)

    David Alex Lamb wrote:
    > My knowledge of decompilers has deteriorated a lot in the last 25 years,
    > but I'm fairly sure this is a "standard problem" solved by some variant
    > of type inference or some other form of dynamic code analysis. I think
    > there are similar issues with byte-code-to-native compilers. I'm not
    > saying it's easy, just that there is literature in the compiler
    > community that addresses similar problems.

    Since no one has mentioned it yet, let me plug my Z-machine decompiler, Reform:

    http://www.darkweb.com/~benrg/if-decompilers/

    It relies on the user to specify the types of local variables, arrays, and
    whatnot, and does some simple local type inference to figure out the types
    of local constants. E.g. if the expression "local2-->0 == $1234" appears in
    the code, and the user declared local2 to be a pointer to a word array of
    string addresses, then $1234 will be tagged as a string address. It works
    well in practice, but at the expense of a lot of manual effort to identify
    the types of variables.

    I was somewhat aware of research in decompilation when I started. I read
    Cifuentes's thesis, and even implemented some of the algorithms, but none of
    that made it into the released version; I settled for simpler ad-hoc
    techniques that worked well enough. I'd love to return to it, but I don't
    know if I'll ever have the time.

    One problem I had is that high-level analysis conflicts with accurate
    decompilation to some extent. Zilch and Inform are very simple compilers,
    and simple decompilation preserves more of the original program structure
    than fancy graph analysis would. If you just pattern-match on code fragments
    produced by the Inform compiler, you'll do fine. If you try to get clever,
    you can easily end up with a flowgraph that can't be expressed directly in
    Inform.

    >>You can use some reasonable heuristics. For example, if local variable
    >>1 is immediately printed, it's a very good bet that $4A8E is a string
    >>address.

    The problem is that it's a very good bet, rather than a certainty. I
    actually implemented global type inference for Reform, but never got it to
    work properly. It always inferred that (nearly) everything had every type. I
    never tracked down exactly why. Undoubtedly one reason is that I didn't do
    flow analysis on local variables; if one got reused to hold a different
    type, equivalence classes would be merged incorrectly. Global variables are
    nastier. Inform likes to use g254 and g255 to hold temporaries. Memory
    locations and properties are nastier still.

    I definitely believe that it could be made to work, at least much better
    than my primitive attempt. If anyone wants to pick up the torch, I'll help
    you out as best I can. :-)

    > The interesting trick is if some clever compiler treats them as property
    > identifiers in one place and simple integers in another. I have vague
    > memories of some peephole optimizers generating such multi-use situations.

    Both Inform and Zilch do basically no optimization, so I don't think this
    situation arises.

    -- Ben
  35. Archived from groups: rec.games.int-fiction (More info?)

    On or about 2/7/2005 11:43 PM, Charles Richmond did proclaim:
    > Since the 6502 was *not* made by Motorola, that was some trick alright...

    You know, I was going to say "the Moto 6800", then decided that it was
    more like the 6502. I think I've programmed in too many different
    assembly languages, they're all starting to blur together.
  36. Archived from groups: rec.games.int-fiction (More info?)

    I will definitely check out Toyshop. What would you and other users say
    are the top 3 games I can get the source code to, to study and learn
    good design? Or will the Inform manual be sufficient?
  37. Archived from groups: rec.games.int-fiction (More info?)

    In article <1107824365.807935.93710@g14g2000cwa.googlegrovps.com>,
    Fredrik Ramsberg <f.r@mail.com> wrote:
    >
    >Matthew Rvssotto wrote:
    >> In article <2n5f01p8hlb2ml5i9qts95bicgcpl6ihs3@4ax.com>,
    >> Graham Holden <look@bottom.of.post> wrote:
    >> >
    >> >Bvt presvmably, becavse of Inform/Z-machine's weak typing, there are
    >many
    >> >sitvations where this covld go wrong? Things like PrintOrRvn() or
    >others
    >> >where different types are passed in at different points in a game?
    >>
    >> PrintOrRvn is an Inform thing; I don't think Infocom vsed anything
    >> like it (I covld be mistaken, it's been a while since I disassembled
    >> the story files). However, even if yov allow that case yov've got it
    >> down to a packed address; knowing it's either a rovtine or a string
    >is
    >> probably enovgh for most pvrposes, particvlarly since any constant
    >> with svch a type can be disambigvated in Inform-generated code.
    >
    >There are also properties which can hold either an object nvmber,
    >or a string or a rovtine, like n_to. There can't be any clves
    >in the code as to what kind of valves they may contain.

    That wovld be a problem, bvt only for Inform code. It's Infocom code
    I'm actvally more interested in. The problem is similar to the problem of
    vnions in other langvages, so there's probably been some work on it.
  38. Archived from groups: rec.games.int-fiction (More info?)

    <snip bits about using various heuristics to determine types/uses in
    decompiled Z-code.>

    One approach that might help would be to build an interpreter into the
    decompiler (or build a decompiler into an interpreter...) and allow the
    game to be played through. If a suitable "walkthrough" is available, this
    wouldn't need too much user-input, but even if it had to be played
    manually, it probably wouldn't feel like the same level of user input as
    trying to guess the meaning of lumps of code (and certainly wouldn't need
    the same "hacker-level" user to do it).

    The idea being that by noting how addresses, references, variables etc. are
    _actually_ used (by seeing what the interpreter does in a real game
    session), instead of how they _might_ be used (through code analysis), you
    might get a better picture of what was originally intended.

    Quite how you'd go about this I don't know yet... it's just an "off the top
    of my head" idea at the moment.

    Regards,
    Graham Holden (g-holden AT dircon DOT co DOT uk)
    --
    There are 10 types of people in the world;
    those that understand binary and those that don't.
  39. Archived from groups: rec.games.int-fiction (More info?)

    In article <7p6h01lakst0opo1mmlss053ru7ut0a4gr@4ax.com>,
    Graham Holden <look@bottom.of.post> wrote:
    >
    >One approach that might help would be to build an interpreter into the
    >decompiler (or build a decompiler into an interpreter...) and allow the
    >game to be played through. If a suitable "walkthrough" is available, this
    >wouldn't need too much user-input, but even if it had to be played
    >manually, it probably wouldn't feel like the same level of user input as
    >trying to guess the meaning of lumps of code (and certainly wouldn't need
    >the same "hacker-level" user to do it).

    Now that's an interesting way of getting around a number of problems
    -- particularly the function-pointer problem and the untyped-constant
    problem. You could use this to collect data which would be fed into a
    more traditional decompiler.
  40. Archived from groups: rec.games.int-fiction (More info?)

    On or about 2/7/2005 1:08 PM, Mad Scientist Jr did proclaim:
    > Alas if I only had a slightly bigger brain. I would think the value in
    > dcompiling z-code would mainly be to see how the masters wrote the
    > classic Infocom games - more for learning and game design purposes. I'm
    > sure a lot of that can be done from newer Inform games whose source is
    > freely available, but I still would love to start by learning from the
    > original Zork.

    The only problem is that the old games were frequently buggy. Take a
    look at http://www.google.com/search?q=infocom-bugs, but be warned that
    there are a lot of spoilers on those lists. Inform has a lot of
    safety-checks built in to prevent a large number of those problems. The
    old games are respected today more for their writing and their ideas,
    not their implementation.

    There are some neat objects in some of the games, but I think that
    everyone one of them has been re-created in Inform. Check out "Toyshop"
    for details. Also, there is a re-implementation of "Deadline" (or is it
    "Suspect") that was written entirely in Inform. However, I don't know
    if the author worked from a disassembly of the original, or created
    everything from scratch.
  41. Archived from groups: rec.games.int-fiction (More info?)

    On Tue, 08 Feb 2005 16:18:41 GMT, Mad Scientist Jr scrawled:

    > I will definitely check out Toyshop. What would you and other users say
    > are the top 3 games I can get the source code to, to study and learn
    > good design? Or will the Inform manual be sufficient?
    >

    I learned from the DM4, and also picked up a few tips as to what to use
    and when, after reading the Inform Beginners Guide (which has been
    revamped in the last few months).

    --
    http://www.rexx.co.uk

    To email me, visit the site.
  42. Archived from groups: rec.games.int-fiction (More info?)

    Andrew Plotkin <erkyrath@eblong.com> writes:

    > If you're talking about ZIL, then I imagine they designed it ugly
    > because they were MIT hackers. And even though they were effectively
    > inventing a form of compiled BASIC, they were at least going to
    > *pretend* it was LISP-like.

    Or maybe they just didn't want to rewrite the existing Zork source
    which was in MDL, and therefore retained the syntax even though the
    actual language turned out to be very different.

    --
    Esa Peuha
    student of mathematics at the University of Helsinki
    http://www.helsinki.fi/~peuha/
  43. Archived from groups: rec.games.int-fiction (More info?)

    "Mad Scientist Jr" <usenet_daughter@yahoo.com> writes:

    > Alas if I only had a slightly bigger brain. I would think the value in
    > dcompiling z-code would mainly be to see how the masters wrote the
    > classic Infocom games - more for learning and game design purposes. I'm
    > sure a lot of that can be done from newer Inform games whose source is
    > freely available, but I still would love to start by learning from the
    > original Zork.

    You might want to check out Dungeon; after all, it _is_ the "original
    Zork".

    --
    Esa Peuha
    student of mathematics at the University of Helsinki
    http://www.helsinki.fi/~peuha/
  44. Archived from groups: rec.games.int-fiction (More info?)

    Here, Esa A E Peuha <esa.peuha@helsinki.fi> wrote:
    > Andrew Plotkin <erkyrath@eblong.com> writes:
    >
    > > If you're talking about ZIL, then I imagine they designed it ugly
    > > because they were MIT hackers. And even though they were effectively
    > > inventing a form of compiled BASIC, they were at least going to
    > > *pretend* it was LISP-like.
    >
    > Or maybe they just didn't want to rewrite the existing Zork source
    > which was in MDL, and therefore retained the syntax even though the
    > actual language turned out to be very different.

    I doubt very much that functional MDL code works as functional ZIL
    code. Why would the original MDL have avoided list operations?

    --Z

    "And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
    *
    I'm still thinking about what to put in this space.
  45. Archived from groups: rec.games.int-fiction (More info?)

    The idea of writing a new compiler targeting a new virtual machine, and
    tweaking every little detail of both in such a way that you can use the
    unmodified source code for one program to compile and run a single
    game... I'd call that preposterous.

    But then again, I'm the sort of person who'd probably do that for the
    fun of it, and I've known quite a few programmers who'd feel the same.
Ask a new question

Read More

Video Games