For the people who know me, this is just a repost of a very old article I wrote. I just think this is the perfect spot for it. I'm just changing the wording a bit, but this is essentially the same thing.
As I said in the opening post, Yazoo and I are quite into Reverse Engineering, which is usually called "Hacking". While this word always causes some controversy, we claim to be on the right side of the line.
That being said, we still receive a lot of mail and queries asking us to teach people how to hack or, in short, to write some so-called tutorials. We have always declined such requests. This article attempts to explain why. The short version is "because any hacking tutorial can only be crap", but I hope I can lay out the full reasons why this methodology doesn't work at all. Don't read this as a real attempt to teach you how to hack, but as an attempt to explain why the usual tutorials just can't work.
So, there, I won't teach you PSX or PS2 or any kind of console hacking. I'm going back into the depths of the very old dark ages of MS-DOS. But before we start, you can find all the files I'm referring to here on this page.
Let's begin with the easiest piece of software, Hello-01.com. If you run this file (yes, you can run it! It's an executable! Download it, double-click on it, you'll see! Press any key to stop it.), you'll see it display the wonderful string "Hello World!". Now let's say we want to translate this software into French, for example. Let's open the file in our favorite hex editor. This is what we should see:
Huh, seems easy. Garbage at the beginning, text at the end. Let's ignore the garbage, and change the text, right here, right now! Okay, so, huh, wait, there's that tiny $ at the end of the string we want to change. And this dollar isn't displayed. And it's at the end of the file. So there has to be some trick to it. Okay, it may be the "end of string" marker or something. So, let's not forget to put it back when we translate the file. We get this:
Let's save it, and run it. We should get something like this at the end:
Okay, cool! That was easy, wasn't it? Mhh, let's advance to the next level...
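The "$" convention from this first level can be sketched in a few lines of Python. This is only an illustration: the bytes below are hypothetical stand-ins, not the real contents of Hello-01.com, and the helper names are made up. A "$"-terminated string is what the DOS print-string call (int 21h, function 09h) expects; whether Hello-01 uses that exact call is an assumption at this point in the article.

```python
# Hypothetical bytes standing in for "garbage, then text"; NOT the real file.
data = b"\x90\x90Hello World!$"


def read_dollar_string(buf: bytes, start: int) -> bytes:
    """Return the bytes from `start` up to (not including) the next '$'."""
    end = buf.index(b"$", start)
    return buf[start:end]


def replace_dollar_string(buf: bytes, start: int, new_text: bytes) -> bytes:
    """Swap the '$'-terminated string at `start` for `new_text`, keeping the '$'."""
    if b"$" in new_text:
        raise ValueError("the replacement text must not contain '$'")
    end = buf.index(b"$", start)
    # buf[end:] begins with the original '$', so the terminator is preserved.
    return buf[:start] + new_text + buf[end:]


patched = replace_dollar_string(data, 2, b"Bonjour le monde!")
print(read_dollar_string(patched, 2).decode("ascii"))
```

Note that this only works here because nothing comes after the text. The next levels show what happens when that assumption breaks.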
Let's try the second piece of software, Hello-02.com. Let's look at the file in Hex Workshop again.
Hum, this doesn't seem more complicated, but it seems that things are now the other way around. Very little garbage at the beginning (that EB 0F thing), followed by the text, ended with the same $ as before, then by some more garbage. Okay, let's take a bet and just change the text the same way; we'll see what happens. So, we have two options to change the text. Either we overwrite the garbage with our text, which doesn't seem to be a good idea (I haven't even tried it, and I can tell you, this is really not a good idea - usually, the "garbage" isn't just there to fill empty space, but give it a go if you want to be 100% sure), or we enlarge the file by just moving the garbage a bit. We'll go with the second option. This is maybe not the best thing to do, but do we have a choice? And let's not forget that $! Here's the actual result of the modification:
What if we run it?
Woot! It works! Hey, translation work is easy as hell!... well, maybe not, let's see...
Time to try Hello-03. If we run it, it now displays "This is a very long hello world!". Okay, doesn't seem too difficult anyway. What if we read it in the hex editor?
What the hell? Garbage at the beginning, text at the end, but, what? We now have these $ instead of spaces. Mmmhh, okay, sounds strange, but let's be smart: let's follow the same pattern and modify it. Here is what we get:
Time to run it...
Uh huu, things went wrong here. Okay, hm, let's think again, and see what we did wrong. What did we forget...? The answer is simple: we forgot something lots of people have a problem with: pointers. Pointers? What? Where? Yeah, okay, let's dig a lot, have some terrible inspiration, and draw some arrows on the first picture:
Ouch, lots of drawings here, sorry. So, what does it mean? See the numbers in the circles? If you have good intuition, you can guess these are pointers. Each of them magically corresponds to the offset in the file of the next chunk of the string to be displayed. Follow the arrow each time to get the destination of the pointer. That's the "meaning" of a pointer: it's a kind of electronic arrow that tells the software where it should read something.
Hooray, we got it! We only have to change these magic numbers so that they correspond to our new chunks. But, wait. There are still some things which are unclear. There is a 01 after each pointer. What does it mean? Maybe some kind of tag... oh well, never mind it for now. And right after the last pointer, we have a bit of garbage, then the text. Okay, hm, so we have another problem: our new string has 8 parts, but we only have room for 7 pointers here. Well... let's try to trick it, and build only one string for the last part, by having a real space instead of that annoying $. Here are our modifications:
Run it...?
Yeah! Almost there! "Almost" because it seems our "è" didn't go too well... Humf, it seems that the software doesn't use the same "è" as the one we have on our keyboard. But we won't try to map it right now. That would take too much trial and error. It'll do for now.
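The "è" mismatch is a character encoding problem, and it can be reproduced without any trial and error. The sketch below makes two assumptions that the article does not state: that the DOS program draws text with code page 437, and that the hex editor stored the typed character as Windows-1252. The helper name `to_dos` is made up for the illustration.

```python
# Assumptions: DOS side uses code page 437, editor side uses Windows-1252.
E_GRAVE = "\u00e8"  # the letter "è"

typed = E_GRAVE.encode("cp1252")   # byte the editor wrote into the file
wanted = E_GRAVE.encode("cp437")   # byte DOS needs in order to draw "è"

print("editor wrote:", typed.hex(), "DOS expects:", wanted.hex())
print("DOS renders the editor's byte as:", typed.decode("cp437"))


def to_dos(text: str) -> bytes:
    """Encode replacement text the way a CP437 DOS program expects it."""
    return text.encode("cp437")


print(to_dos("tr\u00e8s").hex())
```

With a different pair of code pages the byte values would change, but the method stays the same: look the character up in the target code page instead of guessing.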
The last level of our little game. Hello-04 is the last, but not least, challenge of our silly game. Hex editing:
Oookay, so, now, we have garbage before, garbage after, and the text in the middle. Mh, let's not be silly this time, and look for the pointers, as we now know there should be some in that kind of weird configuration. Oh, there they are, at the end. So the last "garbage" part is our pointer list. Let's be happy and redo exactly the same thing as before:
Text changed, last chunk with space built-in, and pointers updated. Let's run it.
Woops! This time, it went terribly wrong! (By the way, I had to press CTRL-C to stop it, in case you didn't know.)
That hacked software displayed nothing but garbage! Not even a single part of the right stuff!
Hmm... what was wrong this time...? Well, let's not take too much time thinking, and let's go right to the answer to that mystery. To get this answer, we need another tool. This tool is called a disassembler, and it will give us the assembler source code of the software we're trying to change.
Ouch, what does all of that mean...? This is actually the full meaning of the software we're trying to hack. I won't try to explain everything. The comments are quite self-explanatory. What you need to understand is that there is indeed a table of pointers at the end of the file, and that there's a loop which reads each pointer, displays the string it points to (a string terminated by "$") and then displays a string which only contains a space. And the first instruction, "mov si, offset table_offsets", is actually the one causing our trouble. Why? Let's add some arrows.
The first instruction loads the pointer to the pointer table. There was a hidden pointer! And, yes, we COULD have found it by simply looking at the file, but the disassembler window gives way more hints about it. Oh! And by the way, we can also see that the offsets start at 100 (hexadecimal: a .COM file is loaded at offset 100h in memory)! That's the reason for the 01 trailing each of our pointers! Our hypothesis was dead wrong: this number was part of the pointer itself. Aaah, and if we look right into the algorithm, it loops until it reads a null pointer (0, that is), so we can add as many pointers as we want! We'd have to fix the offset to table_offsets first, though. So, let's do another modification to our hack:
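The pointer arithmetic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of Hello-04: the "code" is a run of placeholder bytes, the layout (code, text chunks, then the table) is invented, and the function names are hypothetical. In the real program the table pointer lives inside the first instruction, so that operand would have to be patched too; here it is simply returned alongside the image.

```python
import struct

LOAD_BASE = 0x100  # a .COM file is loaded at offset 100h of its segment


def to_file_offset(pointer: int) -> int:
    """Convert an in-memory pointer to an offset inside the file."""
    return pointer - LOAD_BASE


def read_chunks(image: bytes, table_pointer: int) -> list[bytes]:
    """Walk the table of 16-bit little-endian pointers until a null pointer."""
    chunks = []
    pos = to_file_offset(table_pointer)
    while True:
        (ptr,) = struct.unpack_from("<H", image, pos)
        if ptr == 0:  # null pointer ends the table
            break
        start = to_file_offset(ptr)
        end = image.index(b"$", start)  # each chunk is '$'-terminated
        chunks.append(image[start:end])
        pos += 2
    return chunks


def build_image(code: bytes, chunks: list[bytes]) -> tuple[bytes, int]:
    """Lay out code, '$'-terminated chunks, then a freshly computed table."""
    body = bytearray(code)
    pointers = []
    for chunk in chunks:
        pointers.append(LOAD_BASE + len(body))
        body += chunk + b"$"
    table_pointer = LOAD_BASE + len(body)
    for ptr in pointers:
        body += struct.pack("<H", ptr)
    body += struct.pack("<H", 0)  # terminating null pointer
    return bytes(body), table_pointer


code = b"\x90" * 16  # placeholder bytes, NOT the real program
image, table_ptr = build_image(code, [b"Hello", b"World!"])

# The pointer 0110h is stored low byte first, hence the "01" after each value.
print(struct.pack("<H", 0x0110).hex(" "))
print(hex(table_ptr), read_chunks(image, table_ptr))
```

Because every pointer is recomputed from the actual layout, adding an eighth chunk is no harder than adding a second one. That is the difference between understanding the structure and nudging bytes until something displays.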
We fix that damn pointer at the beginning, we put away that "ugly" space-instead-of-a-$ trick, and we add another pointer to the list. Let's run it:
Yatta! This time it works! As we could see, moving "garbage" around sometimes works, sometimes doesn't. Why?
Because it depends on the way the software is coded! It's not random, nor some constant. You have to guess it for each piece of software you're going to hack. Or not guess. This is not the right word. You have to understand it. Understand is the keyword. You CAN'T hack software PROPERLY by using some magic formula you've read somewhere. There is NO magic involved in hacking. If you want to hack something, you have to become a mind-reader. You have to read the developer's mind. Just don't follow someone else's tutorial: do it yourself. Tutorials ain't good reading. Why? Because they 1) are poorly written, wrong or, at best, just incomplete and 2) won't give you the magical solution for the software YOU are gonna hack. Unless the software was coded by the same people, with the same libraries, in the same mood, you'd have very little chance of finding similarities between two games. And the differences don't just lie in small numbers here and there. The differences deeply affect the whole structure of the software. The differences lie in the way the files are stored on the disc, in the way the pointers are structured to point to the text, in the way the text itself is stored/encoded/encrypted/compressed/etc.
The best I could do is to provide you with documentation about various stuff. The key to hacking (and to lots of other stuff, such as algorithm writing, chess playing or piano playing) is NOT to follow tutorials. It's to read documentation. Documentation isn't a tutorial. A tutorial is there to show you how somebody else did it - and maybe not in the best way at all. Documentation is there to give you the tools and ideas to do it YOURSELF. A tutorial is just bad documentation, since you have to extrapolate information out of it. Do it yourself. Read it again: DO-IT-YOURSELF. Train your mind. Read as many documents as you can. ASK questions, for fuck's sake! If you're really stuck, just ask questions! And intelligent questions, if you don't mind. Questions in the "how do I..." form will only result in a tutorial-like answer. Questions in the "how does XXX work" or "what is XXX" form will result in a constructive conversation between you and the guy you asked, so that you get some ideas about how to answer the "how do I..." question yourself. All of this "dumb tutorial" was here to show you that tutorials are just bad. If you follow the tutorial from Level n-1 to solve Level n, you'll screw it up.
I tried to cover all the bad things you could find in tutorials. False assumptions (the 01 tag, for example). Bad documentation (hey, code pages are quite easy to understand! You don't have to dig in any way to know the correct code for the è character). And incomplete explanations about a given problem, which lead to bad tricks and habits, such as the space-instead-of-a-$ trick, a problem which could be solved in a much better way.
As a conclusion, and if you haven't got the idea yet: just don't expect me to write any tutorial, ever. That's not my way of doing things. I'd rather spend 100 times longer trying to guide you through some solutions (and maybe NOT the solution I'd use myself!) than give you a single line of tutorial. Just ask me questions. Just mail me. Just phone me. Just contact me by any means you'd like. And ask structured questions about what's bugging you. I'll try to be as constructive as possible (if I have some time in my very busy schedule...), not by giving you my direct way to resolve your problem but by giving you some of the keys and tools you're missing to progress toward a viable solution.
Okay, here is the final hint about "why it's so important not to follow a tutorial and apply it straight to another game". Remember our Hello-02 hack? Seems it went all right, huh? Okay, we just followed the "Hello-01" tutorial to do it. And things seemed to be all right, which seems to defeat my previous rule. But what did we do, in reality? Let's read the final hacked file in a disassembler.
So, let's see... We changed the text all right, by moving the "garbage" a bit, and that garbage turns out to be instructions in reality. But, heck? What the hell? Our text is... chopped off... Huh? The disassembler displays "...mond", but the software works nevertheless! What's the big idea? Okay, time for some drawings...
See that little green circle? It's a pointer in reality! It "points" to some chunk of code... and it points RIGHT into the middle of our text! See, these instructions here, they ARE the end of our changed text! HUGH! This is UTTERLY wrong! We actually added some instructions in the middle of the software! By "not understanding what the software was doing" and by "just following the previous tutorial", we actually added some funky instructions to the mess! Luckily they seem to be not too problematic (even though they tend to erase some part of the memory, which luckily is not used by us there), but the consequences could have been disastrous! Or even worse: random. Imagine: we could have introduced a random bug here, which would have been a terrible hell to track down. Or maybe impossible, if you really had no clue about what you were doing, and if you thought the tutorial you followed was all-right-and-magic.
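What went wrong in Hello-02 can be reproduced with a small model. The sketch assumes that the circled pointer is the "EB 0F" pair at the start of the file: EB is the x86 opcode for a short jump, and the byte after it is a signed distance counted from the end of the two-byte instruction. Everything else is hypothetical: the text (including the trailing CR/LF, chosen so the distance comes out as 0F), the placeholder code bytes and the helper names. The real file's layout may differ.

```python
def short_jump_target(image: bytes, at: int = 0) -> int:
    """File offset reached by the x86 short jump (EB rel8) located at `at`."""
    if image[at] != 0xEB:
        raise ValueError("no short jump (opcode EB) at this offset")
    displacement = int.from_bytes(image[at + 1:at + 2], "little", signed=True)
    return at + 2 + displacement  # relative to the end of the 2-byte jump


def make_image(text: bytes, displacement: int) -> bytes:
    """Hypothetical layout: jump, '$'-terminated text, placeholder code."""
    if not 0 <= displacement <= 0x7F:
        raise ValueError("a forward short jump reaches at most 127 bytes")
    return bytes([0xEB, displacement]) + text + b"$" + b"\x90" * 4


def code_start(image: bytes) -> int:
    """Offset of the first byte after the '$' that ends the text."""
    return image.index(b"$", 2) + 1


old_text = b"Hello World!\r\n"
new_text = b"Bonjour le monde!\r\n"

original = make_image(old_text, len(old_text) + 1)  # jump lands on the code
naive = make_image(new_text, len(old_text) + 1)     # text grown, jump untouched
fixed = make_image(new_text, len(new_text) + 1)     # jump distance recomputed

for name, image in (("original", original), ("naive", naive), ("fixed", fixed)):
    target = short_jump_target(image)
    # Bytes between the jump target and the real code get executed as instructions.
    print(name, hex(target), hex(code_start(image)), image[target:code_start(image)])
```

In the naive image the jump still lands where the code used to be, so the tail of the new text is executed before the real instructions are reached. Recomputing the distance, which requires knowing that EB 0F is a jump in the first place, is what makes the fixed image correct.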
So, lessons to learn here: Try to get clues about what the software does. Try to read the programmer's mind. Try to fully get the actual impact of the modifications you're doing to the software. And of course, don't read tutorials, ever - do it yourself instead.