There are usually two ways to blow up the stack. The first is infinite recursion. The second is calling functions that use too much stack. These crashes aren't easy to locate and analyse, because of their nature: if you have a crash handler, it will run in an environment that is already badly damaged. At least once I've seen a stack overflow erase the thread's TLS, which made for a very interesting debugging session. And if you run something like SELinux in paranoid mode, your crash handler won't even get a chance to run, as this is somehow treated as an "attack".

Anyway. For the infinite recursion problem, there's a heuristic we put in place that seems to work pretty well. Of course, it depends on the structure of your software, but it should work in most common cases. The idea is to store somewhere in TLS (or equivalent) the value of the stack pointer before any recursion can start, in your main loop for example. Then, in a strategic function that gets called by virtually everything, add a check that computes the difference between this saved stack pointer and the current one. If the difference is greater than a certain amount, then breakpoint, send an e-mail, assert, whatever. Don't forget that on most systems the stack grows down, meaning the value of the stack pointer decreases as the call chain gets deeper. How do you get the stack pointer? Easy: take the address of a local variable.
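
The check described above can be sketched in C roughly like this. It's only a minimal sketch under a few assumptions: the names (stack_mark_base, stack_depth, stack_budget_exceeded, recurse_until_budget) and the 64 KiB budget are made up for illustration, the stack grows down, and the compiler supports C11 _Thread_local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical budget: how far below the recorded base the stack may grow. */
#define STACK_BUDGET_BYTES (64u * 1024u)

/* Per-thread copy of the approximate stack pointer, taken before recursing. */
static _Thread_local uintptr_t stack_base;

/* Call once per thread, e.g. at the top of the main loop. */
void stack_mark_base(void) {
    char marker;                       /* address of a local ~ stack pointer */
    stack_base = (uintptr_t)&marker;
}

/* Bytes of stack used since stack_mark_base(); assumes the stack grows down. */
size_t stack_depth(void) {
    char marker;
    uintptr_t now = (uintptr_t)&marker;
    assert(stack_base != 0);           /* stack_mark_base() must run first */
    return stack_base > now ? (size_t)(stack_base - now) : 0;
}

/* The check to drop into a function that virtually everything calls. */
int stack_budget_exceeded(void) {
    return stack_depth() > STACK_BUDGET_BYTES;
}

/* Demo: recurse with roughly 1 KiB per frame until the check fires.
 * Returns the level at which it fired, or -1 if the safety cap was hit. */
int recurse_until_budget(int level) {
    volatile char pad[1024];
    int result;

    pad[0] = (char)level;
    if (level >= 1000)
        return -1;                     /* safety cap for the demo only */
    if (stack_budget_exceeded())
        return level;
    result = recurse_until_budget(level + 1);
    pad[1] = (char)result;             /* work after the call: no tail call */
    return result;
}
```

In a real program you'd call stack_mark_base() once per thread and stack_budget_exceeded() from whatever function everything goes through; what to do when it fires (breakpoint, e-mail, assert) is up to you.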

Now, for the functions that take too much stack. On x86 there's pretty much only one instruction that will "allocate" space on the stack for a function's locals. It looks something like this:

$ objdump -d --prefix-addresses SomeBinary | head

SomeBinary:     file format elf32-i386

Disassembly of section .init:
082d5548 <_init> push   %ebp
082d5549 <_init+0x1> mov    %esp,%ebp
082d554b <_init+0x3> sub    $0x8,%esp
082d554e <_init+0x6> call   082d6aa4 <call_gmon_start>
082d5553 <_init+0xb> call   082d6b30 <frame_dummy>
082d5558 <_init+0x10> call   08ebba20 <__do_global_ctors_aux>

The sub $0x8,%esp is "allocating" 8 bytes on the stack by subtracting 8 from the stack pointer. So a way to heuristically find the functions that eat the most stack on entry is the following unix command:

objdump -d --prefix-addresses SomeBinary | grep 'sub.*\$0x[0-9a-f]*,%esp' | awk ' { split($4, x, ","); print strtonum(substr(x[1], 2)),$0; } ' | sort -un | c++filt

This searches the disassembly for all the sub $0xVV,%esp lines and sorts them by value, printing the decimal "stack size" in the first column, with the function name right after. Note that strtonum is a gawk extension, and that sort -un keeps only one line per distinct size; drop the -u if you want to see every function. Interestingly enough, you can see for example that zlib is quite hungry:

$ objdump -d --prefix-addresses SomeBinary | grep 'sub.*\$0x[0-9a-f]*,%esp' | awk ' { split($4, x, ","); print strtonum(substr(x[1], 2)),$0; } ' | sort -un | c++filt | grep zlib
47084 08e5f182 <zlib_uncompress+0x6> sub    $0xb7ec,%esp
47100 08e5f0ac <zlib_uncompress2+0x6> sub    $0xb7fc,%esp
300108 08e5f250 <zlib_compress+0x6> sub    $0x4944c,%esp
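
Functions usually end up high in that list because they keep a big buffer as a local variable. As a purely hypothetical illustration (sum_via_scratch and SCRATCH_SIZE are made up, and nothing here is taken from zlib or from the binary above), a function like this one would typically show up with a sub of at least 40960 bytes when built as unoptimized 32-bit x86 code, although the exact value depends on the compiler and its flags:

```c
#include <stddef.h>
#include <string.h>

#define SCRATCH_SIZE (40 * 1024)       /* hypothetical buffer size */

/* Hypothetical stack-hungry function: the scratch buffer is a local
 * variable, so its 40 KiB are reserved in the frame on entry. */
unsigned long sum_via_scratch(const unsigned char *data, size_t len) {
    unsigned char scratch[SCRATCH_SIZE];
    unsigned long sum = 0;
    size_t i;

    if (len > sizeof scratch)
        len = sizeof scratch;          /* only the part that fits is used */
    memcpy(scratch, data, len);        /* stage the input in the buffer */
    for (i = 0; i < len; i++)
        sum += scratch[i];
    return sum;
}
```

Running the pipeline above on a binary containing such a function is an easy way to convince yourself that the first column really tracks the size of the locals.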