Issue information

Issue ID
#8594
Status
Confirmed
Severity
High
Started
Makoto
Mar 29, 2015 7:20
Last Post
Alexandria
Aug 30, 2015 4:08
Confirmation
Yes (3)
No (0)

Makoto - Mar 29, 2015 7:20

[code=auto:0] #0 pop_stack (st=0x7ffff37c5664, start=0, end=0) at script.c:3326 stack = 0x0 data = <optimized out> i = <optimized out> __FUNCTION__ = "pop_stack" #1 0x000000000059088a in run_func (st=0x7ffff37c5664) at script.c:4001 data = <optimized out> i = <optimized out> start_sp = <optimized out> end_sp = <optimized out> func = 238 #2 0x000000000059b304 in run_script_main (st=0x7ffff37c5664) at script.c:4236 c = <optimized out> cmdcount = 655351 gotocount = 2048 sd = <optimized out> stack = 0x7ffff008a3a4 nd = <optimized out> #3 0x0000000000568b57 in run_script_timer (tid=<optimized out>, tick=<optimized out>, id=153745, data=<optimized out>) at script.c:4086 sd = <optimized out> st = 0x7ffff37c5664 #4 0x0000000000641932 in do_timer (tick=2028067395) at timer.c:398 ---Type <return> to continue, or q <return> to quit--- [/code]
[b]script.c:3326[/b] (this line)
[code=auto:0] if( end > stack->sp ) [/code]
Happen like 7 times now.

Dastgir - Mar 29, 2015 7:26

Somehow it seems , stack is null, yet the script continues to execute all things, maybe a stack check is needed.
What say @malufett , @Haru

Makoto - Mar 29, 2015 8:37

Dastgir while adding this[code=auto:0] if (stack == NULL) return; [/code]
I get new crash dump.
[url="http://pastebin.com/raw.php?i=WpPAKnaq"]http://pastebin.com/raw.php?i=WpPAKnaq[/url]

script.c:4183
[code=auto:0] enum c_op c = script->get_com(st->script->script_buf,&st->pos); [/code]

script.c:4087
[code=auto:0] script->run_main(st); [/code]

timer.c:398
[code=auto:0] timer_data[tid].func(tid, timer_data[tid].tick, timer_data[tid].id, timer_data[tid].data); [/code]

core.c:445
[code=auto:0] int next = timer->perform(timer->gettick_nocache()); [/code]

This post has been edited by Makoto on Mar 29, 2015 8:52

4144 - Mar 29, 2015 8:59

If using gcc 4.8 or gcc 4.9 for crash hunting better build like this:
[code=auto:0] make clean ./configure --disable-lto --enable-manager=no --enable-sanitize make [/code]
After if server crash without gcc it may show bit more information


If compiler older, for crash hunting better use this:[code=auto:0] make clean ./configure --enable-manager=no make [/code]

Makoto - Mar 29, 2015 9:06

I'm using[code=auto:0] gcc --version gcc (Debian 4.7.2-5) 4.7.2 [/code]
And also already using[code=auto:0] ./configure --enable-debug=gdb --disable-lto [/code]
Yours makes my server wont start.

This post has been edited by Makoto on Mar 29, 2015 9:21

4144 - Mar 29, 2015 10:09

You mean server crashed at start? then this is your issue.
Flag --enable-manager=no disable memory manager, it will not search for memory leaks but server can really crash if any memory corruption happend. without this flag memory corruption can be ignored. and server will crash in other correct places.

Makoto - Mar 29, 2015 14:49

[quote name="4144" timestamp="1427623780"]
You mean server crashed at start? then this is your issue.
Flag --enable-manager=no disable memory manager, it will not search for memory leaks but server can really crash if any memory corruption happend. without this flag memory corruption can be ignored. and server will crash in other correct places.[/quote]

The server wont start using ur option. I posted the crashlogs and so i have to wait Haru or Ind to take a look on it.

Haru - Mar 29, 2015 21:37

The crash occurs if atcommand @kick is executed in a script running for an autotrade character (i.e. in OnPCLoginEvent or OnPCLoadMapEvent).

Minimal testcase:[code=auto:0] - script Foo -1,{ OnPCLoginEvent: debugmes "I'm about to crash"; atcommand "@kick " + strcharinfo(0); debugmes "I already crashed. You won't see this line."; end; } [/code](and start a server with saved autotraders, or load the script, than @autotrade)

atcommand_kick will be executed by run_func through buildin_atcommand.
atcommand_kick triggers clif_GM_kick that, in case there's no fd (i.e. the sd is detached via @autotrade), calls map_quit.
map_quit dequeues the currently running script (npc_event_dequeue), freeing the script state (script_free_state).
Once control returns to run_func, st is completely wiped, and its contents have been zeroed, causing the crash.

The fix doesn't seem trivial. Some brainstorming is needed, to avoid side-effects (and to avoid keeping similar issues not fixed). Is it the script engine's duty to ensure that map_quit (or similar functions leading to the script being dequeued) to never be called (doesn't seem feasible)?
Or should npc_event_dequeue set a flag so that if it's called from within a function in the script itself, the function, and the entire script, gets aborted right away?

Ind, what do you think? Any good ideas?

4144 - Mar 29, 2015 23:36

If something call map_quit from script indirectly, may be need defer map_quit for some time? May be by timer?

Ind - Apr 2, 2015 11:16

I foresee the timer breaking other things. I'd say this is intended +___+ I guess scenarios that do this are very few? how about we predict them and detach the script from the user when they are triggered? or in the script free whatever we prevent it from deleting itself (i.e. we flag the script is running on run_main, and if it reaches deletion before run_main loop unflags it we assume it is deleting itself from within and refuse it?)

This post has been edited by Ind on Apr 2, 2015 11:20

HermeMaton - Apr 12, 2015 23:25

This NPC can crash the server:
[code=auto:0] prontera,255,55,0 script warpeador WARPNPC,10,10,{ OnInit: disablenpc "warpeador"; end; OnTouch: percentheal 100,100; warp "geffen",138,50; end; } prontera,273,54,5 script activawarp 4W_M_03,{ mes "[NPC]"; mes "Hi"; .@opcion = select("Enable Warper:Disable Warper:Exit"); next; switch (.@opcion) { case 1: goto lActivarW; case 2: goto lDesactivarW; } mes "[NPC]"; mes "Good luck."; close; lActivarW: enablenpc "warpeador"; mes "OK"; close; end; lDesactivarW: disablenpc "warpeador"; mes "OK"; close; end; end; } [/code]
Just put a merchant in autotrade in the range of the NPC "warpeador" OnTouch area and enable the NPC. This will immediately trigger the OnTouch actions over the merchant in autotrade and crash the server.

I made a temporaly solution for this adding a checks in npc.c to check the autorade status of the char afer some actions.

I have crash logs about this also:
[code=auto:0] #0 0x081bf673 in run_script_main (st=0xb54872a8) at script.c:4187 c = <value optimized out> cmdcount = 655348 gotocount = 2048 sd = <value optimized out> stack = 0xb44d62f8 nd = <value optimized out> #1 0x08151f09 in npc_event_sub (sd=0xcf29e00, ev=0xb5706474, eventname=0xbfeacbdd "bat_b01_rp1_a_warp::OnTouch") at npc.c:806 No locals. #2 0x0815c55a in npc_ontouch2_event (sd=0xcf29e00, nd=0xb77405d4) at npc.c:153 name = "bat_b01_rp1_a_warp::OnTouch\000\000\000\000\244\201\032\000\002\000\000\000&\000\000\000\030\001\000\000\200@5\b" #3 0x0814f42b in npc_enable_sub (bl=0xcf29e00, ap=0xbfeacce0 "\224\004t\267S") at npc.c:183 sd = 0xcf29e00 nd = 0xb77405d4 #4 0x08132536 in bl_vforeach (func=0x814f3d0 <npc_enable_sub>, blockcount=0, max=2147483647, args=0xbfeaccdc "\324\005t\267\224\004t\267S") at map.c:505 argscopy = 0xbfeaccdc "\324\005t\267\224\004t\267S" i = 7 returnCount = <value optimized out> #5 0x0813c8d6 in map_foreachinarea (func=0x814f3d0 <npc_enable_sub>, m=659, [/code]
And this too:
[code=auto:0] #0 push_val (stack=0x0, type=C_INT, val=1, ref=0x0) at script.c:3262 No locals. #1 0x081bc1a3 in buildin_donpcevent (st=0xb5538e68) at script.c:10144 event = 0xb5494a25 "KvM01_BG::OnGuillaumeWin" #2 0x081b3830 in run_func (st=0xb5538e68) at script.c:3993 data = 0xb78042a4 i = <value optimized out> start_sp = <value optimized out> end_sp = <value optimized out> func = 127 #3 0x081bf8fe in run_script_main (st=0xb5538e68) at script.c:4241 c = <value optimized out> cmdcount = 655325 gotocount = 2048 sd = <value optimized out> stack = 0xb4588cac nd = <value optimized out> #4 0x08151f09 in npc_event_sub (sd=0x169fe588, ev=0xb54b374c, eventname=0xb3c7314d "KvM01_BG::OnCroixDie") at npc.c:806 No locals. #5 0x0818014a in pc_dead (sd=0x169fe588, src=0x14ce6d98) at pc.c:7144 [/code]

Regards.

Alexandria - May 9, 2015 4:57

[font='comic sans ms', cursive][color=rgb(238,130,238)]did you already fix this issue?[/color][/font]

Dastgir - May 9, 2015 6:50

Maybe not yet

Makoto - May 11, 2015 15:27

Its still erroring, BUMP!

Dastgir - May 11, 2015 15:40

[quote name="Makoto" timestamp="1431358029"]
Its still erroring, BUMP!
[/quote]This isnt fixed yet :(
Hope @Haru or @Ind take a look at this.

Makoto - May 11, 2015 15:43

[quote name="Dastgir" timestamp="1431358856"][quote name="Makoto" timestamp="1431358029"]

Its still erroring, BUMP![/quote]This isnt fixed yet :(
Hope @Haru or @Ind take a look at this.[/quote]

My server are now crashing to the the sign quests since it has the script like what ^ @Hermematon posted

Haru - May 11, 2015 16:16

The issue posted by HermeMaton doesn't seem related, or is it? This issue is about a NULL stack pointer when a character is kicked out during its OnPCLoginEvent.

Makoto - May 11, 2015 16:56

[quote]

Just put a merchant in autotrade in the range of the NPC "warpeador"
OnTouch area and enable the NPC. This will immediately trigger the
OnTouch actions over the merchant in autotrade and crash the server.[/quote]

its still autotrade merchant isn't it? But its different its not kick, its warp.

Alexandria - May 11, 2015 17:59

[color=#ee82ee][font='comic sans ms', cursive]Please, fix it because our server is crashing too much :S[/font][/color]

4144 - May 11, 2015 21:27

@Alexandria this crash happend only if wrong script. if script kick player on login event handler. Other crashes can be other issues

Makoto - May 12, 2015 18:00

[quote name="4144" timestamp="1431379629"]
@Alexandria this crash happend only if wrong script. if script kick player on login event handler. Other crashes can be other issues[/quote]

There's some official script that has ontouch and when you do the quest and vend near = mapcrash (geffen fountain)

Makoto - May 13, 2015 4:43

Really want this to get fixed.

HermeMaton - Jun 2, 2015 16:06

Well i use this temporaly fix:

in npc.c function "npc_event_sub", change the last line:

[code=auto:0] script->run(ev->nd->u.scr.script,ev->pos,sd->bl.id,ev->nd->bl.id); [/code]
for this:

[code=auto:0] if( !sd->state.autotrade ) script->run(ev->nd->u.scr.script,ev->pos,sd->bl.id,ev->nd->bl.id); [/code]

Hope this helps until the official fix.

Regards.

Alexandria - Aug 30, 2015 4:08

[color=#ee82ee][font='comic sans ms', cursive]staff?[/font][/color]