Issue information

Issue ID
#287
Status
Started
Severity
Medium
Started
Hercules Elf Bot
Oct 24, 2007 1:47
Last Post
Hercules Elf Bot
Mar 13, 2012 6:23
Confirmation
Yes (4)
No (0)

Hercules Elf Bot - Oct 24, 2007 1:47

Originally posted by [b]theultramage[/b]
http://www.eathena.ws/board/index.php?autocom=bugtracker&showbug=287

I added a safeguard against send buffer overload (too many packets queued at once causing increased memory usage, etc), and this uncovered a very bad bug...

According to network traffic traces, this is what happens during startup...

1. The agit init script calls 'getcastledata(..., 0, ...).
getcastledata normally just returns a piece of information that's already loaded, but for some mysterious reason (probably stupidity), case 0 performs a charserver data load request instead - ONE F*CKING ATTRIBUTE AT A TIME [code]for(i=1;i<26;i++) // Initialize[AgitInit]
guild_castledataload(gc->castle_id,i);[/code]As to why castle data loading isn't done automatically at initial charserver connect but instead the script engine has to request it, I'll leave that for future examination. The point is this packet spam.

2. After loading is done, the event provided with 'getcastledata' is executed for each callee (=> castle), which is...
[code]OnRecvCastleN01:
RequestGuildInfo GetCastleData("nguild_alde",1);[/code]which ends up with a 6-byte packet 0x3031 request (guild data lookup).

3. Charserver processes packet, and builds packet 0x3831, which contains a complete dump (!) of a guild object = 13kB, and shoves it into the send queue.
Now, this part is really dumb, because when a guild owns more than 1 castle, the data gets sent multiple times...

The result is that nCastles x 13K bytes gets shoved into the char->map pipeline at once (w/ redundancy). This, coupled with extra unsent stuff and increased castle count to 50 (or so I assume) causes a buildup of more than 1 MB of data, which the socket code will flag as overload and the connection will get killed.


In function WFIFOSET, there was once
[code]if (s->wdata_size > (TCP_FRAME_LEN))
send_from_fifo(fd);[/code]that actually forced the server to not buffer anything past 1 kilobyte. I nuked that part some time ago, and this is the result... I guess rethinking the strategy is in order...

This post has been edited by Brian on Mar 5, 2012 5:05

Hercules Elf Bot - Mar 2, 2012 16:18

Originally posted by [b]Gepard[/b]
This is actually far worse than described.

There is a huge redundancy when receiving castle info and guild info:[list]
[*]When map-server logs into char-server, two things happen:
[list]
[*]char-server sends all castles to map-server in 0x3842 packet
[*]map-server executes all OnInterIfInit/OnInterIfInitOnce script events
[/list][*]0x3842 packet is parsed and all castle data is updated, and then map-servers requests guild info for each castle (for the last occupied, it does it twice, and also enqueues ::OnAgitInit and ::OnAgitInit2 events in guildinfoevent_db) with 0x3031 packets
[list]
[*]char-server responds with 0x3831 packets (len=11324); as soon as map-server receives info for the guild of the last occupied castle, it triggers the enqueued events ::OnAgitInit and ::OnAgitInit2.
[/list][*]Meanwhile OnInterIfInitOnce events in WoE scripts do similar stuff again:
[list]
[*]first, GetCastleData causes spam of 0x3040 packets (asking one attribute at time, totaling 17 packets for each of 34 castles = 578 packets with info that has already been sent in 0x3842)
[*]for each castle an event ::OnRecvCastle is enqueued in castleinfoevent_db); as soon as map-server receives info for the last requested (17th) castle attribute, it triggers the event
[*]the event causes RequestGuildInfo, which in turn causes 0x3031 packet to be sent again to char-server
[/list]
[/list]
It turns out that both script and source code do the similar job loading castle info and guild info from char-server.

I suppose it would be sufficient if source code did all the loading by itself, while script did only event enqueueing part, so futher parts of WoE script (like guardian spawning) would be executed when map-server is done with loading data from char-server.

Hercules Elf Bot - Mar 4, 2012 18:58

Originally posted by [b]Gepard[/b]
#1 and #2 from original report has been fixed in [rev=15657].

Related issues that has not been mentioned yet:[list]
[*]some pieces of guild castle data are updated via char-server (ie. map -> (request) -> char -> (ack) -> map), and some are updated on map-server immediately and only then sent to char-server
[list]
[*]it may cause inconsistent state when char-server is offline
[*]note: denying guild castle changes when char-server is offline looks like a bad idea, as it could interfere with WoE scripts, effectively breaking WoE gameplay in case of map-char disconnection
[*]because of that, guild castles need some mechanics of autosaving or saving on reconnect
[/list][*]since [rev=15657] map-servers are not aware of guild castles located on other map-servers, which may cause issues with:
[list]
[*]number of castles owned displayed in client's guild window
[*]max_guild_castles setting
[list]
[*]I suggest putting this piece of information into guild structure so it is available on all map-servers
[/list]
[/list]
[/list]

This post has been edited by Gepard on Mar 4, 2012 19:05

Hercules Elf Bot - Mar 5, 2012 0:12

Originally posted by [b]Gepard[/b]
Issues with incosistent saving and saving after reconnect to char-server fixed in [rev=15658].

Hercules Elf Bot - Mar 9, 2012 22:17

Originally posted by [b]Vali[/b]
Hi,

The problem now with [i]"map-server now requests data for all guild castles from char-server on initial connect ([url="../../index.php?app=tracker&showissue=287"]bugreport:287[/url])[/i]" is that the scripts that use GetCastleData(strnpcinfo(2),1); with OnInterIfInitOnce: or OnInterIfInit: will not get that information correctly because the map server do not has it.

The possible solution is request the information before the scripts are loaded.

Vali~

Hercules Elf Bot - Mar 10, 2012 10:34

Originally posted by [b]Gepard[/b]
Scripts that use castle data should start on [font=courier new,courier,monospace]OnAgitInit [/font]or [font=courier new,courier,monospace]OnAgitInit2 [/font]labels, not [font=courier new,courier,monospace]OnInterIfInit.[/font]

See [rev=15657/trunk/npc/guild/agit_template.txt] for reference.

This post has been edited by Brian on Mar 12, 2012 18:23

Hercules Elf Bot - Mar 10, 2012 13:09

Originally posted by [b]Vali[/b]
I was talking more about custom scripts. If there is not option to get it at the start then the only way is put a sleep and wait until the mapserver finished to load the castle data.

Vali~

Hercules Elf Bot - Mar 12, 2012 9:15

Originally posted by [b]Gepard[/b]
All castle data is loaded automatically at the initial connection between map-server and char-server (so there is no need to do manual requests on OnInterIfInitOnce).
Once castle data (and guild data for the castle owners) is loaded, map-server triggers OnAgitInit labels in all scripts.

There is no way (and never was) to have castle data available at map-server startup. You need to wait until it's loaded from char-server. Now it is done automatically, and all you have to do is to make sure that custom scripts that use castle data start with OnAgitInit label.

Hercules Elf Bot - Mar 12, 2012 15:57

Originally posted by [b]Vali[/b]
I understand. Thank you for the explanation Gepard.

Hercules Elf Bot - Mar 13, 2012 6:23

Originally posted by [b]karazu[/b]
so how to fix it vali? hehehe!