I am writing a websocket server in PHP (using the sockets extension) and I need a bit of help understanding to what extent I need to deal with fragmented messages.
My understanding of how websocket information is passed is as follows:
- Client application sends a MESSAGE (of arbitrary length) to the client-side API.
- Client-side API splits the MESSAGE into one or more FRAMES (also of arbitrary length) and sends them to the network layer.
- The network layer splits the data into a number of PACKETS to be sent over the network via TCP.
- The server receives the TCP PACKETS (possibly out of order, but it re-orders them if necessary) and delivers them to the application that is listening on the relevant port.
- The application calls socket_recv() to read the received data from the socket.
What I want to understand is what data my application will see when reading a stream of websocket data using socket_recv().
Specifically, to what extent do I need to worry about the fragmentation?
To help explain my question, here is the above process in diagrammatic form:
1. Web app (messages) : [Message_1][Message_2]
2. Browser (frames)   : [Messag][e_1][Messag][e_2]
3. TCP send (packets) : [Mess][ag][e_1][Mess][ag][e_2]
4. TCP recv (packets) : [ag][Mess][e_2][ag][Mess][e_1]
5. socket_recv        : ???
If I call socket_recv() in a loop until it returns a length of zero (adding to my internal buffer each time), am I guaranteed to get a single, complete MESSAGE:
socket_recv: [Message_1]
socket_recv: [Message_2]
Or a single, complete FRAME:
socket_recv: [Messag]
socket_recv: [e_1]
socket_recv: [Messag]
socket_recv: [e_2]
Or will it actually be an arbitrary series of PACKETS representing whatever data has been received so far (which may therefore be a partial FRAME or even multiple FRAMES):
socket_recv: [Messag
socket_recv: e_1][Mess
socket_recv:
socket_recv: ag
socket_recv: e_2]
Or something else?
I am quite happy stitching together the various FRAMES of data, but it will make things a lot easier if I can assume the first bytes of received data in each poll (instigated using socket_select()) will always be the FRAME header, rather than having to handle it as a raw byte stream which needs to be stitched back into FRAMES before we begin.
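To make the "raw byte stream" fallback concrete, here is a rough sketch of what I imagine the stitching would look like if I cannot rely on reads lining up with FRAME boundaries. It assumes the RFC 6455 frame layout (2-byte base header, optional 16-bit or 64-bit extended length, optional 4-byte masking key). The function name extractFrames() and the shape of the returned arrays are my own invention for illustration, not part of any library.

```php
<?php
// Hypothetical helper (not a library function): pulls every complete frame
// out of $buffer and leaves any trailing partial frame for the next read.
function extractFrames(string &$buffer): array
{
    $frames = [];
    while (true) {
        $available = strlen($buffer);
        if ($available < 2) {
            break; // the 2-byte base header has not fully arrived yet
        }
        $b0 = ord($buffer[0]);
        $b1 = ord($buffer[1]);
        $masked = ($b1 & 0x80) !== 0;
        $length = $b1 & 0x7F;
        $offset = 2;
        if ($length === 126) {          // 16-bit extended payload length
            if ($available < 4) {
                break;
            }
            $length = unpack('n', substr($buffer, 2, 2))[1];
            $offset = 4;
        } elseif ($length === 127) {    // 64-bit extended payload length
            if ($available < 10) {
                break;
            }
            $length = unpack('J', substr($buffer, 2, 8))[1];
            $offset = 10;
        }
        $maskLen = $masked ? 4 : 0;
        $total = $offset + $maskLen + $length;
        if ($available < $total) {
            break; // header is complete but the payload is still partial
        }
        $payload = substr($buffer, $offset + $maskLen, $length);
        if ($masked) {
            // Client-to-server frames are XOR-masked with a 4-byte key.
            $key = substr($buffer, $offset, 4);
            for ($i = 0; $i < $length; $i++) {
                $payload[$i] = $payload[$i] ^ $key[$i % 4];
            }
        }
        $frames[] = [
            'fin'     => ($b0 & 0x80) !== 0,
            'opcode'  => $b0 & 0x0F,
            'payload' => $payload,
        ];
        $buffer = substr($buffer, $total); // keep only the unconsumed bytes
    }
    return $frames;
}

// Usage: append whatever socket_recv() returned, then drain complete frames.
$buffer = "\x81\x05He";              // header plus part of the payload
$first = extractFrames($buffer);     // nothing complete yet
$buffer .= "llo";                    // the rest arrives in a later read
$second = extractFrames($buffer);    // one text frame carrying "Hello"
```

With this approach each read would just be appended to a per-connection buffer, and the FRAME header would only be parsed once enough bytes have accumulated, so it would not matter where the reads happen to split the data.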