IoT network traffic dataset using the custom flow representation
Dataset DOI: 10.5061/dryad.6q573n6c1
This dataset provides Custom Flow representations derived from raw IoT network traffic traces, capturing detailed behavioral characteristics of IoT communications. Each Custom Flow encodes network behavior in a structured, vectorized format that includes flow-level metadata, per-packet timing, size, and direction, and selected payload bytes. Flows are uniquely identified by a five-tuple (device IP address, remote IP address, protocol, device port, and remote port) and have a fixed one-minute lifetime. The dataset was generated from 60 days of packet capture (PCAP) traces obtained from the publicly available UNSW IoT Traffic Analytics platform, covering 22 consumer IoT device types and containing over 5.9 million Custom Flow records.
Description of the data and file structure
We provide two compressed archives, one per flow variant: bidirectional and unidirectional.
File Structure
Each archive contains 60 daily Parquet files organized by date:
bidirectional/
├── 16-09-23.parquet
├── 16-09-24.parquet
├── 16-09-25.parquet
│   ...
├── 16-11-20.parquet
├── 16-11-21.parquet
└── 16-11-22.parquet
unidirectional/
├── 16-09-23.parquet
├── 16-09-24.parquet
├── 16-09-25.parquet
│   ...
├── 16-11-20.parquet
├── 16-11-21.parquet
└── 16-11-22.parquet
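The daily files can be enumerated and ordered by date before loading. The sketch below is one possible approach, not part of the dataset: it assumes the archives have been extracted into a local directory, that file names follow a yy-mm-dd pattern as the listing suggests, and that pandas with a Parquet engine such as pyarrow is installed. The helper names `parse_day`, `list_daily_files`, and `load_day` are hypothetical.

```python
from datetime import date, datetime
from pathlib import Path


def parse_day(path):
    """Parse a file name such as '16-09-23.parquet' into a date (assumes yy-mm-dd)."""
    return datetime.strptime(Path(path).stem, "%y-%m-%d").date()


def list_daily_files(root, variant="bidirectional"):
    """Return (date, path) pairs for one flow variant, sorted chronologically."""
    files = Path(root, variant).glob("*.parquet")
    return sorted((parse_day(p), p) for p in files)


def load_day(path):
    """Load one daily file into a DataFrame (pandas is imported lazily)."""
    import pandas as pd  # needs a Parquet engine, e.g. pyarrow

    return pd.read_parquet(path)
```

For example, `list_daily_files("data", "unidirectional")` would return the unidirectional files under a hypothetical `data/` directory in date order, and each path can then be passed to `load_day`.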
Custom Flow Structure
Each custom flow record has the following structure:
┌──────────────────────────────────────────────────────────────────────────┐
│                               Custom Flow                                │
├───────────┬──────────┬──────────┬─────┬─────────┬────────────────────────┤
│ Flow Meta │ P01_Meta │ P02_Meta │ ... │ Pi_Meta │ B bytes flow payload   │
└─┬─────────┴────┬─────┴──────────┴─────┴─────────┴───────────┬────────────┘
  │              │                                            │
  │              ▼                                            │
  │  ┌────────────────────────────────────┐                   │
  │  │          PACKET METADATA           │                   │
  │  ├─────────────┬──────────┬───────────┤                   │
  │  │ Time Offset │ Pkt Size │ Direction │                   │
  │  └─────────────┴──────────┴───────────┘                   │
  │                                                           │
  │                                                           ▼
  │ ┌─────────────────────────────────────────────────────────────────────┐
  │ │                            FLOW PAYLOAD                             │
  │ ├─────────┬───────┬─────────┬─────────┬─────────┬───────┬───────┬─────┤
  │ │ P_start │ P_end │ P_start │ P2_B001 │ P2_B... │ P2_Bn │ P_end │ Pad │
  │ └─────────┴───────┴─────────┴─────────┴─────────┴───────┴───────┴─────┘
  │
  ▼
┌─────────────────────────────────────────────────────────────────┐
│                          FLOW METADATA                          │
├───────────┬──────┬────────┬────────┬──────────┬───────┬─────────┤
│ Relative  │ IPv4 │ Remote │ Device │ Protocol │ Total │ Total   │
│ Timestamp │      │ Port   │ Port   │          │ Bytes │ Packets │
└───────────┴──────┴────────┴────────┴──────────┴───────┴─────────┘
Data Fields
Each custom flow includes the following fields:
Flow Metadata
- Device: Device MAC Address
- FirstSeen: Unix timestamp of the first packet (µs)
- RemIP: Remote IPv4 address
- Proto: Transport layer protocol
- DevPort: Device-side port number
- RemPort: Remote-side port number
- TotalFlowSize: Total byte count of the flow
- PacketCount: Total packet count of the flow
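As an illustration of how the flow metadata fields might be used, the sketch below converts a FirstSeen value to a UTC datetime and assembles the identifying five-tuple from one record. It assumes a record is available as a dict-like object keyed by the field names above; the helper names and the sample values are hypothetical, and the Device field (a MAC address) stands in for the device side of the tuple.

```python
from datetime import datetime, timezone


def first_seen_to_datetime(first_seen_us):
    """Convert a FirstSeen value (Unix time in microseconds) to a UTC datetime."""
    seconds, micros = divmod(int(first_seen_us), 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


def flow_key(record):
    """Build the five-tuple that identifies a flow from its metadata fields."""
    return (record["Device"], record["RemIP"], record["Proto"],
            record["DevPort"], record["RemPort"])


# Hypothetical record: the values are made up for illustration only.
example = {
    "Device": "00:11:22:33:44:55",
    "FirstSeen": 1474588800123456,
    "RemIP": "192.0.2.10",
    "Proto": 6,
    "DevPort": 49152,
    "RemPort": 443,
}
started = first_seen_to_datetime(example["FirstSeen"])
key = flow_key(example)
```

The integer arithmetic in `first_seen_to_datetime` avoids the rounding that can occur when a microsecond timestamp is divided into a floating-point number of seconds.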
Packet-Level Features (first 10 packets)
- P00_TO - P10_TO: Time offset from the timestamp of the flow's first packet (µs)
- P00_PS - P10_PS: Packet size (bytes)
- P00_D - P10_D: Direction flag (1 = device → remote, 0 = remote → device)
- C_000 - C_2999: Transport-layer payload bytes with delimiters: -4 for packet start and -8 for packet end
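The delimited payload vector can be split back into per-packet payloads by scanning for the start and end markers. The following is a minimal sketch under stated assumptions: payload values between a start and end delimiter are plain byte values in the range 0 to 255, and anything outside a start/end pair is padding. The function name `split_payload` is hypothetical.

```python
PKT_START = -4  # delimiter marking the start of a packet's payload
PKT_END = -8    # delimiter marking the end of a packet's payload


def split_payload(values):
    """Split a delimited payload vector into one bytes object per packet."""
    packets, current = [], None
    for v in values:
        v = int(v)
        if v == PKT_START:
            current = bytearray()
        elif v == PKT_END:
            if current is not None:
                packets.append(bytes(current))
            current = None
        elif current is not None:
            current.append(v)  # assumes payload values are bytes in 0..255
        # values outside a start/end pair are treated as padding and skipped
    if current is not None:
        packets.append(bytes(current))  # payload cut off before its end delimiter
    return packets
```

With a pandas DataFrame, the input could be the values of the payload columns for one row, for example the columns whose names start with `C_`. A packet without payload appears as an empty bytes object, which keeps the packet order aligned with the packet-level features.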