IoT Custom Flow

IoT network traffic dataset using the custom flow representation

Dataset DOI: 10.5061/dryad.6q573n6c1

This dataset provides Custom Flow representations derived from raw IoT network traffic traces, capturing detailed behavioral characteristics of IoT communications. Each Custom Flow encapsulates network behavior in a structured, vectorized format that includes flow-level metadata, packet sequence timing, direction, and selected payloads. Each flow is uniquely identified by a five-tuple (device IP address, remote IP address, protocol, device port, and remote port) and has a fixed one-minute lifetime. The dataset was generated from 60 days of packet capture (PCAP) traces obtained from the publicly available UNSW IoT Traffic Analytics platform; it covers 22 consumer IoT device types and contains over 5.9 million Custom Flow records.

📊 Description of the data and file structure

We provide two compressed archives with different flow variants:

📦 bidirectional.tar.gz (1.78 GB)

Bidirectional Custom Flows capturing both upstream and downstream packets (~6 million flows)

📦 unidirectional.tar.gz (795 MB)

Unidirectional Custom Flows capturing only upstream packets, from the device's perspective (~3.5 million flows)

📁 File Structure

Each archive contains 60 daily Parquet files organized by date:

bidirectional/
├── 16-09-23.parquet
├── 16-09-24.parquet
├── 16-09-25.parquet
│   ...
├── 16-11-20.parquet
├── 16-11-21.parquet
└── 16-11-22.parquet
unidirectional/
├── 16-09-23.parquet
├── 16-09-24.parquet
├── 16-09-25.parquet
│   ...
├── 16-11-20.parquet
├── 16-11-21.parquet
└── 16-11-22.parquet
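
As a rough illustration of working with this layout, the sketch below lists the daily files of one extracted archive in chronological order and loads a single day. It assumes the file names follow a YY-MM-DD pattern (inferred from the listing above, not stated explicitly) and that pandas with a Parquet engine such as pyarrow is installed. The helper names `parse_day`, `daily_files`, and `load_day` are hypothetical.

```python
from datetime import date, datetime
from pathlib import Path


def parse_day(path: Path) -> date:
    """Parse a file name such as '16-09-23.parquet' as YY-MM-DD (assumed format)."""
    return datetime.strptime(path.stem, "%y-%m-%d").date()


def daily_files(root: Path) -> list[Path]:
    """Return the daily Parquet files directly under `root`, oldest first."""
    return sorted(root.glob("*.parquet"), key=parse_day)


def load_day(path: Path):
    """Load one day of flows as a DataFrame (needs pandas and a Parquet engine)."""
    import pandas as pd  # imported lazily so the helpers above need only the stdlib

    return pd.read_parquet(path)
```

For example, `load_day(daily_files(Path("bidirectional"))[0])` would load the earliest day, assuming the archive was extracted into the current directory.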

Custom Flow Structure

Each custom flow record has the following structure:

┌─────────────────────────────────────────────────────────────────────────────────────┐
│                                   Custom Flow                                       │
├─────────────────────────────────────────────────────────────────────────────────────┤
│  Flow Meta  │  P01_Meta  │  P02_Meta  │  ...  │  Pi_Meta  │   B bytes flow payload  │
└──────┬──────┴──────┬─────┴────────────┴───────┴───────────┴───────────┬─────────────┘
       │             │                                                  │
       │             │                                                  │
       │             ▼                                                  │
       │    ┌────────────────────────────────────┐                      │
       │    │        PACKET METADATA             │                      │
       │    ├────────────────────────────────────┤                      │
       │    │ Time Offset │ Pkt Size │ Direction │                      │
       │    └────────────────────────────────────┘                      │
       │                                                                │
       │                                                                ▼
       │    ┌──────────────────────────────────────────────────────────────────────┐
       │    │                          FLOW PAYLOAD                                │
       │    ├──────────────────────────────────────────────────────────────────────┤
       │    │ P_start │ P_end │ P_start │ P2_B001 │ P2_B... │ P2_Bn │ P_end │ Pad  │
       │    └──────────────────────────────────────────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────────────────────────────┐
│                                FLOW METADATA                                        │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ Relative     │  IPv4  │  Remote  │  Device  │ Protocol │  Total    │  Total         │
│ Timestamp    │        │  Port    │  Port    │          │  Bytes    │  Packets       │
└─────────────────────────────────────────────────────────────────────────────────────┘

📋 Data Fields

Each custom flow includes the following fields:

Flow Metadata

  • Device: Device MAC Address
  • FirstSeen: Unix timestamp of the first packet (µs)
  • RemIP: Remote IPv4 address
  • Proto: Transport layer protocol
  • DevPort: Device-side port number
  • RemPort: Remote-side port number
  • TotalFlowSize: Total byte count of the flow
  • PacketCount: Total packet count of the flow
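
Because FirstSeen is given in microseconds rather than seconds, it may need converting before use with standard date tools. The following is a minimal sketch, assuming the value is an integer count of microseconds since the Unix epoch in UTC; the function name `first_seen_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def first_seen_to_utc(first_seen_us: int) -> datetime:
    """Convert a FirstSeen value (microseconds since the epoch, assumed UTC)."""
    # timedelta keeps integer microsecond precision, unlike float seconds
    return EPOCH + timedelta(microseconds=int(first_seen_us))
```

Using `timedelta` instead of dividing by 1e6 avoids floating-point rounding of the microsecond part.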

Packet-Level Features (first 10 packets)

  • P00_TO - P10_TO: Time offset from flow's first-seen packet timestamp (µs)
  • P00_PS - P10_PS: Packet size (bytes)
  • P00_D - P10_D: Direction flag (1 = device → remote, 0 = remote → device)
  • C_000 - C_2999: Transport-layer payload bytes with delimiters; the value -4 marks a packet start and -8 marks a packet end
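
The delimiter scheme in the C_* columns can be undone with a simple scan. The sketch below splits one flow's payload vector back into per-packet lists of byte values. It assumes the values are passed in column order, treats anything outside a start/end pair as padding regardless of its value (the padding value is not specified here), and drops a trailing packet that has no end delimiter. The function name `split_payload` is hypothetical.

```python
PKT_START = -4  # delimiter marking the start of a packet's payload
PKT_END = -8    # delimiter marking the end of a packet's payload


def split_payload(values):
    """Split a flat C_* vector into one list of byte values per packet."""
    packets = []
    current = None  # None while the scan is outside a packet
    for v in values:
        v = int(v)
        if v == PKT_START:
            current = []  # open a new packet
        elif v == PKT_END:
            if current is not None:
                packets.append(current)  # close the open packet
            current = None
        elif current is not None:
            current.append(v)  # payload byte inside a packet
        # values outside a start/end pair are treated as padding and skipped
    return packets
```

With the layout shown in the flow payload diagram (an empty first packet followed by a packet with payload), `split_payload([-4, -8, -4, 22, 3, 1, -8, 0, 0])` yields `[[], [22, 3, 1]]`; the trailing zeros stand in for padding and are an illustrative choice.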

Cite our data