# Training a sub 1B LLM for local use in Home Assistant

Tags: AI NLP Home Assistant

Reading time: 4 minutes

Description: (!!Work in progress!!)






# Introduction

TODO

Home-LLM: https://github.com/acon96/home-llm/

Model used: https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct

Model family description: https://huggingface.co/blog/smollm

Why such a small model: https://arxiv.org/pdf/2310.03003

# Test environment

A quick and dirty HASS Docker setup, since I don't want to copy each model iteration to the Pi.

```yaml
# docker-compose.yaml
services:
  homeassistant:
    container_name: homeassistant
    image: "ghcr.io/home-assistant/home-assistant:stable"
    volumes:
      - ./hass/config:/config
      - /etc/localtime:/etc/localtime:ro
      - /run/dbus:/run/dbus:ro
    restart: unless-stopped
    privileged: true
```

## Deploying and installing HACS

```bash
docker compose up
docker exec -it homeassistant bash
wget -O - https://get.hacs.xyz | bash -
```
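
After the install script finishes, Home Assistant has to load the new custom component before HACS can be added as an integration through the UI; restarting the container is the simplest way to do that:

```bash
# Restart the container so Home Assistant picks up the freshly installed HACS component
docker compose restart homeassistant
```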

# Preparing the training environment
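
The token-splitting and training scripts come from the Home-LLM repository linked above. A minimal sketch of getting them onto the training machine, assuming a working Python environment (the requirements file name is an assumption; check the repository's documentation for the exact setup):

```bash
# Fetch the Home-LLM training tooling (find_split.py, the training script, ...).
# The requirements file name below is an assumption; see the repository docs for the real setup.
git clone https://github.com/acon96/home-llm.git
cd home-llm
pip install -r requirements.txt
```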

## Train

### Finding prefix and suffix tokens
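
Home-LLM's trainer needs to know which token IDs wrap the assistant response in the model's chat template, presumably so the loss can be restricted to the response itself. The repository's `find_split.py` helper prints candidate `--prefix_ids`/`--suffix_ids` values for a given model: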

```
# python3 find_split.py HuggingFaceTB/SmolLM-135M-Instruct
Chat template:
----------------------------------------------------------------------------------------------------
<|im_start|>user
HA_REQUEST<|im_end|>
<|im_start|>assistant
HA_RESPONSE<|im_end|>

----------------------------------------------------------------------------------------------------
Chat template tokens:
----------------------------------------------------------------------------------------------------
[1, 9690, 198, 8232, 314, 253, 817, 6011, 2, 198, 1, 4093, 198, 81, 2914, 3116, 3935, 1535, 2, 198, 1, 520, 9531, 198, 1195, 2426, 314, 281, 1535, 2, 198]
----------------------------------------------------------------------------------------------------
Estimated tokens for HuggingFaceTB/SmolLM-135M-Instruct
response prefix:
assistant

tokens with no leading whitespace: [520, 9531, 198]
tokens with leading whitespace: [11173, 198]
tokens with leading newline: [198, 520, 9531, 198]
tokens with stripped whitespace: [520, 9531]
----------------------------------------------------------------------------------------------------
response suffix:
<|im_end|>

tokens with no leading whitespace: [2, 198]
tokens with leading whitespace: [216, 2, 198]
tokens with leading newline: [198, 2, 198]
tokens with stripped whitespace: [2]
----------------------------------------------------------------------------------------------------
'no added whitespace' found the assistant response!
        --prefix_ids 520,9531,198
        --suffix_ids 2,198
'leading space' did not find the assistant response
'leading newline' did not find the assistant response
'stripped whitespace' found the assistant response!
        --prefix_ids 520,9531
        --suffix_ids 2
```

### Running the training
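
The training call is roughly sketched below. Only `--prefix_ids`/`--suffix_ids` come from the `find_split.py` output above; every other flag name and value is an assumption mirroring the parameter tables in the Results section, so check the training script's `--help` for the real arguments.

```bash
# Hypothetical Home-LLM training invocation (flag names are assumptions,
# except --prefix_ids/--suffix_ids which are taken from find_split.py above).
python3 train.py \
    --base_model HuggingFaceTB/SmolLM-135M-Instruct \
    --train_dataset data/home_assistant_train.jsonl \
    --learning_rate 1e-3 \
    --epochs 1 \
    --warmup_ratio 0.1 \
    --batch_size 32 \
    --micro_batch_size 8 \
    --prefix_ids 520,9531,198 \
    --suffix_ids 2,198
```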

## Quantize

To use the model with llama.cpp it must be converted to the GGUF format. It is also quantized to speed up inference.
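
A minimal sketch of the conversion and quantization step with the tools that ship with llama.cpp; script and binary names vary between llama.cpp versions (`convert-hf-to-gguf.py` vs. `convert_hf_to_gguf.py`, `quantize` vs. `llama-quantize`), and the model paths are placeholders:

```bash
# Convert the fine-tuned checkpoint to GGUF at f16, then quantize to Q8_0.
# ./smollm-135m-home is a placeholder for the training output directory.
python3 convert_hf_to_gguf.py ./smollm-135m-home \
    --outfile smollm-135m-home-f16.gguf --outtype f16
./llama-quantize smollm-135m-home-f16.gguf smollm-135m-home-Q8_0.gguf Q8_0
```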


## Test on HASS
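
Before wiring the model into Home Assistant through the Home-LLM custom integration (installable via HACS), a quick smoke test with llama.cpp shows whether the GGUF loads and produces the expected service-call style output. The prompt below is purely illustrative; the integration builds the real prompt from the exposed entities.

```bash
# Sanity check of the quantized model outside Home Assistant.
# llama-cli was called "main" in older llama.cpp builds; the prompt is illustrative only.
./llama-cli -m smollm-135m-home-Q8_0.gguf -p "turn on the kitchen light" -n 64
```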

# Results

## Training run 01

Results: `{'train_runtime': 1213.0166, 'train_samples_per_second': 35.494, 'train_steps_per_second': 1.109, 'train_loss': 2.8430937607492215, 'epoch': 1.0}`


| Parameter | Value |
| --- | --- |
| learning_rate | 5e-6 |
| batch_size | 32 |
| micro_batch_size | 8 |

Well…


*(Image: the model responding with service tokens)*

## Training run 02

Results: `{'train_runtime': 1207.0863, 'train_samples_per_second': 35.669, 'train_steps_per_second': 1.114, 'train_loss': 0.08614175222177045, 'epoch': 1.0}`


Trying the parameters the base model was originally trained with, according to its Hugging Face page, and quantizing to Q8_0:

| Parameter | Value |
| --- | --- |
| learning_rate | 1e-3 |
| epochs | 1 |
| warmup_ratio | 0.1 |
| batch_size | 32 |
| micro_batch_size | 8 |

At least the output format is going in the right direction…

*(Image: the model hallucinating text and the item name)*

## Training run 03

Results: `{'train_runtime': 1197.7346, 'train_samples_per_second': 35.947, 'train_steps_per_second': 0.144, 'train_loss': 0.22953853764852813, 'epoch': 1.0}`


Increasing the batch size to 256 and the micro batch size to 10, still quantizing to Q8_0:

| Parameter | Value |
| --- | --- |
| learning_rate | 1e-3 |
| epochs | 1 |
| warmup_ratio | 0.1 |
| batch_size | 256 |
| micro_batch_size | 10 |

Still hallucinating heavily, but at least it uses the right item (well, I only altered the batch size, so I can't expect much to change).


*(Image: the model hallucinating text and the item name)*


To be continued…