🔧

Part 2: Finetuning - A practical step-by-step guide for beginners

📖

Articles:

📚 Part 1: What is finetuning?

🔧 Part 2: A practical step-by-step guide

Background:

🤖 Running LLMs locally - 2 simple methods

🛣️ Running LLMs locally - The road to on-device LLMs

💻 Running LLMs locally - Your system in relation to the model specifications

⚖️ What is LoRA?

Table of contents

Introduction
A few things up front…
Code
What are we going to build?
Starting points
Chosen scenario
Step 1
JSONL
Data collection
Converting to the right training data structure
Step 2
Splitting the training data
MLX
Requesting model access
Downloading the model
Huggingface downloader utility
Finetuning

Introduction

In this blog we'll build, step by step, our own finetuned version of an open-source LLM. We go through the whole process from A to Z: from the idea, through data collection, to training and putting the model to use. Wherever possible we use open-source tools. You certainly don't need to be an experienced programmer or machine learning expert to follow this walkthrough. Some knowledge of LLMs is helpful, but for that you can read part 1. You'll also see that finetuning a model is simpler than you might think.

ℹ️

A few things up front…

This blog is a hands-on walkthrough of finetuning an open-source LLM for a specific task we came up with ourselves. An extensive theoretical framework can be found here.
This guide is written for Apple MacBooks with an Apple silicon processor (M1, M2 or M3) and is probably not suitable for users with a different type of chip. Fortunately, plenty of other guides are available for those users.
This is the first time I've done this, so there will undoubtedly be things that can be handled more conveniently than how I do them here. If so, I'd love to hear about it.

🐍

Code

A repository with all the code used in this guide can be found here:

🔗GitHub

Clone the repo:

git clone https://github.com/jwa91/twitter_finetune.git

Create a virtual environment and activate it:

cd twitter_finetune
python -m venv venv
source venv/bin/activate

Install the repo as a Python package:

pip install -e .

What are we going to build?

Starting points

For our use case I want to:

- choose a scenario in which finetuning is actually a better choice than the alternative methods.

As I discussed in part 1, finetuning is not the best way to optimize LLM output for every use case. A use case with real-time data, or with unstructured data that may change from time to time, such as a PDF of a product catalog, is better served by other techniques (🔗).

- choose a scenario in which it was relatively easy to collect the data and convert it into a training dataset.

Chosen scenario

🤖

An AI system that generates Tweets in a specific style.


Why?

Step 1

...okay, let's start building! 🚀

We start by creating a dataset to train the LLM on. To finetune an AI model for a specific scenario, we need training data that shows the model how it should respond. That is typical for finetuning: we don't so much teach the model new facts as a way of answering, a tone of voice.

OpenAI gives the following two examples:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

Data format for newer GPT models

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

Data format for older OpenAI models

Fairly straightforward. Although the exact formatting differs somewhat, what both have in common is that for a given input we want to get a given output.

ℹ️

JSONL

If you work with JSON data regularly, you may notice that the examples above are not valid JSON. As valid JSON, the data would have looked roughly like this:

[
  {
    "prompt": "<prompt text>",
    "completion": "<ideal generated text>"
  },
  {
    "prompt": "<prompt text>",
    "completion": "<ideal generated text>"
  },
  {
    "prompt": "<prompt text>",
    "completion": "<ideal generated text>"
  }
]

Training sets for LLMs are almost always structured as JSONL files, a data format I wasn't familiar with. JSONL (JSON Lines) is a text format in which every line contains a valid JSON object. The advantage when training LLMs is that a JSONL file can be processed line by line. With regular JSON, the entire array would in principle have to be loaded into memory, which can cause problems as files grow. More info here.
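To make the line-by-line idea concrete, below is a minimal sketch using only Python's standard library. `parse_jsonl` and `to_jsonl` are hypothetical helper names for illustration, not functions from the repo.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL one line at a time; only the current line is held in memory.

    `lines` can be an open file object, which Python iterates line by line.
    """
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at the end of the file
            yield json.loads(line)


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    # json.dumps never emits raw newlines (they become "\n" escapes),
    # so every record is guaranteed to stay on a single line.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

Usage, assuming a local file `train.jsonl`: open it with `open("train.jsonl", encoding="utf-8")` and loop over `parse_jsonl(f)`. Because a file object yields one line at a time, only the current record needs to be in memory, which is exactly the advantage over loading one big JSON array.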

Dataverzameling

Good. So we know that for an LLM that writes Tweets in a specific style, we at least need a number of example Tweets in that style. Let's collect those first.

Choosing an account

You can use any Twitter account you like for this. At first I thought it would be fun to use my own Tweets, but I realized they probably aren't very distinctive in style and interests. To make the example clearer, I chose the account @OptaJohan instead. This account posts sports statistics in a very distinctive style:

Answer

Using OpenAI's simplest format, that gives us our first variable:

{"prompt": "<prompt text>", "completion": "6 - Sem Steijn is the third player with at least six goals in his first six Eredivisie games of the season for FC Twente in the last 50 years, after Marc Janko in 2011-12 (7) and Enes Ünal in 2016-17 (6). Unleashed."}

Partial training row

Question

Now that we know what the LLM's answer should be, we need to think about what our question has to be to get that answer. That could be something like this:

👤

"Write an Opta-style tweet about the following: Sem Steijn has scored at least six goals in his first six Eredivisie games of the season for FC Twente, becoming the third player to achieve this feat in the last 50 years, after Marc Janko in 2011–12 (7 goals) and Enes Ünal in 2016–17 (6 goals)."

In other words:

{"prompt": "Write an Opta-style tweet about the following: Sem Steijn has scored at least six goals in his first six Eredivisie games of the season for FC Twente, becoming the third player to achieve this feat in the last 50 years, after Marc Janko in 2011–12 (7 goals) and Enes Ünal in 2016–17 (6 goals).", "completion": "6 - Sem Steijn is the third player with at least six goals in his first six Eredivisie games of the season for FC Twente in the last 50 years, after Marc Janko in 2011-12 (7) and Enes Ünal in 2016-17 (6). Unleashed."}

...where the first part of that prompt ("Write an Opta-style tweet about the following: ") can stay the same every time.
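Since the instruction stays fixed, building a training row comes down to putting that prefix in front of the description of the fact. A minimal sketch in the simple prompt/completion format; `PREFIX` and `make_row` are hypothetical names:

```python
# The fixed instruction that precedes every fact description.
PREFIX = "Write an Opta-style tweet about the following: "


def make_row(fact: str, tweet: str) -> dict:
    """Build one prompt/completion pair in the simple OpenAI-style format."""
    # Strip stray whitespace so copy-pasted data doesn't leak into the prompt.
    return {"prompt": PREFIX + fact.strip(), "completion": tweet.strip()}
```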

Because at this point we don't yet know which model we'll finetune (we might even want to finetune several models), and because the JSONL structure differs per model, I prepared a CSV file for the OptaJohan example:

Query	Answer
The Netherlands national team had a particularly unproductive first half in their recent match. For the first time since June 2019, when they faced Portugal, they failed to attempt a single shot in the first 45 minutes. Even more notably, they only managed one touch in the opposition's box before halftime, a feat of ineffectiveness not seen since their match against France in August 2017.	0 – The Netherlands have failed to attempt a single shot in a first half of a game for the first time since June 2019 (v Portugal) and recorded just one touch in the opposition box before half-time for the first time since August 2017 (v France). Harmless.
Oliver Baumann has set a new record in German football history. At the ripe age of 34 years and 131 days, he became the oldest goalkeeper ever to make his debut for the German national team (DFB). This late bloom isn't just notable for goalkeepers - Baumann is the second-oldest DFB debutant in the post-war era, regardless of position. Only Matthias Mauritz was older when he first donned the national team jersey.	34y, 131d - Aged 34 years and 131 days, Oliver Baumann is the second-oldest DFB debutant (after Matthias Mauritz) in the post-war era and the oldest goalkeeper to make his debut in DFB history. Latecomer.
The Dutch national team (@OnsOranje), fielded an exceptionally youthful starting eleven in their recent match. With an average age of just 24 years and 100 days, this was their youngest starting lineup since a friendly against Morocco in May 2017, when the average age was even lower at 23 years and 324 days. More impressively, it's the youngest team the Netherlands has fielded in a competitive match since September 2012, when they played Hungary with a team averaging 24 years and 59 days old.	24 - With an average age of 24 years and 100 days, @OnsOranje have named their youngest starting XI since May 2017 (23y, 324d v Morocco) and the youngest in a competitive match since September 2012 (24y, 59d v Hungary). Verve.

🔗

The complete training dataset is available here.
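Reading such a CSV in Python could look like the sketch below. It assumes a header row with the columns Query and Answer, as in the excerpt above, and a comma as separator; if your file uses tabs, pass `delimiter="\t"`. `load_pairs` is a hypothetical helper, not part of the repo.

```python
import csv
from typing import Iterable, List, Tuple


def load_pairs(lines: Iterable[str], delimiter: str = ",") -> List[Tuple[str, str]]:
    """Return (query, answer) tuples from CSV text with 'Query' and 'Answer' columns."""
    # DictReader uses the first row as header and handles quoted fields with commas.
    reader = csv.DictReader(lines, delimiter=delimiter)
    pairs = []
    for row in reader:
        query = (row.get("Query") or "").strip()
        answer = (row.get("Answer") or "").strip()
        if query and answer:  # skip incomplete rows
            pairs.append((query, answer))
    return pairs
```

Usage: `with open("opta.csv", newline="", encoding="utf-8") as f: pairs = load_pairs(f)`, where `opta.csv` is a placeholder file name. The `newline=""` argument is what the `csv` module documentation recommends for files passed to a reader.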

Converteren naar de juiste trainingsdatastructuur

The next step is to convert this dataset into the right JSONL structure, and that structure depends on the model we choose. To do this conversion, we need information from the model's template. You can find it on Ollama, for example, by clicking the template file:

The template file can be found on Ollama

Let's look at two examples.

Llama 3.2

<|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023

{{ if .System }}{{ .System }}
{{- end }}
{{- if .Tools }}When you receive a tool call response, use the output to format an answer to the orginal user question.

You are a helpful assistant with tool calling capabilities.
{{- end }}<|eot_id|>
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{ range $.Tools }}
{{- . }}
{{ end }}
{{ .Content }}<|eot_id|>
{{- else }}

{{ .Content }}<|eot_id|>
{{- end }}{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}
{{ range .ToolCalls }}
{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}

{{ .Content }}
{{- end }}{{ if not $last }}<|eot_id|>{{ end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}
{{- end }}

That may still be somewhat unclear, so let's make it more readable by indenting at every condition and range:

<|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023

{{ if .System }}
    {{ .System }}
{{- end }}

{{- if .Tools }}
    When you receive a tool call response, use the output to format an answer to the original user question.

    You are a helpful assistant with tool calling capabilities.
{{- end }}<|eot_id|>

{{- range $i, $_ := .Messages }}
    {{- $last := eq (len (slice $.Messages $i)) 1 }}
    {{- if eq .Role "user" }}
        <|start_header_id|>user<|end_header_id|>
        {{- if and $.Tools $last }}

            Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

            Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

            {{ range $.Tools }}
                {{- . }}
            {{ end }}
            {{ .Content }}<|eot_id|>
        {{- else }}
            {{ .Content }}<|eot_id|>
        {{- end }}
        
        {{ if $last }}
            <|start_header_id|>assistant<|end_header_id|>
        {{ end }}

    {{- else if eq .Role "assistant" }}
        <|start_header_id|>assistant<|end_header_id|>
        {{- if .ToolCalls }}
            {{ range .ToolCalls }}
                {"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}
            {{ end }}
        {{- else }}
            {{ .Content }}
        {{- end }}
        
        {{ if not $last }}
            <|eot_id|>
        {{ end }}

    {{- else if eq .Role "tool" }}
        <|start_header_id|>ipython<|end_header_id|>

        {{ .Content }}<|eot_id|>

        {{ if $last }}
            <|start_header_id|>assistant<|end_header_id|>
        {{ end }}
    {{- end }}
{{- end }}

Now we can identify a number of parts we don't need and can remove, for example the tool calls:


<|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023

{{ if .System }}
    {{ .System }}
{{- end }}

<|eot_id|>

{{- range $i, $_ := .Messages }}
    {{- $last := eq (len (slice $.Messages $i)) 1 }}
    {{- if eq .Role "user" }}
        <|start_header_id|>user<|end_header_id|>
        {{ .Content }}<|eot_id|>

        {{ if $last }}
            <|start_header_id|>assistant<|end_header_id|>
        {{ end }}

    {{- else if eq .Role "assistant" }}
        <|start_header_id|>assistant<|end_header_id|>
        {{ .Content }}

        {{ if not $last }}
            <|eot_id|>
        {{ end }}
    {{- end }}
{{- end }}

A second simplification we can now make is removing the logic for a range of messages, since every example in our training set contains just one user message and one assistant message.


This gives us the following final template:

<|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023 
{{ .System }}<|eot_id|>

<|start_header_id|>user<|end_header_id|>

{{ .Content }}<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

{{ .Content }}<|eot_id|>

Filled in with our data:

<|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023<|eot_id|>

<|start_header_id|>user<|end_header_id|>

Write an Opta-style tweet about the following: The Netherlands national team had a particularly unproductive first half in their recent match. For the first time since June 2019, when they faced Portugal, they failed to attempt a single shot in the first 45 minutes. Even more notably, they only managed one touch in the opposition's box before halftime, a feat of ineffectiveness not seen since their match against France in August 2017.<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

0 – The Netherlands have failed to attempt a single shot in a first half of a game for the first time since June 2019 (v Portugal) and recorded just one touch in the opposition box before half-time for the first time since August 2017 (v France). Harmless.<|eot_id|>
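The filled-in example above can be produced with plain string formatting. Below is a minimal sketch based on the simplified template; `format_llama` is a hypothetical name and not necessarily how the repo's script does it. One assumption to flag: the blank lines between the blocks above are only there for readability. In Ollama's original template, the `{{-` markers trim that whitespace, so this sketch puts the blocks directly next to each other. It does keep the blank line after each header, because the template does contain that.

```python
def format_llama(user: str, assistant: str,
                 system: str = "Cutting Knowledge Date: December 2023") -> str:
    """Fill the simplified Llama 3.2 chat template for one user/assistant turn."""

    def block(role: str, content: str) -> str:
        # Header tokens, a blank line, the content, then the end-of-turn token.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    # Blocks follow each other directly, as in Ollama's whitespace-trimmed template.
    return block("system", system) + block("user", user) + block("assistant", assistant)
```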

Mistral 7B

{{- if .Messages }}
{{- range $index, $_ := .Messages }}
{{- if eq .Role "user" }}
{{- if and (eq (len (slice $.Messages $index)) 1) $.Tools }}[AVAILABLE_TOOLS] {{ $.Tools }}[/AVAILABLE_TOOLS]
{{- end }}[INST] {{ if and $.System (eq (len (slice $.Messages $index)) 1) }}{{ $.System }}

{{ end }}{{ .Content }}[/INST]
{{- else if eq .Role "assistant" }}
{{- if .Content }} {{ .Content }}
{{- else if .ToolCalls }}[TOOL_CALLS] [
{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{- end }}]
{{- end }}</s>
{{- else if eq .Role "tool" }}[TOOL_RESULTS] {"content": {{ .Content }}} [/TOOL_RESULTS]
{{- end }}
{{- end }}
{{- else }}[INST] {{ if .System }}{{ .System }}

{{ end }}{{ .Prompt }}[/INST]
{{- end }} {{ .Response }}
{{- if .Response }}</s>
{{- end }}

Simplified in the same way as the Llama template, this comes down to:

[INST] {{ .System }}{{ .Prompt }}[/INST]{{ .Response }}

Filled in with our data:

[INST] Write an Opta-style tweet about the following: The Netherlands national team had a particularly unproductive first half in their recent match. For the first time since June 2019, when they faced Portugal, they failed to attempt a single shot in the first 45 minutes. Even more notably, they only managed one touch in the opposition's box before halftime, a feat of ineffectiveness not seen since their match against France in August 2017.[/INST]0 – The Netherlands have failed to attempt a single shot in a first half of a game for the first time since June 2019 (v Portugal) and recorded just one touch in the opposition box before half-time for the first time since August 2017 (v France). Harmless.
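A minimal Python sketch of filling this simplified Mistral template; `format_mistral` is a hypothetical helper. It follows the filled-in example, so it leaves out the closing `</s>` that Ollama's full template appends after a response, and it separates an optional system text from the prompt with a blank line, as the full template does.

```python
def format_mistral(prompt: str, response: str, system: str = "") -> str:
    """Fill the simplified template: [INST] {system}{prompt}[/INST]{response}."""
    if system:
        # Ollama's full template puts a blank line between system text and prompt.
        system = system + "\n\n"
    return f"[INST] {system}{prompt}[/INST]{response}"
```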

Final step:

Now we need to convert these filled-in templates into JSONL objects. That's fairly simple: each line of the JSONL format we use here consists of a single key and value. The key is "text" and the value is the prompt plus the answer.

{"text": "prompt + antwoord"}
{"text": "prompt + antwoord"}
{"text": "prompt + antwoord"}
// etc.
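In Python, each of these lines is simply `json.dumps` applied to a dict with a single "text" key. A minimal sketch; `to_text_record` is a hypothetical name, and its input would be a string produced by filling one of the templates above.

```python
import json


def to_text_record(formatted: str) -> str:
    """Wrap an already-formatted prompt + answer string as one JSONL line."""
    # json.dumps escapes quotes and turns real newlines into "\n",
    # so the record always fits on a single line; ensure_ascii=False keeps "–" readable.
    return json.dumps({"text": formatted}, ensure_ascii=False)
```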

So for Llama 3.2 this becomes:

{"text": "<|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 Write an Opta-style tweet about the following:<|eot_id|> <|start_header_id|>user<|end_header_id|> The Netherlands national team had a particularly unproductive first half in their recent match. For the first time since June 2019, when they faced Portugal, they failed to attempt a single shot in the first 45 minutes. Even more notably, they only managed one touch in the opposition's box before halftime, a feat of ineffectiveness not seen since their match against France in August 2017.<|eot_id|> <|start_header_id|>assistant<|end_header_id|> 0 – The Netherlands have failed to attempt a single shot in a first half of a game for the first time since June 2019 (v Portugal) and recorded just one touch in the opposition box before half-time for the first time since August 2017 (v France). Harmless.<|eot_id|>"}

And for Mistral:

{"text": "[INST] Write an Opta-style tweet about the following: The Netherlands national team had a particularly unproductive first half in their recent match. For the first time since June 2019, when they faced Portugal, they failed to attempt a single shot in the first 45 minutes. Even more notably, they only managed one touch in the opposition's box before halftime, a feat of ineffectiveness not seen since their match against France in August 2017.[/INST]0 – The Netherlands have failed to attempt a single shot in a first half of a game for the first time since June 2019 (v Portugal) and recorded just one touch in the opposition box before half-time for the first time since August 2017 (v France). Harmless."}

💻

I've put a small Python script on GitHub that converts the .csv into either the Llama format or the Mistral format: 🔗 Click here for the script ℹ️ If you installed the GitHub repo as a package, you can also run the script by typing create_jsonl in your terminal, followed by the desired format (llama or mistral).

Step 2

Splitting the training data

Now that we have a dataset with example questions to the LLM and the answers we want to get back, we can start actually finetuning a model.

📖

I'll assume here that you already know how to download and use a model locally. If not, these background articles are relevant.

What are the requirements for my system? Click
2 simple instructions Click

For training we actually need three datasets. Most of our examples go into the training set, while a small part goes into the validation set and the test set. A commonly used split is:

60% - Training
20% - Test
20% - Validation
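A split like this can be sketched as a seeded shuffle followed by slicing. This is a minimal illustration, not the repo's split_jsonl script; `split_dataset` is a hypothetical name. To my knowledge, mlx-lm looks for files named train.jsonl and valid.jsonl (plus test.jsonl when evaluating) in the folder passed to --data, so those are sensible names when writing out the three parts.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(rows: Sequence[T], train: float = 0.6, test: float = 0.2,
                  seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split rows into (train, test, valid); valid gets the remainder."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])
```

Shuffling before slicing matters: if the CSV happens to be sorted (by date, for example), an unshuffled split would give the test and validation sets a different kind of example than the training set.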

💻

I've put a small Python script on GitHub that splits the JSONL files in the data folder into three smaller data files, stored in a folder named after the original JSONL file. 🔗 Click here for the script ℹ️ If you installed the GitHub repo as a package, you can also run the script by typing split_jsonl in your terminal.

MLX

For the actual training we'll use MLX. Many options are available for training LLMs locally; the best known are PyTorch and TensorFlow. Compared to those platforms, MLX is somewhat more low-level, with fewer out-of-the-box utilities. The advantage of MLX, however, is that it makes use of the unified memory architecture of Apple Silicon chips. Because the CPU and GPU share the same memory, data doesn't have to be copied between them, which makes training and finetuning faster.

Specifically, we'll use the mlx-lm package, which is easy to install with pip:

pip install mlx-lm

💻

If you installed the GitHub repo as a package with pip install -e ., mlx-lm has already been installed along with it (see pyproject.toml).

Requesting model access

Now that we have MLX-LM, we need to download the original model from Hugging Face. The model's creator may attach conditions to downloading it: for Mistral you have to fill in a short form stating that you won't use the model with bad intentions. For the Llama models, Meta even requires you to request access, but I got it fairly quickly:

You can see the status of your requests in your account

🇪🇺

Side note: a pity, although not very relevant for personal use on devices with limited RAM or GPU power, is that some models are no longer available in the EU in their larger versions (measured in number of parameters), because of the stricter regulations here.

Downloading the model

The easiest option for our process is to download the original safetensors files rather than "quantized" files (🔗 read the background article "The road to on-device AI systems" for more info on those terms if you're not familiar with them).

To do this we need the huggingface-cli:

pip install -U "huggingface_hub[cli]"

💻

If you installed the GitHub repo as a package with pip install -e ., this step is probably not necessary: huggingface_hub, which provides huggingface-cli, is installed along with mlx-lm as one of its dependencies (see pyproject.toml).

After installing the Hugging Face CLI, we need to log in to Hugging Face:

huggingface-cli login

When you run this command, you'll be asked for a token, which you can find or create in the settings of your Hugging Face account (🔗). Copy the token from there into your terminal, and also save it right away in your .env file.

If you've logged in successfully, you'll see this:

Huggingface downloader utility

At this point it's convenient to also install the Huggingface downloader utility:

bash <(curl -sSL https://g.bodaay.io/hfd) -i

You'll be asked to enter your password; it won't appear on screen while you type.

After installation you can download the model itself by running the following command in the terminal:

hfdownloader -s . -m meta-llama/Llama-3.2-3B-Instruct -t hf_*************

Replace the token hf_**** with the token stored in your .env file.
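If you'd rather not paste the token by hand, a small script can read it from .env and build the same hfdownloader call. A minimal sketch: it assumes the token is stored under the (hypothetical) name HF_TOKEN, uses a deliberately simple .env parser instead of a library such as python-dotenv, and only reuses the -s, -m and -t flags from the command above.

```python
from typing import Dict, Iterable, List


def read_env(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal .env parser: KEY=value per line; skips blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once, so values may contain "="
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def hfdownloader_command(model: str, token: str, target: str = ".") -> List[str]:
    """Rebuild the hfdownloader call shown above with the token filled in."""
    return ["hfdownloader", "-s", target, "-m", model, "-t", token]
```

For example, read the token with `read_env(open(".env"))["HF_TOKEN"]` and pass `hfdownloader_command("meta-llama/Llama-3.2-3B-Instruct", token)` to `subprocess.run`. Passing a list instead of a shell string also keeps the token out of your shell history.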

You can find the name and version of the model on Hugging Face, here:

The command will start downloading the model:

Finetuning

Now we can run the finetuning command itself. It consists of several parts:

mlx_lm.lora - we indicate that we want to finetune using the LoRA method, to save compute. Read more about LoRA here.
--train - we want to train the model.
--model [path/to/model-folder] - the folder where we saved the model with hfdownloader.
--data [path/to/jsonls] - the folder where we saved our data.
--batch-size [number] - how many examples the model processes per training iteration. A larger batch generally makes training faster but needs more memory, so set it to the maximum your computer can handle.

Put together, it looks roughly like this:

mlx_lm.lora --train \
--model meta-llama_Llama-3.2-3B-Instruct \
--data data/tweets-llama --batch-size 5
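If you'd rather launch training from a Python script, for example to try several batch sizes in a row, the same call can be assembled as an argument list. A minimal sketch: `lora_command` and `run_lora` are hypothetical helpers, and they only use the flags explained above.

```python
import shlex
import subprocess
from typing import List


def lora_command(model_dir: str, data_dir: str, batch_size: int) -> List[str]:
    """Assemble the mlx_lm.lora call described above as an argument list."""
    return [
        "mlx_lm.lora", "--train",
        "--model", model_dir,
        "--data", data_dir,  # folder containing the split JSONL files
        "--batch-size", str(batch_size),
    ]


def run_lora(model_dir: str, data_dir: str, batch_size: int) -> None:
    """Print the command for reference, then run it."""
    cmd = lora_command(model_dir, data_dir, batch_size)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if training fails
```

`run_lora("meta-llama_Llama-3.2-3B-Instruct", "data/tweets-llama", 5)` starts the same command as above.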

⚠️

Note! This command takes a lot of compute. My M1 with 16GB RAM took about an hour and a half to run the command above ⌛

...and then it's time to wait. ☕⏱️

When your computer is done, you'll see that a new folder called adapters has been created:

My project folder looks roughly like this after finetuning, with a folder containing the model, a folder containing the data and a folder containing the adapter files.

With these adapters we can create a variant of the original model. We do this by creating a new file called Modelfile and filling it with the following lines:

FROM ./meta-llama_Llama-3.2-3B-Instruct
ADAPTER ./adapters

Then we open the terminal and run the following command:

ollama create twitllama -f Modelfile
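The last two steps can also be scripted, which is handy when you retrain and want to recreate the model. A minimal sketch: `modelfile_text` and `create_ollama_model` are hypothetical helpers that write exactly the two Modelfile lines shown above and then call the same ollama create command.

```python
import subprocess
from pathlib import Path


def modelfile_text(base_model: str, adapter_dir: str) -> str:
    """Modelfile contents: FROM the base model, ADAPTER the LoRA adapter folder."""
    return f"FROM {base_model}\nADAPTER {adapter_dir}\n"


def create_ollama_model(name: str, base_model: str, adapter_dir: str,
                        path: str = "Modelfile") -> None:
    """Write the Modelfile, then register the model with Ollama under `name`."""
    Path(path).write_text(modelfile_text(base_model, adapter_dir), encoding="utf-8")
    subprocess.run(["ollama", "create", name, "-f", path], check=True)
```

`create_ollama_model("twitllama", "./meta-llama_Llama-3.2-3B-Instruct", "./adapters")` reproduces the steps above.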

And that's it! You've just finetuned your first model. Congratulations! 🎉

With the command ollama list you can see which models you have installed locally. The newly created model will be among them:

With the command ollama run we can test the model, in our case ollama run twitllama.