Friday, August 29, 2025

ChatGPT

I made a ChatGPT-style model: you give it instructions and it follows them.


I pretrained the model on text data to predict the next token.

Then I fine-tuned it to follow instructions.

I used the Alpaca prompt style:
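
A minimal sketch of the Alpaca prompt format, assuming the standard template wording from the Stanford Alpaca project. The function name `format_alpaca_prompt` is hypothetical and not taken from the original code, and the exact text used during fine-tuning may differ:

```python
def format_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style prompt; the model's answer follows '### Response:'."""
    if input_text:
        # Variant used when the instruction comes with extra context.
        header = (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request."
        )
        return (
            f"{header}\n\n### Instruction:\n{instruction}"
            f"\n\n### Input:\n{input_text}\n\n### Response:\n"
        )
    # Variant used when there is only an instruction.
    header = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
    )
    return f"{header}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


prompt = format_alpaca_prompt(
    "Convert the sentence to passive voice.",
    "The chef cooks the meal.",
)
print(prompt)
```

During fine-tuning the expected answer is appended after `### Response:`, and at inference time the model generates from that point.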

These are the answers given by ChatGPT:

These are the scores from testing my ChatGPT with Llama 3:



Friday, August 22, 2025

Masked Multihead Attention

I made a masked multi-head attention mechanism for LLMs.

This is the code:


The context vectors from the heads are concatenated along the last dimension, with each head learning something different from the input x.
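
A minimal sketch of the idea, written with plain Python lists instead of PyTorch tensors so it runs without dependencies. It is not the code from this post: the helper names, the sizes, and the random weights are assumptions, and the output projection and dropout that a full layer usually has are left out:

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def causal_head(x, w_q, w_k, w_v):
    """One attention head where token i can only attend to tokens 0..i."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    weights = []
    for i, q_i in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k)
            if j <= i else -math.inf  # mask out future positions
            for j, k_j in enumerate(k)
        ]
        weights.append(softmax(scores))
    return matmul(weights, v), weights


def multi_head(x, heads):
    """Run every head on x and concatenate the context vectors per token."""
    outputs = [causal_head(x, *head)[0] for head in heads]
    return [sum((out[t] for out in outputs), []) for t in range(len(x))]


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_in, d_head, num_heads, seq_len = 4, 2, 2, 3
x = rand_matrix(seq_len, d_in)
heads = [tuple(rand_matrix(d_in, d_head) for _ in range(3)) for _ in range(num_heads)]

context = multi_head(x, heads)
_, first_head_weights = causal_head(x, *heads[0])
print(len(context), len(context[0]))  # seq_len rows, num_heads * d_head columns
```

Each head has its own query, key, and value weights, so the concatenated vector for a token has `num_heads * d_head` entries.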





Wednesday, August 6, 2025

Register Coalescing

I added a Register Allocator to my compiler for optimizing code:


Coalescing runs in an inner loop that updates the interference graph in place, which is fast but imprecise.
An outer loop rebuilds the interference graph completely. It runs less often but gives the most accurate graph.
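
A minimal sketch of that two-level loop, not the allocator from this post. It assumes straight-line code, a made-up `(op, dst, srcs)` instruction format, and coalesces every move whose operands do not interfere, without any coalescing heuristic:

```python
def build_interference(instrs, live_out=frozenset()):
    """Backward liveness scan over straight-line code -> interference graph."""
    graph = {}
    live = set(live_out)
    for op, dst, srcs in reversed(instrs):
        graph.setdefault(dst, set())
        for s in srcs:
            graph.setdefault(s, set())
        for v in live:
            # A move's destination does not interfere with its source.
            if v != dst and not (op == "mov" and v in srcs):
                graph[dst].add(v)
                graph.setdefault(v, set()).add(dst)
        live.discard(dst)
        live.update(srcs)
    return graph


def coalesce(instrs, live_out=frozenset()):
    alias = {}

    def find(v):
        while v in alias:
            v = alias[v]
        return v

    changed = True
    while changed:  # outer loop: precise, rebuilds the whole graph
        changed = False
        graph = build_interference(instrs, {find(v) for v in live_out})
        for op, dst, srcs in instrs:  # inner loop: cheap in-place updates
            if op != "mov":
                continue
            a, b = find(dst), find(srcs[0])
            if a == b or b in graph[a]:
                continue
            # Merge a into b. The merged node takes the union of both
            # neighbor sets, which is safe but may overstate interference.
            for n in graph.pop(a):
                graph[n].discard(a)
                graph[n].add(b)
                graph[b].add(n)
            alias[a] = b
            changed = True
        # Rewrite operands and drop moves that became "mov x, x".
        rewritten = []
        for op, dst, srcs in instrs:
            dst, srcs = find(dst), tuple(find(s) for s in srcs)
            if not (op == "mov" and srcs == (dst,)):
                rewritten.append((op, dst, srcs))
        instrs = rewritten
    return instrs, alias


program = [
    ("const", "a", ()),
    ("mov", "b", ("a",)),
    ("inc", "a", ("a",)),      # a is redefined while b is still live
    ("add", "c", ("a", "b")),
    ("mov", "d", ("c",)),
]
result, alias = coalesce(program, live_out={"d"})
print(result, alias)
```

In this example `d` is merged into `c` and its move disappears, while `mov b, a` stays because `a` and `b` interfere.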


Using this map, we can look at the right and middle columns in the next image to see which operands are coalesced.


On the right are the original instructions. In the middle is the result of register coalescing, which removes several instructions. On the left is the rest of the register allocator with graph coloring.

Graph Coloring:


Graph coloring happens after coalescing. By this stage coalescing has already removed some nodes, so coloring is faster:
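
A minimal sketch of graph coloring with k registers, assuming a Chaitin-style simplify and select scheme. The function name `color_graph` and the example graph are made up for illustration, and the real allocator may choose spill candidates differently:

```python
def color_graph(graph, k):
    """Color an interference graph with k colors. Returns (colors, spilled)."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    # Simplify: repeatedly remove a node, preferring one with degree < k.
    while work:
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            # No easy node left: take a high-degree node as a spill candidate.
            node = max(sorted(work), key=lambda n: len(work[n]))
        stack.append(node)
        for n in work.pop(node):
            work[n].discard(node)
    # Select: pop nodes back and give each the lowest color its neighbors lack.
    colors, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {colors[n] for n in graph[node] if n in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            spilled.append(node)
    return colors, spilled


graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
colors, spilled = color_graph(graph, k=3)
print(colors, spilled)
```

Fewer nodes in the graph means fewer iterations in both phases, which is why running coalescing first helps.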


Finally, you can see the graph coloring in the left column:

If you want to look into the code:




Thursday, July 11, 2024

MARL Multi Agent Reinforcement Learning

I made MARL (Multi-Agent Reinforcement Learning) with DDPG (Deep Deterministic Policy Gradient).

I made the DDPG in Python with PyTorch. I used Unity Machine Learning Agents for the environment.

This model can be used for robots that need to learn collaboration with continuous action spaces.

Two agents need to learn to interact with each other to play tennis. They have to pass the ball over the net and work collaboratively.


I made one DDPG with two noise processes.

I solved the environment. This is the table of scores by episode during training:
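
A minimal sketch of one shared policy with a separate noise process per player. It assumes Ornstein-Uhlenbeck noise, which is common with DDPG but is not confirmed by this post. `shared_actor`, the observation size, the action size of 2, and the action range of [-1, 1] are placeholders:

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: noise that drifts back toward mu."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.size, self.mu, self.theta, self.sigma = size, mu, theta, sigma
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.state = [self.mu] * self.size

    def sample(self):
        self.state = [
            x + self.theta * (self.mu - x) + self.sigma * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def shared_actor(observation):
    """Hypothetical stand-in for the trained actor network."""
    return [0.0, 0.0]


def act(observations, noises):
    """Same actor for both players, but each player gets its own noise."""
    actions = []
    for obs, noise in zip(observations, noises):
        raw = [a + n for a, n in zip(shared_actor(obs), noise.sample())]
        actions.append([max(-1.0, min(1.0, a)) for a in raw])  # clip to range
    return actions


noises = [OUNoise(size=2, seed=0), OUNoise(size=2, seed=1)]
observations = [[0.0] * 8, [0.0] * 8]  # placeholder observations
actions = act(observations, noises)
print(actions)
```

With independent noise the two players explore differently even though they share the same network weights.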




If you want to look at the code:



Wednesday, July 10, 2024

DQN Deep Q Network for discrete Action Spaces

I trained a DQN (Deep Q Network) to collect yellow bananas and to avoid collecting blue bananas.

I made the DQN in Python with PyTorch. I used Unity Machine Learning Agents for the environment.



In reinforcement learning, the agent needs to interact with the environment to learn. In this case, the agent receives a reward of +1 for collecting a yellow banana and a reward of -1 for collecting a blue banana.
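
A minimal sketch of how those rewards feed into DQN learning, using plain Python instead of PyTorch. The function names, the four example Q-values, and the discount factor are assumptions, not values from the original code:

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """Value the network is trained to predict for the action it took."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


q_values = [0.1, 0.9, 0.3, 0.2]  # one estimate per discrete action
action = epsilon_greedy(q_values, epsilon=0.0)

# Yellow banana (+1) in the middle of an episode.
yellow = td_target(1.0, next_q_values=[0.5, 2.0, 0.0, 0.1], done=False, gamma=0.9)
# Blue banana (-1) on the last step of an episode.
blue = td_target(-1.0, next_q_values=[0.0, 0.0, 0.0, 0.0], done=True)
print(action, yellow, blue)
```

The network is then updated so that its Q-value for the chosen action moves toward the target.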

I solved the environment:



If you want to look at the code: