[UPDATE 💡] I moved to Mountain View, California in December 2024, concluding my chapter in Tokyo!
Also, in July 2025 I joined Google Labs to build a new AI product for Google, leaving the Gemini App team.
Hi, I'm Jiho. I am a product developer and researcher, building products with machine learning and natural language processing (NLP).
I like to write about machine learning, NLP, AI and product development. I also write in Korean.
Currently, I am working on a new AI product that aims to innovate how people work.
I've been at Google since 2020, starting in Japan, working on these products:
Google I/O 2023 was one of the most memorable events for me, because our CEO Sundar presented a piece of code that I had reviewed.
My code being presented at Google I/O 2023.
PaLM, the state-of-the-art language model at the time, was shown not only fixing a bug but also adding code comments in Korean!
A side story behind this 15-second presentation: I was pulled into a group chat of Korean Googlers to come up with the best example of an LLM fixing a bug. I contributed this DFS example because it was something I frequently got wrong myself.
I was delighted that PaLM could fix it, and even happier when Sundar presented the code live to the world!
(I joked that this would probably be the most influential code snippet I would ever write or review in my Google career.)
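For context, the bug I kept making in DFS is the classic missing visited check. The snippet below is not the code from the keynote, just a minimal sketch of the kind of mistake I mean:

```python
# Not the keynote snippet - just a sketch of the DFS mistake I keep making:
# forgetting to skip already-visited nodes, which loops forever on cycles.

def dfs(graph, start):
    """Iterative depth-first search returning nodes in visit order."""
    visited, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in visited:   # the check I tend to forget
            continue
        visited.add(node)
        order.append(node)
        # push neighbors in reverse so they pop in natural order
        stack.extend(reversed(graph.get(node, [])))
    return order

print(dfs({"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": []}, "a"))
```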
Anyway, that was just over two years ago. When I think about how I use coding agents in 2025, the pace of advancement is mind-blowing. People say that software engineering as a career is on track for deprecation. If vibe coding can build software, why do we need expensive engineers?
Before we get to that question, I want to share how I use AI coding tools. I know there are probably plenty of folks who use these tools better than I do, but I want to record my current experience for my own future entertainment.
Google’s Developer Ecosystem / Culture
Google definitely has a unique developer ecosystem and culture, so unique that a software engineer working at Google risks getting stuck in it.
But in terms of AI coding assistants, I would say the outside world has moved much faster. GitHub Copilot, Cursor, Windsurf, Claude Code, and many other applications have come up over the last one to two years.
Since Google has a lot of software engineers, there is a huge incentive to make our tools better. We also have Gemini, with 2.5 in particular showing huge improvements in coding capability.
By now, our internal AI coding tools have caught up a lot, and engineers have adopted them fast. We have also released them to the public, so I can introduce some workflows I use frequently:
gemini-cli - a terminal-based coding assistant, similar to Claude Code.
Gemini Code Assist - we don’t use exactly this internally, but similar features are integrated into our internal IDE, Cider, and our code review tool, Critique.
Most Productive Workflows (I find)
1. Code Understanding
When I joined my new team in July, I had to onboard to a new codebase as soon as possible. The challenge was that I had never written Java backend or Angular.js frontend code! Since I had mostly worked on data, I was more familiar with C++ and Python.
Obviously, my teammates are the most helpful resource for onboarding, but I did not want to bother them about every detail of the code. So I decided to leverage Gemini CLI.
“(Hey Gemini) Explain the workflow of this function. Draw a diagram.”
“Which file does this link to? How is it wired up this way?”
It is like having a mentor with unlimited patience and bandwidth right next to you!
With Gemini's help, I drew how some parts of the code worked on a whiteboard to get both the whole picture and the details. This really boosted my onboarding. I could even give a presentation to my colleagues on the status quo of a component and suggest improvements.
2. Refactoring
Refactoring is often neglected because it involves tedious structural changes to the code - moving function parameters around, breaking down functions, renaming things - all without breaking the build or the original feature.
The good news is that LLMs never get bored and are amazing at syntax. They can also go through the whole file, or even the whole codebase, and make the changes required when I change one little thing - tedious edits like imports, header files, and so on.
New feature development is sometimes blocked by refactoring. Now, small refactorings don't block me, because the time I spend on them has dropped to a tenth of what it was.
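Here is a toy sketch of the kind of mechanical refactor I now hand to the assistant. The function and field names are made up for illustration; the point is that once the shape changes, every call site and import needs the same tedious update.

```python
# A toy "before/after" of a mechanical refactor: turning a long positional
# parameter list into a config object, then updating all call sites.
from dataclasses import dataclass

# Before: callers must remember the order of five positional arguments.
def fetch_report_v1(user_id, start_date, end_date, timezone, include_drafts):
    ...

# After: a named query object; the assistant updates call sites consistently.
@dataclass
class ReportQuery:
    user_id: str
    start_date: str
    end_date: str
    timezone: str = "UTC"
    include_drafts: bool = False

def fetch_report(query: ReportQuery):
    ...
```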
3. Unit Test Generation
An important way to make your codebase easier to refactor is having good unit tests. Even a basic unit test is significantly better than nothing.
However, unit tests are sometimes omitted because of the tedious syntax of mocking and writing test cases. Again, hard-working LLMs are here to help!
Obviously, I do not ask it to write unit tests without knowing what I want to cover, but it helps a lot when I don't have every mocking library's patterns at the top of my head.
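As an illustration, here is the shape of test I ask the LLM to draft. The function and class names are made up, and I use Python's built-in unittest and unittest.mock here rather than any internal framework; the mocking boilerplate is exactly the part I never remember.

```python
import unittest
from unittest.mock import MagicMock

def charge_order(client, order):
    """Charge an order and return the provider's transaction id."""
    if order["amount"] <= 0:
        raise ValueError("amount must be positive")
    response = client.charge(amount=order["amount"], currency=order["currency"])
    return response["transaction_id"]

class ChargeOrderTest(unittest.TestCase):
    def test_returns_transaction_id(self):
        client = MagicMock()
        client.charge.return_value = {"transaction_id": "tx-123"}
        result = charge_order(client, {"amount": 10, "currency": "USD"})
        self.assertEqual(result, "tx-123")
        client.charge.assert_called_once_with(amount=10, currency="USD")

    def test_rejects_non_positive_amount(self):
        with self.assertRaises(ValueError):
            charge_order(MagicMock(), {"amount": 0, "currency": "USD"})

if __name__ == "__main__":
    unittest.main()
```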
4. Writing New Code
With LLMs, my way of thinking has shifted to focus on the logic rather than a specific language's syntax.
I have the basics of OOP and async operations in my head, just not in Java, which I hadn't used in more than 15 years. It would have taken me 2-5x the time if I had to dig through Java documentation to find the equivalent of something I know in C++.
Now I prompt the LLM to implement the logic I want. Sometimes I even express the idea in Python's simpler syntax and ask it to implement the same logic in Java.
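For example, a sketch like the one below (a made-up example, not real team code) is what I paste into the prompt, along with a request like "implement this logic in Java, following the existing patterns in this file":

```python
# A made-up sketch of logic I write in Python first, then ask the LLM to
# re-implement in Java following the team's existing patterns.

def dedupe_latest(events):
    """Keep only the latest event per user, preserving first-seen user order."""
    latest = {}
    for event in events:  # each event: {"user_id": ..., "ts": ..., ...}
        user = event["user_id"]
        if user not in latest or event["ts"] > latest[user]["ts"]:
            latest[user] = event
    return list(latest.values())
```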
Another good thing is that the coding assistant takes the whole file as context, so it follows the team's existing patterns and style whenever reasonable.
I also find this super useful for frontend development. I recently picked up Angular.js, which sometimes involves modifying multiple files (JavaScript, CSS, HTML) at once. LLMs are excellent at this.
5. Self-Code Review
Due to my personality, I tend to spend quite a lot of time reviewing my own code before sending it out to a colleague. Now there is a feature to auto-review a changelist. This has been great for reducing both my time and the reviewer's, especially since I am prone to making basic mistakes in a language I'm not familiar with.
6. Error Message Parsing
Error messages can be brutally long and cryptic, but now I ask an LLM to explain them to me and even suggest a fix! This is a real game changer for reducing debugging time.
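A rough sketch of the pattern is below; ask_llm is just a stand-in for whichever assistant or API you use, not a real library call.

```python
# Sketch: wrap a long stack trace in a short prompt and hand it to an LLM.
import traceback

def ask_llm(prompt: str) -> str:
    # Placeholder: swap in your actual LLM client (CLI, API, IDE assistant).
    raise NotImplementedError

def explain_failure(exc: Exception) -> str:
    trace = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return ask_llm(
        "Explain this error in plain language, identify the likely root cause, "
        "and suggest a fix:\n\n" + trace
    )
```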
7. Readability Suggestion
Google has a particular culture of readability reviews. The upside of this culture is higher overall code quality; the cost is that things slow down. For new product development especially, velocity is super important, so I felt reluctant to leave readability-related comments on code.
Now, however, the LLM can read my nitpicky comment and attach a suggested fix to it. The author can apply the fix with one button press. This is really effective for keeping code quality high while maintaining velocity!
8. AI-Powered Colab
Colab (Jupyter notebooks) is a powerful tool for quick data analysis or prototyping a data pipeline. Most colabs do not need to be as elegant or well-tested as production code.
This means there is more room for “vibe” coding. Colab recently started adding an AI coding assistant right inside the UI!
9. Vibe Coding Prototypes
Most of the time, demoing beats explaining in words. That is why vibe coding prototypes while brainstorming new feature ideas is so powerful.
I can imagine (and have already experienced) how effective vibe coding can be for non-technical roles like product managers, business developers, and UI/UX designers to build out ideas without consuming the engineering team's resources.
Vibe coding is not yet at the level where it can replace engineering teams, but I would not rule that out too early, given how fast this space is developing.
I often use Gemini Canvas in the Gemini App for this purpose, but there are thousands of tools out there for vibe coding something.
Closing
There have been studies suggesting that using AI tools actually makes developers slower, yet people keep talking about how insanely productive these AI coding tools are. Who is correct?
While I am far from the biggest expert in this field, I can totally see that, when not used properly, these tools can slow you down. I have gone through some trial and error myself.
I have some principles and tips for using these tools effectively, but those will be for my next post.
One of the books that deeply influenced me is Antifragile by Nassim Taleb.
Taleb talks a lot about optionality. My interpretation is that optionality in life means putting myself in situations where I have unlimited upside and limited downside.
Limiting my downside has been relatively easy: follow obvious rules like “Don’t put all your eggs in one basket,” “Avoid life-threatening activities,” and “Don’t take leveraged bets that can knock you out of the game.” In short, don’t bite off more than you can chew.
On the other hand, creating unlimited upside has been about thinking strategically and making hard choices. Whenever I feel like my upside has limits, I seek change.
For example, when I was a computer science undergrad, I got involved in many different startups. I learned many skills, but most importantly, I learned that merely being good at programming had a limit. Fortunately, my best friend, whom I met playing football, introduced me to a natural language processing (NLP) lab. I never liked university, but I decided to stay and do research as a master’s student. During these two years, I dove deep into machine learning, deep learning, and NLP. I took baby steps to learn about the field and do research, eventually publishing my work at NLP conferences.
The timing could not have been better. I remember seeing AlphaGo defeat Lee Sedol in my first year of graduate school. Those two years shattered the ceiling that had limited me as a computer programmer.
After postgrad in 2019, I joined an early-stage startup in Hong Kong. After a year, it was evident that things were not going well. Also, the city I’d lived in for seven years was in political turmoil. I needed a change again. I started looking into leaving Hong Kong and joining a bigger company. That led me to interview at the biggest company I could possibly join—Google. My first position was as a Computational Linguist. I never passed the initial screening for software engineer roles, probably because I was just one of thousands of CS graduates. My two years in NLP research made it possible to get a foot in the door of big tech.
One challenge was that the job was in Japan. I had never thought about living in Japan and couldn’t even read hiragana. The job didn’t require Japanese, so I took a leap of faith and moved.
This difficult choice led to the most interesting five years of my life, despite the COVID-19 pandemic. We became a family there—going in as one and coming out as three. I can now converse in Japanese (somewhat). Five years in Tokyo still feels like a long honeymoon.
Despite the great food and lifestyle of Tokyo, I once again had to seek change for the same reason—limited upside. I felt I would need to dive deep into Japanese society to find more opportunities, or else remain confined to one company.
An opportunity to immigrate to California came, so we took it as a family. Now we are in Silicon Valley. As an engineer working on NLP and AI in 2025, I figured this is the best way to put myself in the right place and environment.
Another important factor that influenced our choice was “belonging.” As an international family with a multicultural child, this is an important issue. Unfortunately, our hometowns (Hong Kong and Seoul) and our last home (Tokyo) were suboptimal. After several months in California, we feel we are in the right environment.
Every choice comes with trade-offs; none is perfect. We miss the relatively high-quality, comfortable, affordable, and train-centric life in Tokyo. Our extended families are farther away now, and we need to build our social circles again. However, I believe that hard choices that create unlimited upside eventually make our lives easier in the long run. Let’s see how life turns out.
Large Language Models (LLMs) have taken over the world in recent months. New technological advances are happening really fast.
New models are being released every week, and it’s hard to keep up. To be honest, I stopped following the research after a few “groundbreaking” models, but I have still been thinking about the implications of LLMs for software engineering.
I had the chance to work on one of the biggest LLM projects in the world, which gave me some initial insights. I’ve also been reading and experimenting with different engineering practices for LLMs.
I’ll share my thoughts on engineering LLMs. This field is changing rapidly, so I’ll be sure to emphasize that these are just my initial thoughts.
Disclaimer: The thoughts in this post are my own and do not represent my employer. I am not sharing any information that is not publicly known.
What are LLMs?
Let’s go back to the basics.
LLMs, or large language models, predict the next word in a sequence. They are trained on massive datasets of text and code, and can learn to perform a variety of tasks, such as generating text, translating languages, and answering questions. That’s why some people call them “fancy autocomplete”.
But recent models have shown that, with more scale and data, they acquire an almost magical capability to follow our instructions. Tasks that once seemed to require hours or days of an intelligent, creative human's time are getting done in a matter of seconds.
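If you want to feel the “fancy autocomplete” nature yourself, here is a minimal sketch using the open-source Hugging Face transformers library and GPT-2, a small vanilla model. The specific library and model are just convenient examples, not anything tied to my work.

```python
# Minimal "fancy autocomplete" demo: a small vanilla language model simply
# keeps predicting likely next tokens after the prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The quick brown fox", max_new_tokens=20)
print(result[0]["generated_text"])
```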
LLMs can be divided into two types: [1] vanilla and [2] instruction-tuned. Vanilla LLMs are trained only to predict the next word in a sequence, while instruction-tuned LLMs are additionally trained to follow instructions in the user prompt.
Instruction-tuned LLMs are enhanced with a technique called Reinforcement Learning from Human Feedback (RLHF), which uses additional data to train the LLM to treat the input text as instructions. That’s why we started calling these input texts “prompts” and the act of changing an LLM’s behavior “prompting”.
When choosing an LLM, it is important to consider whether it has been instruction-tuned or not.
To learn more about the details of how LLMs are trained, I recommend watching this video by Andrej Karpathy.
LLMs as Engines
LLMs, or large language models, are not products themselves, but rather engines that can be used to create products. They are similar to car engines in that they provide the power to move something, but they need other components to be useful. For example, a car engine needs wheels, a body, and a steering wheel to be a transportation product.
An LLM itself is just another ML model that takes an input and generates an output.
In the same way, LLMs need other components to be useful. For example, the input prompt may need to be configured in a certain way, or the output may need some post-processing. And they also need to be integrated into a user interface so that people can interact with them. How to use these engines is really up to the designers and developers.
Some companies create their own engines and also the products that use them. For example, Google and OpenAI train their own LLMs and build products like Bard and ChatGPT. Other companies only provide the engines (such as Meta’s Llama), just as Rolls-Royce manufactures aircraft engines but not the aircraft themselves. Others may provide custom LLM engines that have been fine-tuned on a specific dataset; for example, Bloomberg developed an LLM for finance.
However, most companies will not create their own LLMs from scratch. Instead, they will utilize engines that have been trained by other companies.
This is because creating an LLM from scratch is very expensive. The majority of the cost comes from the pre-training stage, which requires thousands of GPUs, months of training, and millions of dollars. Only companies and open-source projects with abundant resources can afford to do this.
Pretraining is the most compute-intensive stage (screenshot from the State of GPT talk)
Carbon footprint of training the Llama 2 models. For comparison, the average person in the EU emits 8.4 tCO2eq a year.
Using LLMs
Currently, there are two ways to use these engines:
Calling an API: This is the simplest way to use LLMs. You can simply write a few lines of code to call an API and get the results you need. This is a great option for small projects or for people who don’t want to deal with the hassle of hosting their own.
Hosting your own: If you need more control over the LLM or if you need to use it for a large project, you can host your own. This requires more technical expertise, but it gives you more flexibility and control.
Calling an API
There are a number of companies that offer LLM APIs, such as Google, OpenAI, and Hugging Face. These APIs make it easy to use LLMs for a variety of tasks. This method is so easy that it basically removes any barrier to entry for accessing LLMs. Anyone who can write simple API client code can leverage the power of these capable NLP models. All the complexity around infrastructure, latency, and training is handled by the API provider; the client just needs to pay for the calls.
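As a sketch of how little code is involved, here is roughly what a call looks like with OpenAI's Python client. The client interface and model names change quickly, so treat the specifics as placeholders rather than a recommendation.

```python
# Minimal sketch of calling a hosted LLM API (OpenAI's Python client shown;
# other providers look very similar). Requires an API key in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize this review in one sentence: ..."}],
)
print(response.choices[0].message.content)
```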
Such simplicity fundamentally changed the game in NLP. The space exploded with people experimenting with LLMs via APIs. Some say this is a bigger phenomenon than the advent of the iPhone’s App Store, which resulted in an explosion of third-party mobile apps. Now a high school student with minimal coding knowledge can create a reasonably performing chatbot, threatening the existence of NLP engineers like myself.
However, there are a few drawbacks to calling an API.
Latency: Calling an API can add latency to your application. This is because the API provider needs to process your request and then return the results.
Cost: You may need to pay for each API call. This can add up if you are using the API for a large project.
Data privacy: The data you pass to the API may not stay private. The API provider processes it on their servers, and it is sometimes subject to human review for quality improvements.
Limited control: input length and parameter selection are bounded by whatever the API provides.
Hosting your own
As an alternative, hosting your own LLM gives you more control over the model and the data. You can also use the model for demanding workloads that would be too slow or expensive to run through an API. However, there are also some challenges to hosting your own LLM.
There has been a surge of open-source LLMs of varying sizes. Meta’s Llama 2 is a notable example, as it is commercially usable, unlike its leaked predecessor. StableLM and MPT are two other releases claimed to be as good as, or even better than, those of the major players.
Pros:
Data privacy: inputs and outputs do not need to go through a third party. Such data ownership may be an important factor for a project.
Costs: may be lower than calling an API at high data volumes.
Cons:
Infra expertise: you need to create and maintain the infrastructure to host the LLM. Cloud providers make this easier, but it still requires technical expertise.
Performance: LLMs tend to be more capable the bigger they are, and larger models are harder to host.
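To make the hosting option concrete, here is a minimal sketch of loading an open model with Hugging Face transformers. The model id and settings are illustrative; running Llama 2 requires accepting Meta's license and a GPU with enough memory, and real deployments would use a proper serving stack.

```python
# Minimal self-hosting sketch: load an open-weights model locally and
# generate text. Requires transformers, accelerate, and a capable GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Explain RLHF in one paragraph.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```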
Moreover, your own LLM can be further tuned with proprietary data. However, fine-tuning (supervised fine-tuning or RLHF) is not easy to do right, so there will be more companies offering fine-tuning-as-a-service: you upload your data, select a base LLM, fine-tune it, and download the model weights.
Today I laid out my high-level view of LLM engineering. The main takeaway is: “an LLM is an engine, not a product.” I hope this view inspires product designers and engineers to think about how LLMs could revolutionize whatever they are working on:
Boost your existing (or new) product with an LLM engine!
The next question that might arise is, “what does a software engineer working with LLMs actually do?” Spoiler alert - it’s not just prompt engineering. I’ll cover that in my next post.
In Summer 2016, I decided to start a new life as a postgraduate research student in the area of natural language processing (NLP) and machine learning. My primary goal was to become a person who understands the field (able to read and understand academic papers to a certain degree) and can produce valid research work (able to fully follow and complete a research cycle). Now that I have finally defended my Master of Philosophy (MPhil) thesis and received a pass, I feel confident enough to say that I know much more about the field than most people do.
I can imagine myself staying in academia: continuing as a Ph.D. candidate, becoming a research intern at one of those large corporate labs in the US or China, and publishing more papers at conferences. These are the glorious moments. It gives a comparative advantage for chasing fame and fancy names in a career. I might even feel like I was solving the bigger problems in the field that everyone wants to contribute to.
The inevitable downsides are, of course, advisor-student conflicts, low salary, social detachment, and the other mental tolls that most Ph.D. candidates suffer as a result. However, in this post, I do not want to talk about these common topics.
Instead, I chose to go back into the startup scene. This choice surprised many people, especially those I met in academia.
I want to lay out my thoughts on why I am choosing this path. Hopefully, it can be a useful reference for people weighing similar considerations.
Potential vs. Practicality
“Don’t you want to solve cool problems?” — from the pro-Academia side
The good thing about academia is that people get excited quite easily. They tend to value potential a lot. That is the primary purpose of research — aiming for work that can potentially be the next big thing and make our lives better. Nevertheless, a lot of “cool” research gets discarded. Much of it may not be suitable for practical use, for reasons like operational cost, computational inefficiency, or low adaptability. Very few works become legendary and transform a whole field or industry. I am not saying there is something wrong with this. I believe this philosophy of research is what advances technology and engineering.
Nevertheless, my question was: “Do I appreciate doing that myself?”
The answer was NO. Over two years of doing research, I realized that I am a person who loves practicality and simplicity. Although potential is valuable, the problems some research tackles made me walk away immediately, because to me they sounded like: “we will start registering passengers for round trips to Mars” (true story). Maybe it will happen one day, and the people who work on that problem for decades will be the ones who make it happen. Note that neural networks were once regarded the same way.
But the thing is, I do not place much value on “moon-shot” (or rather, Mars-shot) thinking. (This is somewhat self-contradictory, since I decided to go into academia after listening to a Singularity University talk about “moon-shot” thinking.)
Motivating Factors
“What are the main motivating factors of life?” — myself
My postgraduate student life was all about academic conferences — paper submission deadlines, reviewer comments, acceptance notifications, preparing invited talks, and attending the conferences themselves. I guess this varies between labs and advisors, but my lab at least was very focused on these results, especially for graduation.
To be honest, I hated this cycle. I felt like a racehorse chasing deadline after deadline.
Moreover, so much of my perceived performance depended on reviewer comments and paper acceptances. I think the reviewing systems at machine learning and artificial intelligence conferences are broken and inconsistent, due to the exponential growth in the number of submissions. In this kind of system, I felt I had no control over how my work was evaluated. (I know a lot of people in academia are trying to fix this problem, and I sincerely support those efforts.)
Glory is big, but the cycle is stressful. It is not the life I want to live.
Best of both worlds?
The last couple of months have been a struggle to find the best of both worlds. I did not want to abandon what I had invested over two years and go back to being a software developer. I knocked on many companies' doors, talking to recruiters and researchers at conferences and landing interviews with some of them. I found that many research teams at large IT companies do not work with product teams, focusing instead on publishing at conferences. I also confronted other complications, such as working visas, but I will talk about that in another post.
I briefly worked at a Korean startup that built a social media product around a movie recommendation engine. That company has since evolved to offer a movie streaming service based on that engine, fighting a hard battle with Netflix. One thing I remember is that a startup has a lot of problems to solve.
Nevertheless, people love their product (I am still one of its active users). They feel that its machine learning algorithms create value in their lives, and they keep spending time and money on the product. When we were building it, we were sometimes happy and sometimes sad after reading users' reviews or watching service usage fluctuate. Everything was imperfect, and the company was at war every day, but I felt more motivated trying to find problems I could solve.
Now I have the knowledge and skills to build that kind of machine learning engine to run a consumer service (at least I can try).
I made my choice: to work at Oyalabs, a baby-tech startup building a smart baby monitor that helps parents boost their baby's speech development.
We are making a device that analyzes what and how much parents speak to their baby and gives feedback on improving their parenting. The importance of parent-baby interaction is supported not only by neuroscience and education research but also by parenting cultures, especially in Asia (I bet Koreans reading this know the importance of 태교 (taegyo, prenatal education)).
As a machine learning engineer (NLP), I will be developing the machine learning engines for our product! I am aware that it is a very very very difficult task, but I am excited!
I finally decided what to do after graduation — research in the area of artificial intelligence. It was indeed a surprising decision — even to myself — to jump into research, since my previous focus and interest were more on the application side of computer science.
I decided on this “pivot” less than a month ago, but I was lucky enough to be offered a position at the Human Language Technology Center at HKUST by two professors, with a scholarship fully covering my tuition and expenses for two years.
I will be working with Professor Pascale Fung and her research team, primarily focusing on understanding (and making computers understand) human language and emotion. I hope I can add great value to the team.
Among the several factors that affected my judgment, the urge to improve my problem-solving ability was the most important. After a thorough review of my past self, I came to the conclusion that the reason for my recent failures at trying to “start something up” was my lack of deep understanding of the core problems. Although 3.5 years of undergraduate education had effectively weakened that ability, I fortunately realized my shortcoming after working on different things outside of school.
Deciding to stay in school is counterintuitive in that sense, but I figured that doing research is a totally different experience from undergraduate education and would be a valuable opportunity to focus and deepen my thinking.