About Me

[UPDATE 💡] I moved to Mountain View, California in December 2024, concluding my chapter in Tokyo! Then in July 2025, I left the Gemini App team and joined Google Labs to build a new AI product for Google.

Hi, I'm Jiho. I am a product developer and researcher, building products with machine learning and natural language processing (NLP).

I like to write about machine learning, NLP, AI and product development. I also write in Korean.

Currently, I am working on a new AI product that aims to innovate how people work.

I've been at Google since 2020, starting in Japan, working on these products:

  1. labs.google
  2. gemini.google.com
  3. Assistant (OK Google)

I am interested in various areas of AI/ML/NLP product building.

Before Google, I was doing NLP research at HKUST.

I have worked at multiple startups in Korea and Hong Kong as an intern, a founding engineer, and an employee.

Since 2023, I have participated in the Google for Startups Accelerator (GFSA) Korea program as a tech mentor.

I grew up in Korea, but spent most of my 20s in Hong Kong, and then 5 years in Japan! I picked up different cultures and languages along the way!

Feel free to reach out via email (me at jiho-ml.com)!

Below are my blog posts:

Posts

My 1st Google Labs Product Launch - Pomelli

Launching a product is exciting, until the server goes down three hours in. Users were complaining on Twitter and Discord that they couldn't get in.

I joined Google Labs this July and was fortunate enough to experience a product launch after only three months! Let me share what happened and some thoughts.

What is Pomelli

Pomelli is an AI marketing tool for small and medium-sized businesses. It understands each business and brand, and helps you easily create marketing campaigns and creatives tailored to your business. Read more in the Google Keyword Blog.

Pomelli's three steps: generate the business DNA, create your marketing campaigns, and generate on-brand creatives.

This was a beta launch, currently available only in the USA, Canada, Australia, and New Zealand. Hopefully, we can launch in more countries next year!

Launch Day

The morning we launched, the team was thrilled. We had a live dashboard showing real traffic coming in. It was supposed to be a relatively quiet launch, but we got a lot of attention on X (Twitter) and other channels.

But three hours in, things stopped working. Our servers were not responding properly, and users started complaining that they couldn't move to the next stage of the product.

Apparently, things were melting down in the component that I had worked on. We analyzed what was going on and figured out that one part of the code was problematic. I had to make an emergency bug fix to cut that part out, without sacrificing too much of the original feature.

I deployed the code, hoping that things would work again. They didn't. By this time, users had been blocked for several hours. We were even considering pulling the curtains down and saying we would be back later.

But then my amazing colleague spotted another, potentially related, issue in the server and started working on a fix. Things came back to normal. We announced to users that we had experienced a disruption due to an unexpected amount of traffic, but that the issue was fixed.

It was almost midnight when things concluded. We could all finally go to bed.

Takeaways

Good problem to have

Our PM director, Jaclyn, wrote a blog post reflecting on the launch.

She called this the Success Disaster: because we were successful in getting attention and traffic, the problems revealed themselves.

“This is the kind of problem you want to have. The reason things went wrong was because something went incredibly right.”

This view blew me away, not only because it is such a positive perspective, but also because it is such an accurate take. We would never have caught this issue with only hundreds of users; it only surfaced with thousands or more using the product.

In the end, we are seeing unbelievable numbers of users coming in and trying out our product!

Early Experiment

The philosophy of Google Labs is to launch early and validate fast. In a huge company like Google, it can be hard to maintain this kind of mentality and environment. Google Labs is literally a product laboratory built to avoid falling into that trap.

Just take a look at https://labs.google and see what we are doing. There are many things there that are hard to imagine a large company like Google working on.

After launching Pomelli, I am now starting to understand and experience what this philosophy actually means.

The bigger picture

That does not mean that we are aiming for small things. We are launching fast, but these are just seeds of something that is potentially much bigger.

For example, Pomelli is not just another content generation tool, but it may become an ultimate operating platform for small and medium sized businesses, which Google has traditionally not been great at.

Forbes has already analyzed the potential of our product: https://www.forbes.com/sites/solrashidi/2025/10/31/will-googles-ai-bet-pay-off/

For this reason, we need to think big and look multiple steps ahead, but need to start somewhere.

This launch is one of the highlights of my whole programming career. I'm very happy that I decided to join this org and team, and that we launched something together so fast.

I am learning and experiencing a lot of interesting product and engineering problems, so I'm hoping to write more about them.

Some useful principles for using AI coding tools

Quadrants

Mindset

Quadrants

Think about where you are in these four quadrants.

Quadrant 1 - High Impact Zone. Quadrant 2 - High Danger Zone. Quadrant 3 - Full Fun Vibe Coding Zone. Quadrant 4 - "Parents at the Party" Vibe Coding Zone.

  1. This is where I can potentially make the most impact with an LLM, but also where I need to be most cautious. The code here is going to run in production, so I need to be rigorous and think about basic unit and integration testing.

Being productive here can lead to being good at my main job. Be thoughtful.

  2. I would not operate here too frequently, unless I have a patient colleague who can help me review the code. But I shouldn't throw them a large chunk of garbage code without any understanding; that risks making them mad and losing their trust in me.

I should try to get my proficiency to at least a medium level, moving as close to Quadrant 1 as possible.

  3. This is where I have no idea what's going on (flying blind), but I get to have fun seeing the outputs. At the very least, I need to be clear about what I want as output (e.g., UI, data visualization). I will probably waste a lot of time prompting, because I don't really understand the code. Just ride the vibe.

  4. This quadrant is also quite productive, but it's not total vibing. I call it "parents at the party" because I might still be parenting the generated code, which is not exactly vibe coding. It's like being a parent who still wants to enjoy the party while disciplining their children.

Intern

Treat your AI coding tool as an intern, not an expert.

  • Sometimes, I need to micromanage. Be specific about which files and past changelists to read to do the task.
  • Give them tasks that are tedious, straightforward, and stupid, but that need to be done.
  • Break up the tasks into smaller chunks.
  • Don’t trust their codes. Read their code with suscipicious eyes. Read the rationale of their changes, too.
  • Accept that they will sometimes be stuck in a rabbit hole.
  • Accept that they will sometimes be more brilliant than me and produce creative solutions.

Use Git effectively + small change lists.

Make sure to separate work into small changelists and commit what you have already reviewed and found good. This helps you keep track of what's going on and roll back if things go badly.

Also, some IDEs will first show the proposed changes for me to accept in bite-sized pieces. Use this feature.

Check your energy level

Sometimes, when things don't work, I get into a loop where I just keep prompting the AI coding tool to fix a stupid error message I don't understand. If it fails, I just regenerate, or reword the prompt slightly and retry.

LLMs get stuck in a rabbit hole, especially when I am stuck in the rabbit hole.

When my energy level is low, it's like I'm doom-scrolling. Don't doom-scroll LLM code. Take a break. Go get some tea, take a walk, exercise. Or just do it tomorrow.

And when my energy level is back, I try to actually read the code or error message myself, without the AI coding tool. Sometimes the answer is right there.

Don’t tell your colleagues that you are using AI agents. Or do.

I still don’t know if I should be embarrassed or proud when my manager or colleagues sees my triple monitors having two AI agents to write code. Once my manager said, “that’s cool”. I don’t know if they meant it or started degrading my confidence in my code :P

Anyways, I own my code and the results from it. People will care less and less about how I got there, as long as it is good.

If I hired my own intern, I would be responsible for their outcome, but the stakes would be low. In the LLM's case, the intern's work goes out under my own name. I take pride in my work and do my own review. I don't transfer this responsibility solely to my code reviewers.

Final verdict

I am 100% sure that using LLM coding tools has become a crucial part of being a productive engineer. However, to get there, you first need to be a good, principled engineer, or you are going to produce sloppy work, which is a huge risk not only to your career but also to your team's product.

My approach to AI coding tools

Google I/O 2023 was one of the most memorable events for me, because our CEO Sundar presented a piece of code that I reviewed.

My code being presented at Google I/O 2023.

PaLM, the state-of-the-art language model at the time, was featured not only fixing a bug, but also writing code comments in Korean!

A side story behind this 15-second presentation: I was pulled into a group chat of several Korean Googlers to come up with the best example of an LLM fixing a bug. I added this DFS example because it was something I frequently got wrong.
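The actual snippet from the keynote isn't reproduced here, but the flavor of bug is a classic one. As an illustration (my own reconstruction, not the I/O code), here is a DFS where forgetting the visited check would loop forever on cyclic graphs:

```python
def dfs(graph, start):
    """Iterative depth-first search returning the visit order."""
    visited, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in visited:  # the classic bug is forgetting this check,
            continue         # which loops forever on cyclic graphs
        visited.add(node)
        order.append(node)
        # push children reversed so they pop in left-to-right order
        stack.extend(reversed(graph.get(node, [])))
    return order
```

It's exactly the kind of small, easy-to-botch logic that makes a good demo.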

I was delighted that PaLM could fix it, but even happier when Sundar presented the code live to the world!

(I joked that this would probably be the most influential code snippet I would ever write or review in my whole Google career.)

Anyways, that was just over two years ago. Thinking about how I use coding agents in 2025, the advancement is mind-blowingly fast. People say that software engineering as a career is on track for deprecation. If vibe coding can build software, why do we need expensive engineers?

Before we get to that question, I want to share how I use AI coding tools. There are probably many folks who use these tools better than I do, but I want to record my current experience for my own future entertainment.

Google’s Developer Ecosystem / Culture

Google definitely has a unique developer ecosystem and culture, so unique that a software engineer working for Google has a risk of being stuck in it.

(If you are interested, you can read Software Engineering at Google)

But in terms of AI coding assistants, I would say the outside world moved much faster. GitHub Copilot, Cursor, Windsurf, Claude Code, and many other applications have come out over the last one to two years.

Since Google has a lot of software engineers, there is a huge incentive to make our internal tools better. We also have Gemini, with 2.5 in particular showing huge improvements in coding capabilities.

By now, our internal AI coding tools have caught up a lot, and engineers have adopted them fast. We have also released some of them to the public, so I can introduce the workflows that I use frequently:

  1. gemini-cli - a terminal based coding assistant. Similar to Claude Code.
  2. Gemini Code Assist - we don't use exactly this, but similar features are integrated into our internal IDE, Cider, and our code review tool, Critique.

Most Productive Workflows (I find)

1. Code Understanding

When I joined the new team in July, I had to onboard to a new codebase as soon as possible. The challenge was that I had never written Java backend or Angular.js frontend code! Since I mostly worked on data, I was more familiar with C++ and Python.

Obviously, my teammates are the most helpful for onboarding, but I did not want to bother them with every detail of the code. So I decided to leverage Gemini CLI.

“(Hey Gemini) Explain the workflow of this function. Draw a diagram.”

“Which file does this link to? How is it wired up this way?”

It is like having a mentor with unlimited patience and bandwidth right next to you!

I drew out how some parts of the code worked on a whiteboard, with Gemini's help, to get both the whole picture and the details. This really boosted my onboarding. I was even able to give a presentation to my colleagues on the status quo of a component and suggest improvements.

2. Refactoring

Refactoring is often neglected because it consists of tedious structural changes to the code - moving function parameters around, breaking down functions, renaming things - all without breaking the compiler or the original feature.

The good news is that LLMs never get bored and are amazing at syntax. They can also go through the whole file, or even the codebase, and make all the changes needed when I change one little thing - tedious edits like imports, header files, etc.

New feature development is sometimes blocked by refactoring. Now, small refactorings don't block me, because the time I spend on them has shrunk to a tenth.
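As a small illustration (a made-up example, not from our codebase), the kind of extract-function refactor I would hand to the LLM looks like this in Python:

```python
# Before (imagined): one long function mixing price parsing and formatting.
# After the extract-function refactor, the parsing step stands alone,
# so it can be unit-tested and reused.

def parse_price(raw: str) -> float:
    """Extracted helper: turn a display price like '$1,234.50' into 1234.5."""
    return float(raw.replace("$", "").replace(",", ""))

def format_discounted(raw_price: str, discount: float) -> str:
    """Original entry point, now delegating to the extracted helper."""
    price = parse_price(raw_price)
    return f"${price * (1 - discount):.2f}"
```

The change is mechanical, which is exactly why an LLM handles it well: behavior stays the same, only the structure improves.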

3. Unit Test Generation

An important point in making your codebase easier to refactor is having good unit tests. Any basic unit test is significantly better than nothing.

However, unit tests are sometimes omitted because of the tedious syntax of mocking and test-case writing. Again, hard-working LLMs are here to help!

Obviously, I do not ask it to write unit tests without knowing what I want to cover, but it helps a lot when I don't have every mocking library's patterns at the top of my head.
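For a flavor of what this looks like, here is a minimal Python sketch with a mocked dependency; the `CampaignService` class and its endpoint are hypothetical, purely for illustration:

```python
import unittest
from unittest import mock

class CampaignService:
    """Hypothetical service under test; the client and endpoint are made up."""
    def __init__(self, http_client):
        self.http_client = http_client

    def fetch_campaign_name(self, campaign_id):
        response = self.http_client.get(f"/campaigns/{campaign_id}")
        return response["name"]

class CampaignServiceTest(unittest.TestCase):
    def test_fetch_campaign_name(self):
        # Mock the HTTP dependency so the test needs no real network.
        fake_client = mock.Mock()
        fake_client.get.return_value = {"name": "Holiday Sale"}
        service = CampaignService(fake_client)
        self.assertEqual(service.fetch_campaign_name("c1"), "Holiday Sale")
        fake_client.get.assert_called_once_with("/campaigns/c1")
```

I decide what to cover; the LLM fills in the mocking boilerplate around it.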

4. Writing new code

With LLMs, my way of thinking has been elevated to focus on the logic, rather than a specific language's syntax.

I have the basics of OOP and async operations in my head, just not in Java, which I haven't used in more than 15 years. It would have taken me 2-5x the time if I had to dig through Java documentation to find the equivalent of something in C++.

Now I prompt the LLM to implement the logic I want. Sometimes I even express myself in Python, with its simpler syntax, and ask it to implement the same logic in Java.
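For example, the Python "spec" I hand over might look like this, with a prompt along the lines of "implement this exact logic in Java, following this file's existing patterns" (the prompt wording is illustrative, not a real tool's syntax):

```python
# Logic sketched in plain Python, written as a spec for the LLM to port.

def dedupe_keep_order(items):
    """Remove duplicates while preserving first-seen order."""
    seen, result = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

The model translates the intent into idiomatic Java; I review the result against the Python reference.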

Another good thing is that the coding assistant takes the whole file as context, so it follows the team's previous patterns and style whenever reasonable.

I also find this super useful for frontend development. I recently picked up Angular.js, which sometimes involves modifying multiple files (JavaScript, CSS, HTML) at once. LLMs are excellent at this.

5. Self-code review

Due to my personality, I tend to spend quite a lot of time reviewing my own code before sending it out to a colleague. Now there is a feature to auto-review a changelist. This has been great for reducing both my time and the reviewer's, especially since I am prone to basic mistakes in a language I'm not familiar with.

6. Error Message Parsing

Error messages can be brutally long and cryptic, but I now ask the LLM to explain them to me and even suggest a fix! This is a real game changer for reducing debugging time.
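A minimal sketch of this habit, assuming a generic text-in/text-out assistant (the prompt format is my own, not any specific tool's API):

```python
import traceback

def build_debug_prompt(exc: Exception) -> str:
    """Wrap a caught exception into a prompt asking an LLM to explain it."""
    tb = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return "Explain this error in plain English and suggest a fix:\n\n" + tb
```

I paste the result into whatever assistant is at hand; the point is simply to hand over the full traceback instead of a paraphrase.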

7. Readability Suggestion

Google has a particular culture of readability reviews. The pro of this culture is higher overall code quality, at the cost of slowing things down. In new product development especially, velocity is super important, so I used to feel reluctant to leave readability-related comments on code.

However, the LLM can now read my nitpicky comment and attach a suggested fix to it. The author can apply the fix with one button press. This is really effective for keeping code quality high while maintaining velocity!

8. AI Powered Colab

Colab (a Jupyter notebook environment) is a powerful tool for quick data analysis or data pipeline prototyping. Most colabs do not need to be as elegant or well-tested as production code.

This means that there is more room for “vibe” coding. Colab recently started adding AI coding assistant inside the UI!

9. Vibe Coding Prototypes

Most of the time, demoing beats explaining in words. That's why vibe-coding prototypes while brainstorming new feature ideas is so powerful.

I can imagine, and have already seen, how effective vibe coding can be for non-technical roles like product managers, business developers, and UI/UX designers to build out ideas without consuming the engineering team's resources.

Vibe coding is not yet at the level where it can replace engineering teams, but given the pace at which this space is developing, I wouldn't rule it out too early.

I often use Gemini Canvas in the Gemini App for this purpose, but there are thousands of tools out there for vibe coding.

Closing

There have been studies showing that AI tools actually make developers slower, yet people keep talking about how crazily productive these AI coding tools are. Who is correct?

While I am far from the foremost expert in this field, I can totally see that, when not used properly, these tools can slow you down. I went through some trial and error myself.

There are some principles and tips I have on how to effectively use these tools, but that will be for my next post.

Moving to the USA

One of the books that deeply influenced me is Antifragile by Nassim Taleb.

Taleb talks a lot about optionality. My interpretation is that optionality in life means putting myself in situations where I have unlimited upside and limited downside.

Limiting my downside has been relatively easy: follow obvious rules like “Don’t put all your eggs in one basket,” “Avoid life-threatening activities,” and “Don’t take leveraged bets that can knock you out of the game.” In short, don’t bite off more than you can chew.

On the other hand, creating unlimited upside has been about thinking strategically and making hard choices. Whenever I feel like my upside has limits, I seek change.

For example, when I was a computer science undergrad, I got involved in many different startups. I learned many skills, but most importantly, I learned that merely being good at programming had a limit. Fortunately, my best friend, whom I met playing football, introduced me to a natural language processing (NLP) lab. I never liked university, but I decided to stay and do research as a master’s student. During these two years, I dove deep into machine learning, deep learning, and NLP. I took baby steps to learn about the field and do research, eventually publishing my work at NLP conferences.

The timing could not have been better. I remember seeing AlphaGo defeat Lee Sedol in my first year of graduate school. Those two years shattered the ceiling that had limited me as a computer programmer.

After postgrad in 2019, I joined an early-stage startup in Hong Kong. After a year, it was evident that things were not going well. Also, the city I’d lived in for seven years was in political turmoil. I needed a change again. I started looking into leaving Hong Kong and joining a bigger company. That led me to interview at the biggest company I could possibly join—Google. My first position was as a Computational Linguist. I never passed the initial screening for software engineer roles, probably because I was just one of thousands of CS graduates. My two years in NLP research made it possible to get a foot in the door of big tech.

One challenge was that the job was in Japan. I had never thought about living in Japan and couldn’t even read hiragana. The job didn’t require Japanese, so I took a leap of faith and moved.

This difficult choice led to the most interesting five years of my life, despite the COVID-19 pandemic. We became a family there—going in as one and coming out as three. I can now converse in Japanese (somewhat). Five years in Tokyo still feels like a long honeymoon.

Despite the great food and lifestyle of Tokyo, I once again had to seek change for the same reason—limited upside. I felt I would need to dive deep into Japanese society to find more opportunities, or else remain confined to one company.

An opportunity to immigrate to California came, so we took it as a family. Now we are in Silicon Valley. As an engineer working on NLP and AI in 2025, I figured this is the best way to put myself in the right place and environment.

Another important factor that influenced our choice was “belonging.” As an international family with a multicultural child, this is an important issue. Unfortunately, our hometowns (Hong Kong and Seoul) and our last home (Tokyo) were suboptimal. After several months in California, we feel we are in the right environment.

Every choice comes with trade-offs; none is perfect. We miss the relatively high-quality, comfortable, affordable, and train-centric life in Tokyo. Our extended families are farther away now, and we need to build our social circles again. However, I believe that hard choices that create unlimited upside eventually make our lives easier in the long run. Let’s see how life turns out.

Large Language Models are Engines

Large Language Models (LLMs) have taken over the world in recent months. New technological advances are happening really fast.

New models are being released every week, and it's hard to keep up. To be honest, I stopped following the research after a few "groundbreaking" models, but I have still been thinking about the implications of LLMs for software engineering.

I had the chance to work on one of the biggest LLM projects in the world, which gave me some initial insights. I’ve also been reading and experimenting with different engineering practices for LLMs.

I’ll share my thoughts on engineering LLMs. This field is changing rapidly, so I’ll be sure to emphasize that these are just my initial thoughts.

Disclaimer: The thoughts in this post are my own and do not represent my employer. I am not sharing any information that is not publicly known.

What are LLMs?

Let’s go back to the basics.

LLMs, or large language models, predict the next word in a sequence. They are trained on massive datasets of text and code, and can learn to perform a variety of tasks, such as generating text, translating languages, and answering questions. That's why some people call them "fancy autocomplete". But recent models have proved that, with more scale and data, they acquire an almost magical capability to follow our instructions; tasks that once seemed doable only by intelligent, creative humans in hours or days are getting done in a matter of seconds.
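"Fancy autocomplete" can be made concrete with a toy example: a bigram model that, like an LLM (at a vastly smaller scale, using counting instead of neural networks), just predicts the most likely next word:

```python
from collections import Counter, defaultdict

# A toy "language model": bigram counts over a tiny corpus. Real LLMs use
# neural networks and vastly more data, but the training objective is the
# same: predict the next word.
corpus = "the cat sat on the mat the cat ate".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the continuation most frequently seen after `word`."""
    return bigrams[word].most_common(1)[0][0]
```

Scale this idea up by many orders of magnitude and the "autocomplete" starts looking like instruction following.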

LLMs can be divided into two types: [1] vanilla and [2] instruct-led. Vanilla LLMs are trained only to predict the next word in a sequence, while instruct-led LLMs are additionally trained to follow instructions in the user prompt. Instruct-led LLMs are enhanced by a technique called Reinforcement Learning from Human Feedback (RLHF), which uses additional data to train the LLM to treat the input text as instructions. That's why we started calling these input texts "prompts", and the act of changing the behavior of LLMs "prompting".

When choosing an LLM, it is important to consider whether it is instruct-led trained or not.

To learn more about the details of how LLMs are trained, I recommend watching this video by Andrej Karpathy.

LLMs as Engines

LLMs, or large language models, are not products themselves, but rather engines that can be used to create products. They are similar to car engines in that they provide the power to move something, but they need other components to be useful. For example, a car engine needs wheels, a body, and a steering wheel to be a transportation product.

LLM as a black box engine

An LLM itself is just another ML model that takes an input and generates an output.

In the same way, LLMs need other components to be useful. For example, the input prompt may need to be configured in a certain way, or the output may need some post-processing. And they also need to be integrated into a user interface so that people can interact with them. How to use these engines is really up to the designers and developers.


Some companies create their own engines and also the products that use them. For example, Google and OpenAI train their own LLMs and build products like Bard and ChatGPT. Other companies only provide the engines (such as Meta's Llama), the way Rolls-Royce manufactures airplane engines but not the aircraft. Still others provide custom-tuned LLM engines fine-tuned on a specific dataset; for example, Bloomberg developed an LLM for finance.

However, most companies will not create their own LLMs from scratch. Instead, they will utilize engines that have been trained by other companies.

This is because creating an LLM from scratch is very expensive. The majority of the cost comes from the pre-training stage, which requires thousands of GPUs, months of training, and millions of dollars. Only companies and open-source projects with abundant resources can afford it.


Pretraining is the most compute intensive stage (Screenshot from State of GPT talk)


Carbon footprint generated by training the Llama 2 model. The average person in the EU emits 8.4 tCO2eq a year.

Using LLMs

Currently, there are two ways to use these engines:

  1. Calling an API: This is the simplest way to use LLMs. You can write a few lines of code to call an API and get the results you need. This is a great option for small projects, or for people who don't want the hassle of hosting their own model.

  2. Hosting your own: If you need more control over the LLM or if you need to use it for a large project, you can host your own. This requires more technical expertise, but it gives you more flexibility and control.

Calling an API

A number of companies offer LLM APIs, including Google, OpenAI, and Hugging Face. These APIs make it easy to use LLMs for a variety of tasks. This method is so easy that it basically removes any barrier to entry for accessing LLMs. Anyone who can write simple API client code can leverage the power of these capable NLP models. All the complexity of infrastructure, latency, and training is handled by the API provider. The client just needs to pay for the calls.

Such simplicity fundamentally changed the game of NLP. The space exploded with people experimenting with LLMs via APIs. Some say this is a bigger phenomenon than the advent of the iPhone's App Store, which triggered an explosion of third-party mobile apps. Now a high school student with minimal coding knowledge can create a reasonably well-performing chatbot, threatening the existence of NLP engineers like myself.

However, there are a few drawbacks to calling an API.

  • Latency: Calling an API can add latency to your application. This is because the API provider needs to process your request and then return the results.

  • Cost: You may need to pay for each API call. This can add up if you are using the API for a large project.

  • Data privacy: The data that you pass to the API may not be private. This is because the API provider needs to store the data in order to process your request (and sometimes subject to human review for their quality improvements).

  • Limited control: input length and parameter selection are bounded by whatever the API provides.
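One common way to soften the latency and cost drawbacks is to cache repeated calls on the client side. A minimal sketch, where `fake_llm_api` is a stand-in for whatever provider client you actually use (not a real SDK):

```python
import functools

def fake_llm_api(prompt: str) -> str:
    """Stand-in for a real provider call; replace with your API client."""
    return f"response to: {prompt}"

# Identical prompts hit the cache instead of the network, saving both
# latency and per-call cost (at the price of possibly stale responses).
@functools.lru_cache(maxsize=1024)
def cached_llm_call(prompt: str) -> str:
    return fake_llm_api(prompt)
```

This only helps when prompts repeat exactly, but for deterministic pipelines that is surprisingly often.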

Hosting your own

As an alternative, hosting your own LLM gives you more control over the model and the data. You can also use the model for demanding tasks that would be too slow or expensive through an API. However, there are also challenges to hosting your own LLM.

There has been a surge of open-sourced LLMs of varying sizes. Meta's Llama 2 is a notable example, as it is now commercially usable, unlike its leaked predecessor. StableLM and MPT are two other releases whose creators claim they are as good as, or even better than, those of the major players.

Pros:

  • Data privacy: Input and output do not need to go through a third-party. Such data ownership may be an important factor for a project.
  • Costs: may be lower than calling an API when data volumes are large.

Cons:

  • Infra expertise: you need to create and maintain the infrastructure to host the LLMs. Cloud providers make this part easier, but it still requires technical expertise.
  • Performance: LLMs tend to be more capable the bigger they are, and larger models are harder to host.

Moreover, your own LLM can be further tuned with proprietary data. However, fine-tuning (supervised learning or RLHF) is not easy to do right, so more companies will offer fine-tuning-as-a-service: you upload your data, select the base LLM, fine-tune it, and download the model weights.

Today I laid out my thoughts on the high-level view of LLM engineering. The main takeaway: "An LLM is an engine, not a product." I hope this view inspires product designers and engineers to use LLMs to revolutionize whatever you are working on:


Boost your existing (or new) product with an LLM engine!

The next question that might arise is, "what does a software engineer working with LLMs do?" Spoiler alert - it's not just prompt engineering. I'll cover that in my next post.