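Gemini (Google AI Model)

Google was late to the AI chatbot wars, yet if there is one company that the word "late" does not suit, it is Google. Search, YouTube, Gmail, Google Docs, Android: what happens when a company that holds so much of the world's internet infrastructure takes up AI as its weapon? Gemini is showing the answer.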

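1. Origins: Bard's Failure and the Rebuild

Google led language-model technology long before ChatGPT appeared. Google published the Transformer architecture in 2017, and Google researchers led the papers behind major language models such as BERT, T5, and PaLM. In other words, the technology underlying GPT itself came out of Google.

It is ironic, then, that Google was the one shaken by ChatGPT. After ChatGPT arrived in late 2022, reports said a "Code Red" had been declared inside Google. The chatbot it rushed out in response was Bard.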

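Bard's first public demonstration, however, contained a damaging mistake. Asked about the James Webb Space Telescope, Bard confidently gave a wrong answer, and the error went straight into a promotional video. As the clip spread, Google's share price fell by as much as 9% in a single day. The episode became a symbol of the AI reliability problem.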

In December 2023, Google announced Gemini, an entirely new model, effectively retiring Bard. The Bard product was rebranded as Gemini in February 2024.

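2. King of Multimodality: Text + Image + Audio + Video

Gemini's core differentiator is that it was designed to be multimodal from the start. Whereas vision capabilities such as GPT-4V and Claude's were added later, Gemini was built at the foundational design stage to process text, images, audio, and video together.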

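In a demo video released in December 2023, Gemini looked at sketches and played along with a game in real time, understanding what the user said and reacting immediately. The demo was criticized for having been partly edited, but the direction was clear.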

At Google I/O 2024, Gemini Nano on the Galaxy S24 was shown analyzing a live phone call and detecting a scam call.

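3. Model Lineup: Ultra, Pro, Flash, Nano

Gemini Ultra: The highest-performing model. At launch Google said it outperformed GPT-4, but real-world user impressions were mixed.

Gemini Pro: The mid-tier model. Integrated into Google services (Gmail, Docs, Search), it is the version everyday users encounter most often.

Gemini Flash: Optimized for speed and cost efficiency. Popular with API developers.

Gemini Nano: A small model that can run on-device. It runs directly on a smartphone without an internet connection.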

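4. Integration with the Google Ecosystem: The Biggest Strength

Gemini's practical strength is the Google ecosystem. In Gmail, Gemini summarizes emails and drafts replies. In Google Docs, it analyzes a whole document and suggests edits. Google Search has the "AI Overviews" feature built in.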

In particular, when linked to Google Drive, it can use years of documents as context. Unlike Claude or GPT, where external documents must be uploaded, Google already holds the user's data. This is Google's "AI moat."

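5. Gemini 1.5: A Million-Token Context

Gemini 1.5 Pro, unveiled in February 2024, set a remarkable mark with a 1 million-token context window. That is roughly 700,000 to 750,000 words of text in a single prompt, the equivalent of several long books. Compared with the 100k-token window of Claude 2, which had drawn attention earlier, it is a tenfold jump.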

In demos, Google analyzed an entire one-hour video and located specific scenes in it. Analyzing a whole film to extract its storyline, or processing a large codebase in one pass, also became possible.
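To make the scale of a 1 million-token window concrete, here is a minimal Python sketch of the arithmetic. The ratio of 0.75 English words per token is a common rule of thumb and an assumption here, not a property of Gemini's tokenizer; the function names are hypothetical and exist only for illustration.

```python
# Rough words-per-token ratio for English text. This is an assumed rule of
# thumb for illustration; real tokenizers vary by language and content.
WORDS_PER_TOKEN = 0.75

GEMINI_1_5_PRO_TOKENS = 1_000_000  # context size cited in the text
CLAUDE_2_TOKENS = 100_000          # context size cited in the text


def estimate_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


def context_ratio(larger_tokens: int, smaller_tokens: int) -> float:
    """Return how many times larger one context window is than another."""
    return larger_tokens / smaller_tokens


if __name__ == "__main__":
    print(estimate_words(GEMINI_1_5_PRO_TOKENS))
    print(context_ratio(GEMINI_1_5_PRO_TOKENS, CLAUDE_2_TOKENS))
```

Under that assumed ratio, the estimate scales linearly with the token budget, so any change in the ratio (for example, for Korean text or source code) shifts the word count proportionally.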

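6. Controversies and Criticism

Image generation bias controversy (February 2024): Gemini's image generation feature (Imagen) caused a major controversy by depicting historical figures as racially diverse. Images showing Nazi-era German soldiers as Black and the American Founding Fathers as Asian spread on social media. Google paused the generation of images of people and set about fixing it. Critics called it "excessive diversity injection" and said it "distorted historical fact."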

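Trust after the Bard demo mistake: Since the wrong answer in that first demo, rebuilding trust in Google's AI has remained unfinished work.

AI Overviews errors in Google Search: In May 2024, the AI Overviews feature in Google Search produced a run of absurd answers. The best-known example advised adding glue to keep cheese from sliding off pizza. Google moved quickly to fix it.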

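7. Corporate Strategy: AI-First Google

Google CEO Sundar Pichai has declared that "Google is an AI-first company." Gemini sits at the center of that, under a strategy of integrating Gemini into every Google product.

The reason competitors fear Google's potential is clear: Google holds the vast data accumulated from search traffic, a global distribution channel in Android devices, YouTube's video data, and Gmail's email data. All of it is raw material for AI training.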

Gemini: Google's Big AI Comeback

When ChatGPT shook the world, Gemini was the card Google pulled out to say "we've got one too!" Google basically invented the core AI tech, yet it ended up behind in the chatbot wars. Pretty ironic.

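It all starts with the Bard fiasco

Bard, rushed out as a ChatGPT rival, got an answer wrong in its first demo, and Google's stock dropped as much as 9% in a day. Total embarrassment. So in 2024 Google rebranded Bard as Gemini and started fresh.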

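The real strength: Google ecosystem integration

There's one area where Gemini clearly beats GPT and Claude. It's baked into Gmail, Google Docs, YouTube, and Google Search, basically every Google service.

In Gmail you say "summarize this email," in Google Docs you say "fix this document," and it just works. No copy-pasting like you do with outside AI tools.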

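Versions at a glance

Ultra: top performance, goes toe-to-toe with GPT-4
Pro: for everyday users, built into Google services
Flash: fast and cheap, widely used by developers through the API
Nano: runs right on a smartphone with no internet

Gemini Nano is on the Galaxy S24, and there's now a feature that detects scam calls in real time while you're on the phone.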

1 million-token context

Gemini 1.5 Pro can handle 1 million tokens (roughly 700,000+ words) in one go. It can even analyze a full hour-long video. Legendary-tier spec.

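Controversy

When generating images, it rendered historical figures in all sorts of ethnicities, which set off the "wait, Nazi-era German soldiers are Black??" uproar. That's what happens when Google overdoes the diversity tuning. It paused the feature and fixed it.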

Gemini: Google's AI Friend

Gemini is an AI made by Google, the most famous search engine in the world. Yes, the same Google that runs YouTube and Gmail!

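What makes Gemini special

Gemini doesn't only understand writing. Show it a photo and it says, "There's a puppy in this picture!" Talk to it out loud and it understands that too. It's like an AI friend with eyes, ears, and a mouth. Scientists call this "multimodal." That means it understands things in several different ways.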

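Google services and friends

In Gmail, if you say "summarize this letter," Gemini does it right away. It helps when you write in Google Docs too. You could say all the products Google makes are Gemini's friends. It's like all the subject teachers at school pitching in to help Teacher Gemini.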

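Gemini inside your smartphone

Some smartphones have Gemini built right in. It can work inside the phone even with no internet. It's like a tiny AI friend living in your phone! If a bad person tries to trick you during a call, Gemini can warn you: "This call seems suspicious!"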

From Bard to Gemini

At first Google made an AI called Bard. Then it built a new, better one and changed the name to Gemini. It's like studying harder in a new school year and getting better grades.

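Learn more

How does Google know so much? Google gathered billions of web pages from around the world and used them to teach its AI. It's like a library far bigger than any library! Let's look forward to seeing how much smarter Gemini gets.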

Google's Gemini: A Multimodal Leap Forward

Gemini (Google AI Model)

Google entered the AI chatbot race late, but its hold on Search, YouTube, Gmail, Google Docs, and Android makes it a formidable contender once AI becomes its weapon. The question is what happens when a company with that much of the internet's infrastructure commits to AI. Gemini is its answer.

1. Genesis: Bard's Setbacks and Rebirth

Google pioneered language-model technology long before ChatGPT emerged: it published the Transformer architecture in 2017, and its researchers led work on models such as BERT, T5, and PaLM. The technology underlying GPT originated in Google's research.

That lead did not protect Google from the shock of ChatGPT's launch in late 2022. Google reportedly declared an internal "Code Red" and rushed out Bard as a response. Bard's first demonstration then suffered a critical misstep: a promotional video showed it confidently giving incorrect information about the James Webb Space Telescope. The share price fell by as much as 9% in a day, and the episode came to symbolize a broader crisis of trust in AI.

In December 2023, Google unveiled Gemini as an entirely new model, effectively retiring Bard, and in February 2024 it rebranded the Bard product as Gemini.

2. The Multimodal Monarch: Text, Image, Audio, Video Convergence

Gemini's defining advantage is that it was designed to be multimodal from inception. Whereas vision capabilities such as GPT-4V and Claude's were added later, Gemini integrates text, image, audio, and video processing from the outset. A demo video released in December 2023 showed it interpreting sketches, playing along with a game in real time, and responding to spoken input. The video drew controversy for having been partly edited, but the direction was clear.

At Google I/O 2024, Gemini Nano on the Samsung Galaxy S24 was shown analyzing a live call and detecting fraud on-device, without depending on an internet connection.

3. Version Spectrum: Ultra, Pro, Flash, Nano

Gemini Ultra: The highest-performing model, initially presented as surpassing GPT-4, though user experiences have been mixed.
Gemini Pro: A mid-range model integrated into Google services such as Gmail, Docs, and Search; the version everyday users meet most often.
Gemini Flash: Optimized for speed and cost efficiency, and popular with API developers.
Gemini Nano: A compact model designed for on-device execution, able to run on a smartphone without an internet connection.

4. Seamless Integration with Google Ecosystem: Unrivaled Strength

Gemini's true strength lies in its integration with the Google ecosystem. In Gmail it summarizes emails and drafts replies, in Docs it analyzes whole documents and suggests edits, and Google Search carries the "AI Overviews" feature. Linked to Google Drive, it can draw on years of documents as context, a significant advantage over competitors such as Claude or GPT, which require documents to be uploaded. Because Google already holds the user's data, this constitutes its strategic "moat" in AI.

5. Gemini 1.5: A Million Token Leap

Unveiled in February 2024, Gemini 1.5 Pro set a new mark with a 1 million-token context window, roughly 700,000 to 750,000 words or several long books, ten times Claude 2's 100k tokens. Demonstrations showed it analyzing an entire one-hour video and locating specific scenes. Extracting the storyline from a whole film or processing a large codebase in one pass also became possible.

6. Controversies and Critiques

Image Bias Debates (February 2024): Gemini's image generation feature (Imagen) sparked controversy by depicting historical figures as racially diverse, including Nazi-era German soldiers shown as Black and American Founding Fathers shown as Asian. Google paused the generation of images of people while it addressed criticism of "excessive diversity injection" and of distorting historical fact.
Trust Erosion After the Bard Demo Error: The wrong answer in Bard's first demonstration continues to cast a shadow, and rebuilding trust in Google's AI remains unfinished work.
AI Overviews Errors in Google Search (May 2024): The AI Overviews feature in Google Search produced a run of absurd answers, most famously advice to add glue to keep cheese from sliding off pizza. Google moved quickly to fix them.

7. Strategic Imperative: Google's AI-First Vision

Google CEO Sundar Pichai has declared Google an "AI-first" company, with Gemini at its core. The strategy is to embed Gemini across all Google products.

Competitors' apprehension has a clear source: Google holds the data accumulated from search traffic, a global distribution channel in Android devices, YouTube's video data, and Gmail's email data, all of it raw material for AI training.

Related Topics

Claude (Anthropic AI Model)
LLM (Large Language Model)
DeepSeek (Chinese AI Startup)
Multimodal AI
Google Search
ChatGPT
Transformer
AI Safety
Imagen
On-Device AI
