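Video Game Data Analysis Report
1. Project Overview
1.1 Data Description
(1) Data source
The data comes mainly from the file 电子游戏.csv provided by our instructor, which describes video game sales across the global market.
Video games (a minority of scholars use the term "electronic games") are interactive games that run on electronic device platforms. By the medium they run on, they fall into five categories: console games (in the narrow sense used here, home console games), handheld games, arcade games, computer games, and mobile games. Mature video games appeared at the end of the 20th century. They changed how people play and what the word "game" means, and they are a cultural activity born of technological development.
The term "video game" can also refer to video game software. (Source: Baidu Baike)
(2) Data description
We first load the dataset and then run a few simple commands to get a high-level view of it.
# Import the packages we need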
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
# Load the dataset; the original 电子游戏.csv has been renamed to work.csv
data = pd.read_csv('work.csv')
data.head()  # show the first five rows
 | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
Summary of the text (object) columns
data.describe(include = 'object').T
 | count | unique | top | freq |
Name | 16598 | 11493 | Need for Speed: Most Wanted | 12 |
Platform | 16598 | 31 | DS | 2163 |
Genre | 16598 | 12 | Action | 3316 |
Publisher | 16540 | 578 | Electronic Arts | 1351 |
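- There are 31 platforms and 12 genres
- There are 578 publishers
The fields have the following meanings:
- Rank (rank by total sales)
- Name (game title)
- Platform (game platform)
- Year (release year)
- Genre (game genre)
- Publisher (game publisher)
- NA_Sales (sales in North America)
- EU_Sales (sales in Europe)
- JP_Sales (sales in Japan)
- Other_Sales (sales in the rest of the world)
- Global_Sales (total worldwide sales)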
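As a quick sanity check on the field meanings, the sketch below rebuilds the five rows printed by data.head() and compares Global_Sales with the sum of the four regional columns. The 0.02 tolerance is an assumption chosen to absorb rounding in the published figures; it is not part of the original analysis.

```python
import pandas as pd

# The five rows shown by data.head() above (values copied from the printed table).
sample = pd.DataFrame({
    'Name': ['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii',
             'Wii Sports Resort', 'Pokemon Red/Pokemon Blue'],
    'NA_Sales': [41.49, 29.08, 15.85, 15.75, 11.27],
    'EU_Sales': [29.02, 3.58, 12.88, 11.01, 8.89],
    'JP_Sales': [3.77, 6.81, 3.79, 3.28, 10.22],
    'Other_Sales': [8.46, 0.77, 3.31, 2.96, 1.00],
    'Global_Sales': [82.74, 40.24, 35.82, 33.00, 31.37],
})

regional_cols = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']

# Absolute difference between the reported global figure and the regional sum.
sample['gap'] = (sample['Global_Sales'] - sample[regional_cols].sum(axis=1)).abs()

# Assumed tolerance: each figure is rounded to two decimals.
TOLERANCE = 0.02
consistent = bool((sample['gap'] <= TOLERANCE).all())

print(sample[['Name', 'gap']])
print('Global_Sales matches the regional sum within tolerance:', consistent)
```

On the full dataset the same check could be run on `data` directly to spot rows where the regional figures and the global total disagree.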
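1.2 Background and Objectives
- Which game genres do players prefer?
- Which platforms do players use most?
- Which publishers sell best?
- How is the video game market performing?
- Predict the sales of a game released in the future
1.3 Approach
We explore these questions mainly through data visualization.
2. Data Preprocessing
2.1 Data Cleaning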
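print('Are there duplicate rows in the dataset:\n', data.duplicated().any())
print('Are there missing values in the dataset:\n', data.isnull().values.any())  # reduce over the cell values, not the column labels
Are there duplicate rows in the dataset:
False
Are there missing values in the dataset:
True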
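One pitfall is worth illustrating here: Python's built-in any() iterates over a DataFrame's column labels rather than its cells, so it reports True for any table whose column names are non-empty strings. A minimal sketch on a made-up two-row frame (the frame `toy` is hypothetical and only serves the illustration):

```python
import pandas as pd

# Hypothetical frame with no missing values at all.
toy = pd.DataFrame({'Name': ['A', 'B'], 'Global_Sales': [1.0, 2.0]})

# Built-in any() walks the column labels ('Name', 'Global_Sales'),
# which are non-empty strings, so the result is True regardless of the data.
label_based = any(toy.isnull())

# Reducing over the cell values gives the intended answer.
cell_based = bool(toy.isnull().values.any())

# Per-column counts are usually more informative than a single flag.
missing_per_column = toy.isnull().sum()

print(label_based, cell_based)
print(missing_per_column)
```

On the real data, `data.isnull().sum()` would show which columns actually contain the missing values.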
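There are no duplicate rows, so no deduplication is needed.
There are missing values, so we simply drop the rows that contain them.
data.dropna()  # returns a copy without the rows that contain missing values; `data` itself stays unchanged unless the result is assigned back (the `nan` in the publisher list below confirms this)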
 | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002.0 | Platform | Kemco | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16594 | 16597 | Men in Black II: Alien Escape | GC | 2003.0 | Shooter | Infogrames | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008.0 | Racing | Activision | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
16596 | 16599 | Know How 2 | DS | 2010.0 | Puzzle | 7G//AMES | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |
16597 | 16600 | Spirits & Spells | GBA | 2003.0 | Platform | Wanadoo | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16291 rows × 11 columns
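DataFrame.dropna returns a new frame by default and leaves the original untouched, so the result has to be assigned if later steps are meant to use the cleaned data. A small sketch with a made-up three-row frame (`toy` and `cleaned` are illustrative names, not part of the original notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one row lacks a Year, another lacks a Publisher.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Year': [2006.0, np.nan, 2008.0],
    'Publisher': ['Nintendo', 'Sega', None],
})

toy.dropna()                      # result discarded: toy still has three rows
rows_after_bare_call = len(toy)

cleaned = toy.dropna()            # keep the filtered copy under its own name
rows_after_assignment = len(cleaned)

print(rows_after_bare_call, rows_after_assignment)
```

Whether to drop or keep incomplete rows is a modelling choice; the sketch only shows how to make the choice take effect.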
Genres present in the data
# Genres
data.Genre.unique()
array(['Sports', 'Platform', 'Racing', 'Role-Playing', 'Puzzle', 'Misc',
'Shooter', 'Simulation', 'Action', 'Fighting', 'Adventure',
'Strategy'], dtype=object)
Platforms present in the data
# Platforms
data.Platform.unique()
array(['Wii', 'NES', 'GB', 'DS', 'X360', 'PS3', 'PS2', 'SNES', 'GBA',
'3DS', 'PS4', 'N64', 'PS', 'XB', 'PC', '2600', 'PSP', 'XOne', 'GC',
'WiiU', 'GEN', 'DC', 'PSV', 'SAT', 'SCD', 'WS', 'NG', 'TG16',
'3DO', 'GG', 'PCFX'], dtype=object)
Publishers present in the data
# Publishers
data.Publisher.unique()
array(['Nintendo', 'Microsoft Game Studios', 'Take-Two Interactive',
'Sony Computer Entertainment', 'Activision', 'Ubisoft',
'Bethesda Softworks', 'Electronic Arts', 'Sega', 'SquareSoft',
'Atari', '505 Games', 'Capcom', 'GT Interactive',
'Konami Digital Entertainment',
'Sony Computer Entertainment Europe', 'Square Enix', 'LucasArts',
'Virgin Interactive', 'Warner Bros. Interactive Entertainment',
'Universal Interactive', 'Eidos Interactive', 'RedOctane',
'Vivendi Games', 'Enix Corporation', 'Namco Bandai Games',
'Palcom', 'Hasbro Interactive', 'THQ', 'Fox Interactive',
'Acclaim Entertainment', 'MTV Games', 'Disney Interactive Studios',
nan, 'Majesco Entertainment', 'Codemasters', 'Red Orb', 'Level 5',
'Arena Entertainment', 'Midway Games', 'JVC', 'Deep Silver',
'989 Studios', 'NCSoft', 'UEP Systems', 'Parker Bros.', 'Maxis',
'Imagic', 'Tecmo Koei', 'Valve Software', 'ASCII Entertainment',
'Mindscape', 'Infogrames', 'Unknown', 'Square', 'Valve',
'Activision Value', 'Banpresto', 'D3Publisher',
'Oxygen Interactive', 'Red Storm Entertainment', 'Video System',
'Hello Games', 'Global Star', 'Gotham Games', 'Westwood Studios',
'GungHo', 'Crave Entertainment', 'Hudson Soft', 'Coleco',
'Rising Star Games', 'Atlus', 'TDK Mediactive', 'ASC Games',
'Zoo Games', 'Accolade', 'Sony Online Entertainment', '3DO', 'RTL',
'Natsume', 'Focus Home Interactive', 'Alchemist',
'Black Label Games', 'SouthPeak Games', 'Mastertronic', 'Ocean',
'Zoo Digital Publishing', 'Psygnosis', 'City Interactive',
'Empire Interactive', 'Success', 'Compile', 'Russel', 'Taito',
'Agetec', 'GSP', 'Microprose', 'Play It', 'Slightly Mad Studios',
'Tomy Corporation', 'Sammy Corporation', 'Koch Media',
'Game Factory', 'Titus', 'Marvelous Entertainment', 'Genki',
'Mojang', 'Pinnacle', 'CTO SpA', 'TalonSoft', 'Crystal Dynamics',
'SCi', 'Quelle', 'mixi, Inc', 'Rage Software', 'Ubisoft Annecy',
'Scholastic Inc.', 'Interplay', 'Mystique', 'ChunSoft',
'Square EA', '20th Century Fox Video Games', 'Avanquest Software',
'Hudson Entertainment', 'Nordic Games', 'Men-A-Vision', 'Nobilis',
'Big Ben Interactive', 'Touchstone', 'Spike', 'Jester Interactive',
'Nippon Ichi Software', 'LEGO Media', 'Quest',
'Illusion Softworks', 'Tigervision', 'Funbox Media',
'Rocket Company', 'Metro 3D', 'Mattel Interactive', 'IE Institute',
'Rondomedia', 'Sony Computer Entertainment America',
'Universal Gamex', 'Ghostlight', 'Wizard Video Games',
'BMG Interactive Entertainment', 'PQube', 'Trion Worlds', 'Laguna',
'Ignition Entertainment', 'Takara', 'Kadokawa Shoten', 'Destineer',
'Enterbrain', 'Xseed Games', 'Imagineer',
'System 3 Arcade Software', 'CPG Products', 'Aruze Corp',
'Gamebridge', 'Midas Interactive Entertainment', 'Jaleco',
'Answer Software', 'XS Games', 'Activision Blizzard',
'Pack In Soft', 'Rebellion', 'Xplosiv', 'Ultravision',
'GameMill Entertainment', 'Wanadoo', 'NovaLogic', 'Telltale Games',
'Epoch', 'BAM! Entertainment', 'Knowledge Adventure', 'Mastiff',
'Tetris Online', 'Harmonix Music Systems', 'ESP', 'TYO',
'Telegames', 'Mud Duck Productions', 'Screenlife', 'Pioneer LDC',
'Magical Company', 'Mentor Interactive', 'Kemco',
'Human Entertainment', 'Avanquest', 'Data Age',
'Electronic Arts Victor', 'Black Bean Games', 'Jack of All Games',
'989 Sports', 'Takara Tomy', 'Media Rings', 'Elf', 'Kalypso Media',
'Starfish', 'Zushi Games', 'Jorudan', 'Destination Software, Inc',
'New', 'Brash Entertainment', 'ITT Family Games', 'PopCap Games',
'Home Entertainment Suppliers', 'Ackkstudios', 'Starpath Corp.',
'P2 Games', 'BPS', 'Gathering of Developers', 'NewKidCo',
'Storm City Games', 'CokeM Interactive', 'CBS Electronics',
'Magix', 'Marvelous Interactive', 'Nihon Falcom Corporation',
'Wargaming.net', 'Angel Studios', 'Arc System Works', 'Playmates',
'SNK Playmore', 'Hamster Corporation', 'From Software',
'Nippon Columbia', 'Nichibutsu', 'Little Orbit',
'Conspiracy Entertainment', 'DTP Entertainment', 'Hect',
'Mumbo Jumbo', 'Pacific Century Cyber Works', 'Indie Games',
'Liquid Games', 'NEC', 'Axela', 'ArtDink', 'Sunsoft', 'Gust',
'SNK', 'NEC Interchannel', 'FuRyu', 'Xing Entertainment',
'ValuSoft', 'Victor Interactive', 'Detn8 Games',
'American Softworks', 'Nordcurrent', 'Bomb', 'Falcom Corporation',
'AQ Interactive', 'CCP', 'Milestone S.r.l.', 'Sears',
'JoWood Productions', 'Seta Corporation', 'On Demand', 'NCS',
'Aspyr', 'Gremlin Interactive Ltd', 'Agatsuma Entertainment',
'Compile Heart', 'Culture Brain', 'Mad Catz', 'Shogakukan',
'Merscom LLC', 'Rebellion Developments', 'Nippon Telenet',
'TDK Core', 'bitComposer Games', 'Foreign Media Games', 'Astragon',
'SSI', 'Kadokawa Games', 'Idea Factory',
'Performance Designed Products', 'Asylum Entertainment',
'Core Design Ltd.', 'PlayV', 'UFO Interactive',
'Idea Factory International', 'Playlogic Game Factory',
'Essential Games', 'Adeline Software', 'Funcom',
'Panther Software', 'Blast! Entertainment Ltd', 'Game Life',
'DSI Games', 'Avalon Interactive', 'Popcorn Arcade',
'Neko Entertainment', 'Vir2L Studios', 'Aques', 'Syscom',
'White Park Bay Software', 'System 3', 'Vatical Entertainment',
'Daedalic', 'EA Games', 'Media Factory', 'Vic Tokai',
'The Adventure Company', 'Game Arts', 'Broccoli', 'Acquire',
'General Entertainment', 'Excalibur Publishing', 'Imadio',
'Swing! Entertainment', 'Sony Music Entertainment', 'Aqua Plus',
'Paradox Interactive', 'Hip Interactive',
'DreamCatcher Interactive', 'Tripwire Interactive', 'Sting',
'Yacht Club Games', 'SCS Software', 'Bigben Interactive',
'Havas Interactive', 'Slitherine Software', 'Graffiti', 'Funsta',
'Telstar', 'U.S. Gold', 'DreamWorks Interactive',
'Data Design Interactive', 'MTO', 'DHM Interactive', 'FunSoft',
'SPS', 'Bohemia Interactive', 'Reef Entertainment',
'Tru Blu Entertainment', 'Moss', 'T&E Soft', 'O-Games',
'Aksys Games', 'NDA Productions', 'Data East',
'Time Warner Interactive', 'Gainax Network Systems', 'Daito',
'O3 Entertainment', 'Gameloft', 'Xicat Interactive',
'Simon & Schuster Interactive', 'Valcon Games', 'PopTop Software',
'TOHO', 'HMH Interactive', '5pb', 'Cave',
'CDV Software Entertainment', 'Microids', 'PM Studios', 'Paon',
'Micro Cabin', 'GameTek', 'Benesse', 'Type-Moon',
'Enjoy Gaming ltd.', 'Asmik Corp', 'Interplay Productions',
'Asmik Ace Entertainment', 'inXile Entertainment', 'Image Epoch',
'Phantom EFX', 'Evolved Games', 'responDESIGN',
'Culture Publishers', 'Griffin International', 'Hackberry',
'Hearty Robin', 'Nippon Amuse', 'Origin Systems', 'Seventh Chord',
'Mitsui', 'Milestone', 'Abylight', 'Flight-Plan', 'Glams', 'Locus',
'Warp', 'Daedalic Entertainment', 'Alternative Software',
'Myelin Media', 'Mercury Games', 'Irem Software Engineering',
'Sunrise Interactive', 'Elite', 'Evolution Games', 'Tivola',
'Global A Entertainment', 'Edia', 'Athena', 'Aria', 'Gamecock',
'Tommo', 'Altron', 'Happinet', 'iWin', 'Media Works', 'Fortyfive',
'Revolution Software', 'Imax', 'Crimson Cow', '10TACLE Studios',
'Groove Games', 'Pack-In-Video', 'Insomniac Games',
'Ascaron Entertainment GmbH', 'Asgard', 'Ecole', 'Yumedia',
'Phenomedia', 'HAL Laboratory', 'Grand Prix Games', 'DigiCube',
'Creative Core', 'Kaga Create', 'WayForward Technologies',
'LSP Games', 'ASCII Media Works', 'Coconuts Japan', 'Arika',
'Ertain', 'Marvel Entertainment', 'Prototype',
'TopWare Interactive', 'Phantagram', '1C Company',
'The Learning Company', 'TechnoSoft', 'Vap', 'Misawa', 'Tradewest',
'Team17 Software', 'Yeti', 'Pow', 'Navarre Corp', 'MediaQuest',
'Max Five', 'Comfort', 'Monte Christo Multimedia', 'Pony Canyon',
'Riverhillsoft', 'Summitsoft', 'Milestone S.r.l', 'Playmore',
'MLB.com', 'Kool Kizz', 'Flashpoint Games', '49Games',
'Legacy Interactive', 'Alawar Entertainment', 'CyberFront',
'Cloud Imperium Games Corporation', 'Societa',
'Virtual Play Games', 'Interchannel', 'Sonnet', 'Experience Inc.',
'Zenrin', 'Iceberg Interactive', 'Ivolgamus', '2D Boy',
'MC2 Entertainment', 'Kando Games', 'Just Flight', 'Office Create',
'Mamba Games', 'Fields', 'Princess Soft', 'Maximum Family Games',
'Berkeley', 'Fuji', 'Dusenberry Martin Racing', 'imageepoch Inc.',
'Big Fish Games', 'Her Interactive', 'Kamui', 'ASK',
'Headup Games', 'KSS', 'Cygames', 'KID', 'Quinrose', 'Sunflowers',
'dramatic create', 'TGL', 'Encore', 'Extreme Entertainment Group',
'Intergrow', 'G.Rev', 'Sweets', 'Kokopeli Digital Studios',
'Number None', 'Nexon', 'id Software', 'BushiRoad', 'Tryfirst',
'Strategy First', '7G//AMES', 'GN Software', "Yuke's",
'Easy Interactive', 'Licensed 4U', 'FuRyu Corporation',
'Lexicon Entertainment', 'Paon Corporation', 'Kids Station', 'GOA',
'Graphsim Entertainment', 'King Records', 'Introversion Software',
'Minato Station', 'Devolver Digital', 'Blue Byte', 'Gaga',
'Yamasa Entertainment', 'Plenty', 'Views', 'fonfun', 'NetRevo',
'Codemasters Online', 'Quintet', 'Phoenix Games', 'Dorart',
'Marvelous Games', 'Focus Multimedia', 'Imageworks',
'Karin Entertainment', 'Aerosoft', 'Technos Japan Corporation',
'Gakken', 'Mirai Shounen', 'Datam Polystar', 'Saurus', 'HuneX',
'Revolution (Japan)', 'Giza10', 'Visco', 'Alvion', 'Mycom', 'Giga',
'Warashi', 'System Soft', 'Sold Out', 'Lighthouse Interactive',
'Masque Publishing', 'RED Entertainment', 'Michaelsoft',
'Media Entertainment', 'New World Computing', 'Genterprise',
'Interworks Unlimited, Inc.', 'Boost On', 'Stainless Games',
'EON Digital Entertainment', 'Epic Games', 'Naxat Soft',
'Ascaron Entertainment', 'Piacci', 'Nitroplus',
'Paradox Development', 'Otomate', 'Ongakukan', 'Commseed',
'Inti Creates', 'Takuyo', 'Interchannel-Holon', 'Rain Games',
'UIG Entertainment'], dtype=object)
3. Data Analysis
3.1 Sales Ranking Within Each Genre (Top 5)
# Top five by total sales (Sports)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Sports',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Wii Sports | 82.74 |
Wii Sports Resort | 33.00 |
Wii Fit | 22.72 |
Wii Fit Plus | 22.00 |
FIFA 15 | 19.02 |
In the sports genre, Wii Sports leads global sales by a wide margin (82.74 against 33.00 for the runner-up) and is clearly popular with players. Its sequel, Wii Sports Resort, also sold well.
# Top five by total sales (Platform)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Platform',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Super Mario Bros. | 45.31 |
New Super Mario Bros. | 30.01 |
New Super Mario Bros. Wii | 28.62 |
Super Mario World | 26.07 |
Super Mario Bros. 3 | 22.48 |
The Super Mario series takes all five top spots among Platform games.
# Top five by total sales (Racing)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Racing',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Mario Kart Wii | 35.82 |
Mario Kart DS | 23.42 |
Gran Turismo 3: A-Spec | 14.98 |
Need for Speed: Most Wanted | 14.08 |
Mario Kart 7 | 12.21 |
The Mario Kart series takes first, second, and fifth place.
# Top five by total sales (Role-Playing)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Role-Playing',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Pokemon Red/Pokemon Blue | 31.37 |
Pokemon Gold/Pokemon Silver | 23.10 |
The Elder Scrolls V: Skyrim | 19.28 |
Pokemon Diamond/Pokemon Pearl | 18.36 |
Pokemon Ruby/Pokemon Sapphire | 15.85 |
Pokemon titles take four of the top five spots.
# Top five by total sales (Puzzle)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Puzzle',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Tetris | 35.84 |
Brain Age 2: More Training in Minutes a Day | 15.30 |
Dr. Mario | 10.19 |
Pac-Man | 9.03 |
Professor Layton and the Curious Village | 5.26 |
# Top five by total sales (Misc)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Misc',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Wii Play | 29.02 |
Minecraft | 23.73 |
Kinect Adventures! | 21.82 |
Brain Age: Train Your Brain in Minutes a Day | 20.22 |
Guitar Hero III: Legends of Rock | 16.40 |
# Top five by total sales (Shooter)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Shooter',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Call of Duty: Black Ops | 31.03 |
Call of Duty: Modern Warfare 3 | 30.83 |
Call of Duty: Black Ops II | 29.72 |
Duck Hunt | 28.31 |
Call of Duty: Ghosts | 27.38 |
# Top five by total sales (Simulation)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Simulation',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Nintendogs | 24.76 |
The Sims 3 | 15.45 |
Animal Crossing: Wild World | 12.27 |
Animal Crossing: New Leaf | 9.09 |
Cooking Mama | 5.72 |
# Top five by total sales (Action)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Action',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Grand Theft Auto V | 55.92 |
Grand Theft Auto: San Andreas | 23.86 |
Grand Theft Auto IV | 22.47 |
Grand Theft Auto: Vice City | 16.19 |
FIFA Soccer 13 | 16.16 |
# Top five by total sales (Fighting)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Fighting',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Super Smash Bros. Brawl | 13.04 |
Super Smash Bros. for Wii U and 3DS | 12.47 |
Mortal Kombat | 8.40 |
WWE SmackDown vs Raw 2008 | 7.41 |
Street Fighter IV | 7.27 |
# Top five by total sales (Adventure)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Adventure',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Assassin's Creed | 11.30 |
Super Mario Land 2: 6 Golden Coins | 11.18 |
L.A. Noire | 5.95 |
Zelda II: The Adventure of Link | 4.38 |
Rugrats: Search For Reptar | 3.34 |
# Top five by total sales (Strategy)
data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc['Strategy',:].sort_values(by='Global_Sales',ascending=False).head()
Name | Global_Sales |
Pokemon Stadium | 5.45 |
Warzone 2100 | 5.01 |
StarCraft II: Wings of Liberty | 4.83 |
Warcraft II: Tides of Darkness | 4.21 |
Pokémon Trading Card Game | 3.70 |
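These rankings make it easy to pick out the standout titles in each genre and to see where player preferences lie, which can inform later development and improvement. We also notice that almost every genre is led by titles from a single series (Super Mario, Mario Kart, Pokemon, Call of Duty, Grand Theft Auto). This may suggest that a publisher developing a new game should plan for sequels from the very start.
3.2 Top Five Games by Sales in Each Region
# Top five by sales in Europe
data.pivot_table(index=['Genre','Name'],values='EU_Sales',aggfunc='sum').sort_values(by='EU_Sales',ascending=False).head()
Genre | Name | EU_Sales |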
Sports | Wii Sports | 29.02 |
Action | Grand Theft Auto V | 23.04 |
Racing | Mario Kart Wii | 12.88 |
Sports | FIFA 15 | 12.40 |
Shooter | Call of Duty: Modern Warfare 3 | 11.29 |
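Sports titles take two of the top five spots in Europe, including first place. We speculate that this reflects a strong sports culture there.
# Top five by sales in North America
data.pivot_table(index=['Genre','Name'],values = ['NA_Sales'],aggfunc='sum').sort_values(by='NA_Sales',ascending=False).head(5)
Genre | Name | NA_Sales |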
Sports | Wii Sports | 41.49 |
Platform | Super Mario Bros. | 32.48 |
Shooter | Duck Hunt | 26.93 |
Puzzle | Tetris | 26.17 |
Action | Grand Theft Auto V | 23.46 |
In North America, the top seller is also a sports game (Wii Sports).
# Top five by sales in Japan
data.pivot_table(index=['Genre','Name'],values='JP_Sales',aggfunc='sum').sort_values(by='JP_Sales',ascending=False).head()
Genre | Name | JP_Sales |
Role-Playing | Pokemon Red/Pokemon Blue | 10.22 |
Role-Playing | Pokemon Gold/Pokemon Silver | 7.20 |
Platform | Super Mario Bros. | 6.96 |
Platform | New Super Mario Bros. | 6.50 |
Role-Playing | Pokemon Diamond/Pokemon Pearl | 6.04 |
In Japan, by contrast, role-playing games appear to be more popular (three of the top five).
# Top five by sales in other regions
data.pivot_table(index=['Genre','Name'],values='Other_Sales',aggfunc='sum').sort_values(by='Other_Sales',ascending=False).head()
Genre | Name | Other_Sales |
Action | Grand Theft Auto: San Andreas | 10.72 |
Sports | Wii Sports | 8.46 |
Action | Grand Theft Auto V | 8.03 |
Racing | Gran Turismo 4 | 7.53 |
Shooter | Call of Duty: Black Ops II | 3.88 |
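In the rest of the world, action games are more popular with players (two of the top five, including first place).
Ranking game sales by region reveals regional player preferences, which can inform market placement and region-specific game development.
# Print the top five titles of each genre (by total global sales)
for genre in data.Genre.unique():
    print(genre)
    print(data.pivot_table(index=['Genre','Name'],values = 'Global_Sales',aggfunc='sum').loc[genre,:].sort_values(by='Global_Sales',ascending=False).head())
    print('*'*40)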
Sports
Global_Sales
Name
Wii Sports 82.74
Wii Sports Resort 33.00
Wii Fit 22.72
Wii Fit Plus 22.00
FIFA 15 19.02
****************************************
Platform
Global_Sales
Name
Super Mario Bros. 45.31
New Super Mario Bros. 30.01
New Super Mario Bros. Wii 28.62
Super Mario World 26.07
Super Mario Bros. 3 22.48
****************************************
Racing
Global_Sales
Name
Mario Kart Wii 35.82
Mario Kart DS 23.42
Gran Turismo 3: A-Spec 14.98
Need for Speed: Most Wanted 14.08
Mario Kart 7 12.21
****************************************
Role-Playing
Global_Sales
Name
Pokemon Red/Pokemon Blue 31.37
Pokemon Gold/Pokemon Silver 23.10
The Elder Scrolls V: Skyrim 19.28
Pokemon Diamond/Pokemon Pearl 18.36
Pokemon Ruby/Pokemon Sapphire 15.85
****************************************
Puzzle
Global_Sales
Name
Tetris 35.84
Brain Age 2: More Training in Minutes a Day 15.30
Dr. Mario 10.19
Pac-Man 9.03
Professor Layton and the Curious Village 5.26
****************************************
Misc
Global_Sales
Name
Wii Play 29.02
Minecraft 23.73
Kinect Adventures! 21.82
Brain Age: Train Your Brain in Minutes a Day 20.22
Guitar Hero III: Legends of Rock 16.40
****************************************
Shooter
Global_Sales
Name
Call of Duty: Black Ops 31.03
Call of Duty: Modern Warfare 3 30.83
Call of Duty: Black Ops II 29.72
Duck Hunt 28.31
Call of Duty: Ghosts 27.38
****************************************
Simulation
Global_Sales
Name
Nintendogs 24.76
The Sims 3 15.45
Animal Crossing: Wild World 12.27
Animal Crossing: New Leaf 9.09
Cooking Mama 5.72
****************************************
Action
Global_Sales
Name
Grand Theft Auto V 55.92
Grand Theft Auto: San Andreas 23.86
Grand Theft Auto IV 22.47
Grand Theft Auto: Vice City 16.19
FIFA Soccer 13 16.16
****************************************
Fighting
Global_Sales
Name
Super Smash Bros. Brawl 13.04
Super Smash Bros. for Wii U and 3DS 12.47
Mortal Kombat 8.40
WWE SmackDown vs Raw 2008 7.41
Street Fighter IV 7.27
****************************************
Adventure
Global_Sales
Name
Assassin's Creed 11.30
Super Mario Land 2: 6 Golden Coins 11.18
L.A. Noire 5.95
Zelda II: The Adventure of Link 4.38
Rugrats: Search For Reptar 3.34
****************************************
Strategy
Global_Sales
Name
Pokemon Stadium 5.45
Warzone 2100 5.01
StarCraft II: Wings of Liberty 4.83
Warcraft II: Tides of Darkness 4.21
Pokémon Trading Card Game 3.70
****************************************
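Comparing the lists, the top sports titles sell the most (Wii Sports alone reaches 82.74), while the top strategy titles sell the least (none exceeds 5.45).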
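The per-genre ranking can also be produced in a single pass instead of rebuilding the pivot table for every genre. The sketch below is one possible approach, shown on a few rows copied from the rankings above; the helper name top_n_per_genre is hypothetical.

```python
import pandas as pd

def top_n_per_genre(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Sum Global_Sales per (Genre, Name) and keep the n best titles per genre."""
    totals = (df.groupby(['Genre', 'Name'], as_index=False)['Global_Sales']
                .sum()
                .sort_values(['Genre', 'Global_Sales'], ascending=[True, False]))
    # After sorting, head(n) within each genre keeps its n best-selling titles.
    return totals.groupby('Genre').head(n).reset_index(drop=True)

# A few rows taken from the printed rankings above.
sample = pd.DataFrame({
    'Genre': ['Sports', 'Sports', 'Sports', 'Racing', 'Racing'],
    'Name': ['Wii Sports', 'Wii Sports Resort', 'Wii Fit',
             'Mario Kart Wii', 'Mario Kart DS'],
    'Global_Sales': [82.74, 33.00, 22.72, 35.82, 23.42],
})

top2 = top_n_per_genre(sample, n=2)
print(top2)
```

Called as `top_n_per_genre(data)`, this would return all genres in one DataFrame, which is easier to filter or export than printed blocks.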
3.2 Top Five Publishers in Each Genre
for genre in data.Genre.unique():
    print(genre)
    print(data.pivot_table(index=['Genre','Publisher'],values='Global_Sales',aggfunc='sum').loc[genre,:].sort_values('Global_Sales',ascending=False).head())
    print('*'*40)
Sports
Global_Sales
Publisher
Electronic Arts 479.67
Nintendo 218.01
Konami Digital Entertainment 98.95
Take-Two Interactive 76.77
Activision 75.91
****************************************
Platform
Global_Sales
Publisher
Nintendo 427.21
Sony Computer Entertainment 104.06
Sega 60.84
THQ 41.02
Activision 33.40
****************************************
Racing
Global_Sales
Publisher
Nintendo 151.30
Electronic Arts 145.77
Sony Computer Entertainment 110.57
THQ 40.17
Codemasters 34.62
****************************************
Role-Playing
Global_Sales
Publisher
Nintendo 284.90
Square Enix 97.09
Bethesda Softworks 54.16
Namco Bandai Games 53.82
SquareSoft 52.59
****************************************
Puzzle
Global_Sales
Publisher
Nintendo 124.88
Atari 21.59
THQ 9.25
Warner Bros. Interactive Entertainment 6.65
Hudson Soft 6.61
****************************************
Misc
Global_Sales
Publisher
Nintendo 180.67
Ubisoft 97.53
Sony Computer Entertainment 80.80
Activision 76.55
Microsoft Game Studios 46.99
****************************************
Shooter
Global_Sales
Publisher
Activision 299.87
Electronic Arts 158.26
Microsoft Game Studios 95.46
Nintendo 69.73
Ubisoft 67.65
****************************************
Simulation
Global_Sales
Publisher
Electronic Arts 89.53
Nintendo 85.27
Ubisoft 44.67
Konami Digital Entertainment 32.31
505 Games 22.24
****************************************
Action
Global_Sales
Publisher
Take-Two Interactive 211.08
Ubisoft 142.94
Activision 142.33
Nintendo 128.18
Warner Bros. Interactive Entertainment 118.24
****************************************
Fighting
Global_Sales
Publisher
THQ 72.86
Namco Bandai Games 61.22
Nintendo 53.35
Capcom 33.01
Electronic Arts 31.39
****************************************
Adventure
Global_Sales
Publisher
Nintendo 35.71
Ubisoft 22.19
THQ 19.98
Disney Interactive Studios 17.76
Sony Computer Entertainment 13.55
****************************************
Strategy
Global_Sales
Publisher
Nintendo 27.35
Activision 17.70
Electronic Arts 14.08
Namco Bandai Games 11.83
Konami Digital Entertainment 10.99
****************************************
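As a publishing giant, Nintendo appears in the top five of all 12 genres and ranks first in seven of them, with many high-selling, well-received titles. Other publishers, such as Electronic Arts, Sony Computer Entertainment, and Activision, rank near the top in only a few genres, so there is a clear gap between them and Nintendo.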
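To put a number on Nintendo's lead, the sketch below takes the top two publishers per genre from the printed tables above and counts how many genres each publisher leads. Only those 24 figures are used, and the variable names are illustrative.

```python
import pandas as pd

# (genre, publisher, Global_Sales): top two publishers per genre, copied from the output above.
rows = [
    ('Sports', 'Electronic Arts', 479.67), ('Sports', 'Nintendo', 218.01),
    ('Platform', 'Nintendo', 427.21), ('Platform', 'Sony Computer Entertainment', 104.06),
    ('Racing', 'Nintendo', 151.30), ('Racing', 'Electronic Arts', 145.77),
    ('Role-Playing', 'Nintendo', 284.90), ('Role-Playing', 'Square Enix', 97.09),
    ('Puzzle', 'Nintendo', 124.88), ('Puzzle', 'Atari', 21.59),
    ('Misc', 'Nintendo', 180.67), ('Misc', 'Ubisoft', 97.53),
    ('Shooter', 'Activision', 299.87), ('Shooter', 'Electronic Arts', 158.26),
    ('Simulation', 'Electronic Arts', 89.53), ('Simulation', 'Nintendo', 85.27),
    ('Action', 'Take-Two Interactive', 211.08), ('Action', 'Ubisoft', 142.94),
    ('Fighting', 'THQ', 72.86), ('Fighting', 'Namco Bandai Games', 61.22),
    ('Adventure', 'Nintendo', 35.71), ('Adventure', 'Ubisoft', 22.19),
    ('Strategy', 'Nintendo', 27.35), ('Strategy', 'Activision', 17.70),
]
top_two = pd.DataFrame(rows, columns=['Genre', 'Publisher', 'Global_Sales'])

# Row label of the best-selling publisher within each genre.
leader_rows = top_two.groupby('Genre')['Global_Sales'].idxmax()
leaders = top_two.loc[leader_rows, ['Genre', 'Publisher']]

# Number of genres led by each publisher.
genres_led = leaders['Publisher'].value_counts()
print(genres_led)
```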
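3.3 Sales Trends by Region
data.pivot_table(index ='Year',values=['NA_Sales','EU_Sales','JP_Sales','Other_Sales'],aggfunc='sum').plot(figsize=(10,6))
plt.grid()
plt.ylabel('/million')
[Figure output_67_1.png: yearly sales by region, line chart]
In the early years sales were low, which we attribute to the limited reach of networks at the time.
As computers and the internet became widespread, the industry grew quickly: rapid growth began around 1995, and games released between 2005 and 2010 have the highest sales.
From 2008 onward, however, sales fall off a cliff. We suspect that the financial crisis of that year played a part.
We also see that North America is the largest market, with the highest sales, followed by Europe.
Sales in Japan are roughly on par with those in the other regions.
We therefore suggest that publishers, when weighing which genres players prefer, focus on players in North America and Europe when setting their sales strategy.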
3.4 Most Popular Genres by Region
data.pivot_table(index='Genre',values = ['NA_Sales','EU_Sales','JP_Sales','Other_Sales'],aggfunc='sum').plot.bar(figsize=(20,6))
fig,axes = plt.subplots(nrows=2,ncols=2,figsize=(20,6),sharex=True,sharey=True)
fig.tight_layout()
data.pivot_table(index='Genre',values=['NA_Sales'],aggfunc='sum').plot.bar(ax=axes[0][0])
data.pivot_table(index='Genre',values=['EU_Sales'],aggfunc='sum').plot.bar(ax=axes[0][1])
data.pivot_table(index='Genre',values=['JP_Sales'],aggfunc='sum').plot.bar(ax=axes[1][0])
data.pivot_table(index='Genre',values=['Other_Sales'],aggfunc='sum').plot.bar(ax=axes[1][1])
[Figure output_71_1.png: grouped bar chart of regional sales by genre]
[Figure output_71_2.png: 2x2 bar charts of sales by genre for NA, EU, JP, and Other]
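- By global total sales, action games rank first, followed by sports and shooter games;
- In North America, action games are the most popular, followed by sports and shooter games;
- In Europe, action games are the most popular, followed by sports and shooter games, though the gaps between genres are smaller than in North America;
- In Japan, role-playing games are the most popular, followed by action, sports, and platform games;
- In the rest of the world, action games are the most popular, followed by sports and shooter games.
- By genre, action games sell best overall;
- By region, every genre sells most in North America except role-playing games, which sell most in Japan.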
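Finding the most popular genre per region reduces to a column-wise idxmax over the genre totals. A minimal sketch on the five rows printed by data.head() follows; five rows are far too few to draw conclusions from, so it only illustrates the mechanics.

```python
import pandas as pd

# Genre and regional sales of the five rows shown by data.head().
sample = pd.DataFrame({
    'Genre': ['Sports', 'Platform', 'Racing', 'Sports', 'Role-Playing'],
    'NA_Sales': [41.49, 29.08, 15.85, 15.75, 11.27],
    'EU_Sales': [29.02, 3.58, 12.88, 11.01, 8.89],
    'JP_Sales': [3.77, 6.81, 3.79, 3.28, 10.22],
    'Other_Sales': [8.46, 0.77, 3.31, 2.96, 1.00],
})

regions = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']

# Total sales per genre in each region.
genre_totals = sample.groupby('Genre')[regions].sum()

# For each region column, the genre with the largest total.
top_genre = genre_totals.idxmax()
print(top_genre)
```

Even on this tiny sample, Japan stands apart from the other regions, in line with the observation above.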
3.5 Most Popular Publishers by Region
areas = ['Global_Sales','NA_Sales','EU_Sales','JP_Sales','Other_Sales']
for area in areas:
    print(area)
    print(data.pivot_table(index='Publisher',values=[area],aggfunc='sum').sort_values(area,ascending=False).head())
    print('*'*40)
Global_Sales
Global_Sales
Publisher
Nintendo 1786.56
Electronic Arts 1110.32
Activision 727.46
Sony Computer Entertainment 607.50
Ubisoft 474.72
****************************************
NA_Sales
NA_Sales
Publisher
Nintendo 816.87
Electronic Arts 595.07
Activision 429.70
Sony Computer Entertainment 265.22
Ubisoft 253.43
****************************************
EU_Sales
EU_Sales
Publisher
Nintendo 418.74
Electronic Arts 371.27
Activision 215.53
Sony Computer Entertainment 187.72
Ubisoft 163.32
****************************************
JP_Sales
JP_Sales
Publisher
Nintendo 455.42
Namco Bandai Games 127.07
Konami Digital Entertainment 91.30
Sony Computer Entertainment 74.10
Capcom 68.08
****************************************
Other_Sales
Other_Sales
Publisher
Electronic Arts 129.77
Nintendo 95.33
Sony Computer Entertainment 80.45
Activision 75.34
Take-Two Interactive 55.24
****************************************
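- Nintendo holds the largest share in all three major regions: North America, Europe, and Japan.
- EA follows, but its global total trails Nintendo's by about 676 (1786.56 against 1110.32; the table is in millions, which this report reads as millions of US dollars).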
3.6 Most Popular Platforms by Region
data.pivot_table(index='Platform',values='Global_Sales',aggfunc='sum').sort_values('Global_Sales',ascending=False).plot.bar(figsize=(20,6))
# Cumulative share of global sales by platform
data.pivot_table(index='Platform',values='Global_Sales',aggfunc='sum').sort_values('Global_Sales',ascending=False).apply(lambda x:x.cumsum()/x.sum()).plot.bar(figsize=(20,6))
plt.hlines(y=0.8,xmin=-1,xmax=31,color='r')
area = ['NA_Sales','EU_Sales','JP_Sales','Other_Sales']
fig,ax = plt.subplots(nrows=2,ncols=2,figsize=(20,6),sharex=True,sharey=True)
fig.tight_layout()
for i in range(4):
    data.pivot_table(index='Platform',values=area[i],aggfunc='sum').sort_values(area[i],ascending=False).plot.bar(ax=ax.ravel()[i])
[Figure output_77_0.png: global sales by platform, bar chart]
[Figure output_77_1.png: cumulative share of global sales by platform, with a line at 0.8]
[Figure output_77_2.png: 2x2 bar charts of sales by platform for NA, EU, JP, and Other]
Platform sales rankings are similar across regions;
the top 6 platforms account for 60% of global sales;
the top 12 platforms account for 80% of global sales.
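The 60% and 80% figures are read off the cumulative-share chart. The same kind of number can be computed directly; the sketch below shows one way, using made-up platform totals (the platform names P1 to P4, their values, and the helper platforms_needed are all hypothetical).

```python
import pandas as pd

def platforms_needed(totals: pd.Series, threshold: float) -> int:
    """Smallest number of top platforms whose combined share reaches the threshold."""
    ordered = totals.sort_values(ascending=False)
    cumulative_share = ordered.cumsum() / ordered.sum()
    # Count the platforms still below the threshold, then add the one that crosses it.
    return int((cumulative_share < threshold).sum()) + 1

# Hypothetical totals, chosen so the cumulative shares are 0.4, 0.7, 0.9, 1.0.
toy_totals = pd.Series({'P1': 40.0, 'P2': 30.0, 'P3': 20.0, 'P4': 10.0})

needed_for_60 = platforms_needed(toy_totals, 0.6)
needed_for_80 = platforms_needed(toy_totals, 0.8)
print(needed_for_60, needed_for_80)
```

On the real data the input would be the per-platform sum of Global_Sales, for example `data.groupby('Platform')['Global_Sales'].sum()`.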
3.7 Publisher Revenue by Region (Sony Computer Entertainment as an Example)
data.pivot_table(index=['Publisher'],values = ['Global_Sales','NA_Sales','EU_Sales','JP_Sales','Other_Sales'],aggfunc='sum').loc['Sony Computer Entertainment',:].sort_values().plot.bar(figsize=(10,6))
[Figure output_80_1.png: Sony Computer Entertainment sales by region, bar chart]
Its sales in North America and Europe (265.22 and 187.72) far exceed those in Japan and the other regions (74.10 and 80.45).
3.8 Publisher Revenue by Genre (Sony Computer Entertainment as an Example)
data.pivot_table(index=['Publisher','Genre'],values='Global_Sales',aggfunc='sum').loc['Sony Computer Entertainment',:].sort_values('Global_Sales').plot.bar(figsize=(10,6))
[Figure output_83_1.png: Sony Computer Entertainment sales by genre, bar chart]
This publisher's Racing titles are comparatively popular with players.
3.9 Publisher Revenue by Platform (Sony Computer Entertainment as an Example)
data.pivot_table(index=['Publisher','Platform'],values='Global_Sales',aggfunc='sum').loc['Sony Computer Entertainment',:].sort_values('Global_Sales').plot.bar(figsize=(10,6))
[Figure output_86_1.png: Sony Computer Entertainment sales by platform, bar chart]
This publisher's revenue comes mainly from PlayStation platforms such as PS2 and PS3 (the source text also names "PS5", but no such platform appears in the dataset's platform list).
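Sections 3.7 to 3.9 repeat the same pattern for one publisher. A small helper would make it reusable for any publisher and any grouping column. In the sketch below, publisher_breakdown is a hypothetical name and the sample rows are the five printed by data.head().

```python
import pandas as pd

def publisher_breakdown(df, publisher, by, value='Global_Sales'):
    """Total `value` for one publisher, grouped by column `by`, smallest first."""
    subset = df[df['Publisher'] == publisher]
    return subset.groupby(by)[value].sum().sort_values()

# The five rows shown by data.head(), all published by Nintendo.
sample = pd.DataFrame({
    'Publisher': ['Nintendo'] * 5,
    'Platform': ['Wii', 'NES', 'Wii', 'Wii', 'GB'],
    'Genre': ['Sports', 'Platform', 'Racing', 'Sports', 'Role-Playing'],
    'Global_Sales': [82.74, 40.24, 35.82, 33.00, 31.37],
})

by_platform = publisher_breakdown(sample, 'Nintendo', 'Platform')
by_genre = publisher_breakdown(sample, 'Nintendo', 'Genre')
print(by_platform)
print(by_genre)
```

With the full dataset, `publisher_breakdown(data, 'Sony Computer Entertainment', 'Genre')` would yield the series plotted in section 3.8, and the result can be passed straight to `.plot.bar()`.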
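4. Prediction
Suppose we are Nintendo and want to release an action game titled "我大意了没有闪" ("I Was Careless and Didn't Dodge") in 2021. How can we predict its sales?
Here we use simple (one-variable) linear regression.
data1 = data.loc[data.Publisher == 'Nintendo',['Year','Global_Sales','Genre']]
data2 = data1.loc[data1.Genre == 'Action',['Year','Global_Sales']].copy()  # these two steps extract Nintendo's action-game sales; .copy() makes data2 safe to modify later
sns.distplot(data2['Global_Sales']);  # check whether the distribution looks normal
[Figure output_91_0.png: distribution of Global_Sales for Nintendo action games]
# Check the skewness
print("Skewness: %f"% data2['Global_Sales'].skew())  # a normal distribution has skewness 0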
Skewness: 2.289748
# Q-Q plot
fig = plt.figure()
res = stats.probplot(data2['Global_Sales'],plot = plt)
plt.show()
# The closer the points follow the straight line, the closer the data is to a normal distribution
[Figure output_93_0.png: Q-Q plot of Global_Sales]
Because the skewness is large (2.29), we transform the data to bring it closer to a normal distribution.
# Log transform
data2['Global_Sales'] = np.log1p(data2['Global_Sales'])
# Check the new distribution
sns.distplot(data2['Global_Sales'],fit = norm);
[Figure output_95_0.png: distribution of log-transformed Global_Sales with a fitted normal curve]
# Check the skewness again
print("Skewness: %f"% data2['Global_Sales'].skew())
Skewness: 0.813188
The skewness has dropped from 2.29 to 0.81.
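To see the effect of the transform in isolation, the sketch below applies np.log1p to ten Global_Sales values taken from the head and tail of the table printed earlier, compares the skewness before and after, and checks that np.expm1 maps the values back. Ten values are only an illustration; the skewness figures in the text come from the full Nintendo action subset.

```python
import numpy as np
import pandas as pd

# Five largest and five smallest Global_Sales values visible in the printed table.
sales = pd.Series([82.74, 40.24, 35.82, 33.00, 31.37,
                   0.01, 0.01, 0.01, 0.01, 0.01])

skew_before = sales.skew()

# log1p computes log(1 + x), which stays defined when x is 0.
logged = np.log1p(sales)
skew_after = logged.skew()

# expm1 inverts log1p, so model output can be mapped back to the sales scale.
restored = np.expm1(logged)

print('skewness before:', skew_before)
print('skewness after: ', skew_after)
```

The round trip through expm1 matters later: anything predicted on the transformed scale has to be converted back before it is read as a sales figure.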
sns.lmplot(x ='Year',y = 'Global_Sales',data = data2)  # fit and plot a simple linear regression; the line follows the overall trend reasonably well
[Figure output_98_1.png: scatter of log-transformed Global_Sales against Year with the fitted regression line]
fit = sm.formula.ols('Global_Sales~Year',data = data2).fit()  # estimate the intercept and slope of the simple linear regression
fit.params
Intercept 62.452880
Year -0.030753
dtype: float64
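For readers who want to see where the two parameters come from, here is a minimal least-squares sketch. It swaps in numpy.polyfit for statsmodels and runs on made-up points that lie exactly on a line, so the recovered coefficients are known in advance; none of the numbers relate to the game data.

```python
import numpy as np

# Made-up points on the line y = 10 - 0.5 * x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 10.0 - 0.5 * x

# polyfit with deg=1 returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(x, y, deg=1)

def predict(value):
    """Evaluate the fitted line at a new x value."""
    return intercept + slope * value

print(slope, intercept, predict(6.0))
```

In the notebook's model, Intercept plays the role of `intercept` and the Year coefficient plays the role of `slope`.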
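With the parameters above, the fitted model is log1p(Global_Sales) = 62.452880 - 0.030753 * Year (the response was log-transformed earlier with np.log1p).
62.452880 - 0.030753 * 2021 ≈ 0.3011
Undoing the transform gives exp(0.3011) - 1 ≈ 0.35, so the action game "我大意了没有闪", released in 2021, is predicted to reach global sales of about 0.35 (in millions).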
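Because the model was fitted on log1p(Global_Sales), a prediction has to be mapped back with expm1 before it can be read as sales. The sketch below does this with the coefficients printed by fit.params above; the function name predict_sales is hypothetical, and the six-decimal coefficients limit the precision of the result.

```python
import math

INTERCEPT = 62.452880   # 'Intercept' as printed by fit.params
SLOPE = -0.030753       # 'Year' as printed by fit.params

def predict_sales(year):
    """Return (prediction on the log1p scale, prediction on the sales scale)."""
    log_prediction = INTERCEPT + SLOPE * year
    return log_prediction, math.expm1(log_prediction)

log_pred, sales_pred = predict_sales(2021)
print(round(log_pred, 4), round(sales_pred, 2))
```

The slope is negative, so predictions shrink for later years; extrapolating a straight line well beyond the years covered by the data should be treated with caution.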