Develop a time-series plot to analyze your texting traffic and habits within the Apple ecosystem using Python and SQL.
Introduction
During winter break I came across two somewhat hidden features of my Mac that led me down a rabbit hole of data querying and analysis. I was able to access a hidden database of my text messages (i.e. iMessage) by using SQLite3 in the terminal. This meant a lot to me: I could visualize and analyze my data to identify texting habits and hopefully learn something about myself.
Below are the steps I took to get to the following graph, using SQLite3 and Python with the seaborn library. This was a fun learning experience and I recommend trying it out. Actual code files can be found on GitHub:
Chat.db
This article introduces the hidden chat database within macOS. There’s one issue though — the database is inaccessible as is. You have to use something else to read and manipulate the data. But this discovery means so much to someone who is learning data analysis.
Location of chat.db in the Finder
This database holds metadata on every text that you’ve ever sent and received. It tells you the day and time (in UTC), whether you sent or received it, the recipient of the text, etc. All of this can be used to understand how you interact with people. Before you can analyze the data, though, you have to retrieve it.
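To get a feel for what the database contains before writing any queries, you can list its tables and columns. The following is a minimal sketch using Python's built-in sqlite3 module, not part of the original workflow; the helper names list_tables and list_columns are hypothetical, and the default path assumes chat.db is in its standard location. The database is opened read-only so it cannot be modified by accident.

```python
import sqlite3
from pathlib import Path

# Standard location of the Messages database on macOS (an assumption;
# adjust it if your home directory or chat.db has been moved).
DEFAULT_CHAT_DB = Path.home() / "Library" / "Messages" / "chat.db"


def _connect_read_only(db_path):
    """Open an SQLite database in read-only mode via a file: URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(db_path=DEFAULT_CHAT_DB):
    """Return the names of all tables in the database, sorted by name."""
    conn = _connect_read_only(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def list_columns(db_path, table):
    """Return the column names of one table, in declaration order."""
    conn = _connect_read_only(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM pragma_table_info(?) ORDER BY cid", (table,)
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_columns(DEFAULT_CHAT_DB, "message") should show the date and is_from_me columns used by the query later in this article. Like the terminal approach, this needs Full Disk Access for whichever application runs Python.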
SQLite3 in Terminal
The article linked above not only went through how to access the database, but also how to interact with it. The author used a combination of SQLite3 to pull the data and Python to create a Pandas dataframe.
For this project, you’ll do something similar but look at your entire history.
Setting Everything Up
The first thing to do is open up the terminal. Press CMD+SPACE to open Spotlight, search for “terminal”, and press ENTER. Once you open it you’re ready to go.
Step 1: Navigate to the desired folder
In this tutorial we’ll work from the Desktop so you don’t need to know more than two commands. Start by making the desired folder your active directory. You can do this by using the following commands:
ls # lists visible files in the current directory
cd <folder>/<folder_within_folder> # moves you to the designated directory
cd .. # this moves back one folder
cd ~ # this moves back to the default folder
For this tutorial, you can simply use the command below. It makes the Desktop your active directory, so the exported .csv file will be saved there.
cd ~/Desktop
Use the following command to list the files within your current folder.
ls
This article gives a nice introduction on terminal and how to navigate it — steps 2 and 5 give you all the information you’ll really need.
Once you’ve changed the active directory to the folder you want to work in, you can finally start accessing the database. We can access the database without having to move our active directory again. We just need to run SQLite3 on a specific database file. We do this by using the following command:
Step 2: Run SQLite
sqlite3 ~/Library/Messages/chat.db
NOTE: You might get an error like the one below that says you cannot open the database. Follow the steps below to fix this error.
If you get this error, give Terminal full disk access. You can do this by navigating to System Preferences > Security & Privacy > Privacy tab. Once in the Privacy tab, find the Full Disk Access section and enable the checkbox for Terminal. You’ll need to unlock the lock in the bottom left to be able to make changes. Once you’ve checked the Terminal box and re-locked the lock, try running the command again.
This command should be universal unless you’ve moved the home directory or the chat.db file location. If SQLite3 successfully started up, you can write the code to pull your data.
Querying Your Chats
In order to export the query results to a .csv file, which Excel can open, we first have to configure SQLite’s output. To do this, write the following:
Step 3: Set up the export
.header on
.mode csv
.output texting_history.csv
-- If you would like to choose your own file name, here is the format
--.output <file_name>.csv
Replace <file_name> with whatever you want to name your file. The above code simply says to export the data to a .csv file with headers on. The file will be saved to whichever folder you ran the sqlite3 command from (for us that was the Desktop).
Step 4: Run the query
Now copy this code into the terminal to execute the query:
select count(rowid),
strftime('%Y',datetime(date/1000000000 +
strftime('%s','2001-01-01'), 'unixepoch','localtime')) as Year,
strftime('%m',datetime(date/1000000000 +
strftime('%s','2001-01-01'), 'unixepoch','localtime')) as Month,
is_from_me
from message
group by Year, Month, is_from_me;
The code above selects four columns: a count of texts, the year, the month, and is_from_me, which indicates whether you sent (1) or received (0) the text. Grouping by year, month, and is_from_me gives the number of texts sent and the number received for every month that you’ve used iMessage.
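If you would rather stay in Python than type into the SQLite shell, the same export can be sketched with the built-in sqlite3 and csv modules. This is an illustrative alternative, not the method used in this article: the function name export_monthly_counts is hypothetical, and the query has an added ORDER BY so the rows come out in a predictable order.

```python
import csv
import sqlite3

# Same query as in the article, plus an ORDER BY for a stable row order.
MONTHLY_COUNTS_SQL = """
select count(rowid),
       strftime('%Y', datetime(date/1000000000 + strftime('%s','2001-01-01'),
                               'unixepoch', 'localtime')) as Year,
       strftime('%m', datetime(date/1000000000 + strftime('%s','2001-01-01'),
                               'unixepoch', 'localtime')) as Month,
       is_from_me
from message
group by Year, Month, is_from_me
order by Year, Month, is_from_me;
"""


def export_monthly_counts(db_path, csv_path):
    """Run the monthly count query and write the result to a CSV file.

    Returns the number of data rows written (the header is not counted).
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(MONTHLY_COUNTS_SQL)
        # Column names as SQLite reports them, e.g. "count(rowid)", "Year".
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()

    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

For example, export_monthly_counts(os.path.expanduser("~/Library/Messages/chat.db"), "texting_history.csv") (after import os) should produce a file with the same four columns as the terminal export.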
Apple saves dates and times as an integer counted from January 1st, 2001, rather than from the Unix epoch of January 1st, 1970. The query divides the stored value by 1,000,000,000 to convert it to seconds (which assumes it is stored in nanoseconds), adds the Unix timestamp of 2001-01-01, and then formats the result with the strftime() and datetime() functions. Read up on strftime() and datetime() here.
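The date arithmetic in the query can be checked in isolation. The sketch below mirrors it in Python, assuming (as the query's division by 1,000,000,000 does) that the stored value is in nanoseconds; the function name apple_ns_to_datetime is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Apple's reference date: midnight UTC on January 1st, 2001.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

# Seconds between the Unix epoch (1970) and the Apple epoch (2001);
# this is the value strftime('%s','2001-01-01') yields in the query.
APPLE_EPOCH_UNIX_SECONDS = int(APPLE_EPOCH.timestamp())


def apple_ns_to_datetime(raw_value):
    """Convert a nanosecond count since 2001-01-01 to a UTC datetime.

    Integer division drops the sub-second part, as the SQL query does.
    """
    seconds = raw_value // 1_000_000_000
    return APPLE_EPOCH + timedelta(seconds=seconds)
```

Call .astimezone() on the result to get local time, which is what the 'localtime' modifier does in the query.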
By running this code, you get a .csv file with the number of texts that you sent and received in each month of every year that you’ve used iMessage.
Your terminal and Desktop should look something like this:
Visualizing Your Data
You now want to visualize the data using the seaborn library for Python. We’ll be doing this using Jupyter Notebook. If you don’t have it, you can find installation instructions here.
You’ll need to have the seaborn, matplotlib, pandas, and numpy libraries installed. If you don’t, install them with the following command.
pip install seaborn matplotlib pandas numpy
Now download create_text_history_chart.ipynb from the GitHub repo and save it to your Desktop. Afterwards, launch JupyterLab from the terminal:
jupyter lab
Open create_text_history_chart.ipynb:
Access the full file on GitHub
With the file open, click Run > Run All Cells. This will run the entire script and save the image to your Desktop. The image will be your version of the graph below.
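The notebook itself is on GitHub and is not reproduced here. As a rough idea of what such a script might do, here is a minimal sketch, which is not the notebook's actual code: it reads the exported .csv and draws one line for sent and one for received texts. The function names are hypothetical, and the column names assume the header written by the export step above ("count(rowid)", "Year", "Month", "is_from_me").

```python
import csv
from datetime import date


def load_monthly_counts(csv_path, count_column="count(rowid)"):
    """Read the exported CSV into a list of records ready for plotting."""
    records = []
    with open(csv_path, newline="") as csv_file:
        for row in csv.DictReader(csv_file):
            if not row["Year"] or not row["Month"]:
                continue  # skip rows whose date could not be converted
            records.append({
                "month": date(int(row["Year"]), int(row["Month"]), 1),
                "direction": "Sent" if row["is_from_me"] == "1" else "Received",
                "texts": int(row[count_column]),
            })
    return records


def plot_texting_history(records, image_path):
    """Draw sent and received texts per month and save the chart as an image."""
    # Imported here so load_monthly_counts works without the plotting stack.
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    df = pd.DataFrame(records)
    df["month"] = pd.to_datetime(df["month"])

    fig, ax = plt.subplots(figsize=(12, 5))
    sns.lineplot(data=df, x="month", y="texts", hue="direction", ax=ax)
    ax.set_xlabel("Month")
    ax.set_ylabel("Number of texts")
    ax.set_title("Texts sent and received per month")
    fig.tight_layout()
    fig.savefig(image_path, dpi=150)
    return fig
```

A typical call would be plot_texting_history(load_monthly_counts("texting_history.csv"), "texting_history.png"), run from the folder that holds the .csv file.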
Summary
There are many interesting things that you can learn by reading this data. I confirmed some hypotheses about my texting habits and learned new things, e.g. I don’t text as much when I’m in a relationship. The database holds far more information than this one chart uses, all of which can help you learn about yourself. Overall this was a great learning experience which taught me about myself and some very useful skills. I recommend trying this out for yourself; with this guide, it shouldn’t take you long to go through the steps.
One area to expand upon is non-iMessage communication platforms such as Facebook Messenger, WhatsApp, WeChat, etc. These have all had a profound impact on how we communicate and will contribute to differences in people’s graphs. iMessage group chats have also skewed my results by showing that I receive many more texts than I send.
Feel free to reach out to me to discuss this. I’d love to hear how I can improve this, which methods might work better, or how else I can read this data.
Jérémie Allard
JeremieAllard.com | LinkedIn | GitHub
Translated from: https://medium.com/analytics-vidhya/analyze-your-texting-habits-using-sql-and-python-bd440de5d114