import_logs.py doesn't support Unicode characters with json format · Issue #16 · matomo-org/matomo-log-analytics · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import_logs.py doesn't support Unicode characters with json format #16

Closed
estemendoza opened this issue Apr 17, 2014 · 2 comments
Closed
Labels

Comments

@estemendoza
Copy link

I'm working with iOS apps and sometimes their names have special characters and when this apps make requests to my server, the import_log.py script fails to parse this lines. This only happens when using nginx_json format.

Here's an example:

{"ip": "1.1.1.1","host": "blabla.com","path": "/api/test.xml","status": "200","referrer": "-","user_agent": "F\xFAtbol/1.0 (iPhone; iOS 7.1; Scale/2.00)","length": 267,"generation_time_milli": 0.009,"date": "2014-04-16T14:56:24-03:00"}

Check that user_agent has a special character and that the script fails when parsing this lines.

Thanks

Migrated from matomo-org/matomo#5013

@mattab mattab added this to the Short term milestone Apr 10, 2015
@mattab mattab added the bug label Oct 30, 2015
@mattab mattab modified the milestones: Current sprint, Short term Oct 30, 2015
@diosmosis
Copy link
Member

This is a bug in nginx, \xFA is not valid JSON nor is it valid UTF-8. It is actually latin-1 for ú (so the user agent is Fútbol...). Unfortunately, treating individual JSON strings within serialized JSON as a separate encoding seems to be non-trivial, so I have yet to find a workaround I can apply to the log importer. (I tried using python-cjson which can handle hex escapes, but it has other differences w/ the normal json module.)

@diosmosis
Copy link
Member

Found a workaround: #120 will merge momentarily.

diosmosis added a commit that referenced this issue Dec 28, 2015
Fixes #16, convert hex escapes in malformed JSON to unicode escapes to workaround nginx bug.
@innocraft-automation innocraft-automation removed this from the Current sprint milestone Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants