Allow to use JavaScript tracking and Log Analytics at the same time, and merge the data / deduplicate to avoid double counting #9665
Comments
It's a great idea and would be an awesome feature indeed. However, it is technically probably quite difficult. I presume we won't find time to work on this soon, as we can probably provide more value by spending the time on other features. It would be really cool if someone could put some thought into how it could work technically, e.g. how we can 100% correctly match a user tracked with JS with some logs from the webserver. Not sure if it's possible, especially when requests are coming from the same IP / company. |
Thanks for the positive answer! Of course great things are not always easy, and they can rarely be achieved in a first attempt ;-) But there are things that could be done relatively easily, e.g. => With this one can already improve the statistics in all fields where URL + counting is enough:
probably there are some more |
when doing this kind of statistic quality check "manual"
After that one can look for a given period in the data above on both websites
|
Good point and very true. Interesting approach of using it in 2 sites and comparing. Didn't even think of this initially. It could kinda work like tracking into 2 sites separately, then we check visits / actions against each other and merge them, e.g. into a third site. It's still not super trivial, but a simple proof of concept could maybe be made. There are still challenges, e.g. when the IP is anonymized it will probably be impossible to know whether an individual was already tracked or not. This applies especially to German users. A nice thing is the new kind of reports it would allow. For example, we could have a site tracked with JavaScript but still have bandwidth reports, which are usually only available with the log importer. We would maybe know how many resources were loaded etc. Still, merging this data won't be easy (e.g. when dealing with dates/times to find a matching user there are always problems :) ) |
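The coarsest version of the two-site comparison needs no user matching at all: count page views per URL in each site and look at the difference. Below is a minimal Python sketch of that check. The function name `compare_sites`, the field names and the sample URLs are made up for illustration and are not part of Piwik.

```python
from collections import Counter

def compare_sites(js_pageviews, log_pageviews):
    """Compare per-URL hit counts from a JS-tracked site and a log-imported site."""
    js, logs = Counter(js_pageviews), Counter(log_pageviews)
    report = {}
    for url in sorted(set(js) | set(logs)):
        report[url] = {
            "js": js[url],
            "logs": logs[url],
            # Hits present in the server logs but never reported by the JS tracker.
            "missed_by_js": max(logs[url] - js[url], 0),
        }
    return report

# Hypothetical sample data: one URL list per site, one entry per hit.
js_site = ["/", "/", "/docs"]
log_site = ["/", "/", "/", "/docs", "/file.zip"]

for url, row in compare_sites(js_site, log_site).items():
    print(url, row)
```

This only tells you how many hits the JS tracker missed per URL. It says nothing about which visitor they belong to, which is the hard part discussed above.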
perfect idea! => regarding the other points you are mentioning: looks like you got hooked on this idea :-) |
For a combination like that, it would help very much to keep as much raw tracking data as possible. Hmmm, the more I think about it, keeping raw data is not only helpful but essential (and storage is becoming cheaper and faster every day, while visitor counts (data production) on websites tracked with Piwik are not growing at the same speed). |
Another use case for "more than just JS tracking": external file downloads. If someone links a file on their website, just using Piwik will not be enough, since those downloads will not be recorded by Piwik at all. I have exactly that request right now, to have "more precise" (external) download numbers, which is only possible through Apache log files... |
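To illustrate the download use case, here is a minimal Python sketch that counts file downloads straight from access log lines. It assumes the Apache "combined" log format. The extension list, the function name `count_downloads` and the sample lines are illustrative assumptions, not anything taken from Piwik's log importer.

```python
import re
from collections import Counter

# Leading part of an Apache "combined" log line: host, identity, user, time, request, status, size.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumption: which file extensions count as "downloads".
DOWNLOAD_EXTENSIONS = (".pdf", ".zip", ".tar.gz")

def count_downloads(lines):
    """Count successful GET requests for downloadable files, per path."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        path = m.group("path").split("?", 1)[0].lower()
        if (m.group("method") == "GET"
                and m.group("status") == "200"
                and path.endswith(DOWNLOAD_EXTENSIONS)):
            counts[path] += 1
    return counts

# Hypothetical sample lines (documentation-range IP addresses).
sample = [
    '203.0.113.5 - - [10/Oct/2016:13:55:36 +0000] "GET /files/report.pdf HTTP/1.1" 200 2326 "http://other.example/" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2016:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2016:14:01:02 +0000] "GET /files/report.pdf HTTP/1.1" 200 2326 "-" "curl/7.50"',
]
print(count_downloads(sample))
```

The first sample line has an external referrer, which is exactly the kind of download a JS tracker on your own pages never sees.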
that's a pretty good use case! |
to make this usable (and in general) |
having the possibility to compare js tracking results easily with log import tracking results, |
👍 for the idea, this is exactly what I was thinking: "Interesting approach of using it in 2 sites and comparing. Didn't even think of this initially. It could kinda work like tracking into 2 sites separately, then we check visits / actions against each other and merge them, e.g. into a third site. It's still not super trivial, but a simple proof of concept could maybe be made." Two sites, one tracked with JavaScript, one with server-side tracking, and a third website matching the data.
On the server side, to get a real picture: are we able right now to filter GOOD BOTS + BAD BOTS? If we can separate 👍 GOOD BOTS 👍 BAD BOTS 👍 Real Humans, we can practically get a real picture. Most of the good bots can of course be easily identified, because they follow good practice, like having the word "bot" in their name: googlebot, bingbot, adsensemediabot, etc.
On Joomla, for example, there is the EORISIS Piwik plugin, and there is another one, if I remember well from yoat or something like that, which is only for server-side tracking. Eorisis Piwik can track on Joomla with: JavaScript, JavaScript + image, server side. I tried running both on the same website, Eorisis with JavaScript and the other with server-side tracking; the problem is that if you enable both plugins, Joomla crashes, so it
Anyhow, this should be done as @hpvd noted here: #9711. And "great details" like screen resolution and plugins used can be solved if we implement a misc tracker, like AWStats does; I can detail this, as it's documented and can be done for Piwik as well.
As @hpvd said: "With Piwik's analyses of server log files, all visitors are tracked -always." The server logs are the only thing that you, as a website owner, are certain to have control over. Maybe we can set this up as a milestone for Piwik 3. |
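The "word bot in the name" heuristic for well-behaved crawlers can be sketched in a few lines of Python. This is only an illustration of that heuristic: the token list and category names are assumptions. It cannot detect bad bots that send a browser-like user agent, and a bare substring match can produce false positives.

```python
import re

# Assumption: self-identifying crawlers carry one of these tokens in their user agent.
BOT_TOKEN = re.compile(r"bot|crawler|spider", re.IGNORECASE)

def classify_user_agent(user_agent):
    """Very rough user-agent classification based on self-declared bot tokens."""
    if not user_agent or user_agent == "-":
        return "unknown"
    if BOT_TOKEN.search(user_agent):
        return "declared_bot"
    # Could be a real human, or a bot pretending to be a browser.
    return "human_or_undeclared_bot"

for ua in (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0",
    "-",
):
    print(classify_user_agent(ua), "<-", ua)
```

Separating bad bots from humans needs more than the user agent string (request rate, robots.txt behaviour and so on), which is why the last category is deliberately left ambiguous.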
@tsteur about: "There are still challenges, e.g. when the IP is anonymized it will probably be impossible to know whether an individual was already tracked or not. This applies especially to German users." Can you detail this? Maybe I can help. Please give a more precise example of what you mean and of which IPs you are talking about.
@tsteur about: "A nice thing is the new kind of reports it would allow. For example, we could have a site tracked with JavaScript but still have bandwidth reports, which are usually only available with the log importer. We would maybe know how many resources were loaded etc. Still, merging this data won't be easy (e.g. when dealing with dates/times to find a matching user there are always problems :) )" Why not just have 2 websites so that anyone who wishes can compare, and maybe implement a misc tracker, as AWStats does, to get user details such as resolution, plugins used and so on? That way the nice data normally tracked with JavaScript won't be missing from the server-log side, and users with JavaScript enabled can be tracked directly in just 1 website. And the real picture of the data can only be achieved if we can separate REAL HUMANS, GOOD BOTS + BAD BOTS (I think filtering the bad bots is harder) and if we implement what @hpvd said in this topic: #9711. Then Piwik will be the only analytics tool with real data. |
Being able to combine server logs and JavaScript tracking logs is also one of the first things I thought about when I saw the log analytics features. I don't really know how Piwik works internally, but what seems feasible and really reliable to me would be to use an iterative process to merge JavaScript tracking logs into server logs.
If we can't find a matching JS log for a server log, we ignore the JS log and add it to a no_matching_server_log.log file (which we might use to improve our process). I believe this would prevent the use of 2 sites, which I do not find really practical from a user perspective. Here is a basic PHP implementation of what I am thinking of:
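The PHP snippet is not preserved in this copy of the thread. As a stand-in, here is a minimal Python sketch of the idea as described: server log entries are the reference, each one is enriched with a matching JS hit if one exists, and JS hits without a server-side counterpart are set aside. The matching key (same IP, same URL, timestamps within 30 seconds), the `Hit` type and all names are assumptions for illustration, not the original author's code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Hit:
    ip: str
    url: str
    time: datetime
    details: dict = field(default_factory=dict)

def merge_hits(server_hits, js_hits, window=timedelta(seconds=30)):
    """Merge JS hits into server hits; return (merged, js_hits_without_server_match)."""
    merged = []
    remaining = sorted(js_hits, key=lambda h: h.time)
    for s in sorted(server_hits, key=lambda h: h.time):
        # Assumed matching rule: same IP, same URL, timestamps close enough.
        match = next(
            (j for j in remaining
             if j.ip == s.ip and j.url == s.url and abs(j.time - s.time) <= window),
            None,
        )
        if match is not None:
            remaining.remove(match)  # each JS hit is used at most once
            merged.append(Hit(s.ip, s.url, s.time, {**s.details, **match.details}))
        else:
            merged.append(s)  # server-only hit, e.g. the JS tracker was blocked
    # Whatever is left would go to the "no matching server log" file.
    return merged, remaining

t0 = datetime(2016, 2, 1, 12, 0, 0)
server = [
    Hit("203.0.113.5", "/pricing", t0, {"bytes": 5120}),
    Hit("198.51.100.7", "/pricing", t0 + timedelta(seconds=5), {"bytes": 5120}),
]
js = [
    Hit("203.0.113.5", "/pricing", t0 + timedelta(seconds=2), {"resolution": "1920x1080"}),
    Hit("192.0.2.9", "/pricing", t0 + timedelta(seconds=3), {"resolution": "1280x720"}),
]
merged, unmatched = merge_hits(server, js)
print(merged)
print(unmatched)
```

With several visitors behind one IP, this rule can attach a JS hit to the wrong server entry. A real implementation would probably also compare the user agent.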
Any thoughts on this? |
Hi @nicolasbadia |
Question/comment from a user in email
Note: it's possible to disable cookies in Matomo tracker. |
Note that this is the exact question that brought me here. Also, I honestly don't know whether cookies are the actual issue. The "idea" of GDPR is to ask consent for processing data, and whether you set a cookie or not, you'll still be processing personal data by injecting the JavaScript snippet and even by analysing logs. IANAL, but saying "we don't set a cookie and that makes all the problems go away" seems a little simplistic. |
I think you are correct, kwisatz. GDPR doesn't allow processing Apache logs for tracking purposes if consent was not given: ‘processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction; |
This other technical solution is interesting too: #13023, using an SDK (e.g. a PHP SDK) on the server rather than using log files. It has some upsides (not having to use log analytics) and downsides (only works for PHP, will need an SDK implementation for each language, might be hard to send asynchronous HTTPS requests without a performance impact on the site, ...) |
"I think you are correct, kwisatz. GDPR doesn't allow processing Apache logs for tracking purposes if consent was not given." Folks, as a general rule of thumb GDPR applies to Europe only. GDPR has nothing to do with your server or with your server location, which can be OFFSHORE, and nothing to do with your Apache logs. GDPR is a legal act that applies only in EU countries; if my servers are outside the EU, the EU has no jurisdiction, and whether you follow those rules or not is your problem. The fact that a piece of software offers the option to track analytics data, which is a must for any website, with or without cookies, is called a software option: if you don't like that option you are free not to use it, and you are free not to use any software that is not suitable for your use. GDPR was made in order to protect users from viruses and malicious injections via cookies; GDPR has nothing to do with the way you process your data on your servers, that is strictly your problem. You are the master of your analytics, not the user. Should we ask the user: "Hey user, do you consent to receive $100 right now?! Just click yes!" The users are just users, they are not a technically specialised IT workforce; depending on the question you ask, any user could answer YES or NO. But for a webmaster and his analytics it is not important whether the answer is YES or NO; what matters is being able to see the real picture of his website. Also, all websites have what is called a TOS: in your terms and conditions you can write your own website rules, so if a user wants to use your website, by accepting your TOS he accepts cookies and all your terms and conditions; otherwise he can go to any other website if he does not like your TOS. In order to use my website you must respect my TOS; if you don't want to respect my TOS, go somewhere else, simple as that.
GDPS, JSLSA, EJAIS cannot dictate my TOS, as GDPR is not paying for servers, technical support, etc. That is not their business, it is your business; or will GDPR pay for all the losses of your business, or what? A business can take critical decisions based on analytics, and because of those decisions a company can grow or go bankrupt. If I have 100,000 users who give no COOKIE CONSENT and 100 who do, I will think that I had 100 visits, not 100,100 visits, which is something else entirely. As a web analytics webmaster I want my analytics to be clear, not to be fake because of some stupid non-technical bureaucrats who hand out all kinds of laws that are even more stupid than they are. Instead of using fake analytics and data, you had better not use any analytics at all; you can go BLIND, thanks to GDPR, KHFAL, KJGM or whatever stupidity they might think of next time. Whether the user gives consent or not, they are not protecting the user, especially on any clearly white-hat website. Now, on a black-hat website, whose main purpose is to infect users with malware or a virus, do you really think GDPR can protect the users from the "bad guys"??? The only way DUMB users can protect themselves is through IT education; only education can protect them. If we pass a law tomorrow saying that ALL USERS of the internet will be protected because we say so, do you think the creators of viruses and malware care about what we write on some paper and will not harm the dumb users? Of course they will harm them, no matter what law is or is not written. You can't just pass a law and automatically protect everyone, so that all bad people become ANGELS from tomorrow and everyone is happy; unfortunately this is not the way things work in this world.
GDPR cannot protect anything; it is just some rules that you should follow, and it was written in order not to harm the user with viruses, malware, etc. With web analytics you cannot harm anyone: you are just collecting data about your users, you are not infecting people with viruses or malware by tracking their actions. Take, for example, people who use FREE websites that are based on advertising; without advertising those websites are dead. As lots of users are using AdBlock, uBlock and all kinds of blockers, webmasters implemented solutions to detect users who use an ad blocker, and as a user you must UNBLOCK the website so that you will see ads; otherwise you can go wherever you like, but you cannot access my servers, my website, my resources, etc. It's the final choice of the user: if he wants to enter my house, he needs to respect my rules; if he does not want to respect them, no problem, he will not enter my house, very simple. |
JavaScript unique identifier: IP. Server-side unique identifier: IP. Server logs would be our reference, as we are 100% sure they are correct. Then we would try to find a matching JS log for each of them. The IP is the unique identifier for merging the data accurately: the server log is the reference and is correct every time, and the JavaScript log is matched to it through the same unique identifier, the IP. For JavaScript logs that are not found, as nicolasbadia said: "If we can't find a matching JS log for a server log, we ignore the JS log and add it to a no_matching_server_log.log file (which we might use to improve our process)." But I don't think there will be such a case, because on the server side everything is tracked 100%, and we basically just need to merge the JavaScript reports into the same user report; the main identifier in both worlds is the IP address. |
This is just plain false. GDPR protects users from aggregation and creation of unwanted online-profiles. |
It is not false, that was their initial intention. GDPR does not protect users from aggregation, profile creation or even the selling of their user data, because the user has to accept the TOS of the website; the user cannot do anything except leave that website if he does not accept its TOS. Users are not the owners of websites, and they don't even need GDPR if they don't like the TOS of a website: they simply exit. But if the user registers on that website and gives consent that he accepts its TOS, GDPR will not protect that user. If that user requests that his info be deleted from that website, the webmaster will just delete that user, and that user won't have access to the resources of that website anymore, it is very simple. It is a "false protection": it is as if someone gave you a written law on paper right now saying that they will protect users against coronavirus. Unfortunately they cannot do anything, and they can't throw coronavirus in jail, because coronavirus does not know any law. You have to protect yourself through education, not by relying on some bureaucratic piece of paper; it won't protect anyone, neither the users nor the webmasters. Anyhow, I think GDPR is off topic, because the topic is "Allow to use JavaScript tracking and Log Analytics at the same time, and merge the data / deduplicate to avoid double counting", not GDPR or KClm or whatever they will invent in the future. |
Nope – GDPR states that your service has to be usable regardless of the user's agreement for having their behavior tracked.
Nope – the "webmaster", as you call them, is to delete the data you are asking to get deleted. Exceptions are data that they are obliged to keep for legal reasons (e.g. for their tax declaration).
Implementing GDPR, those who do not adhere to the "written[n] law on paper" can be and are being fined. Fines are a lot higher than what you are going to want to pay. |
"Nope – GDPR states that your service has to be usable regardless of the user's agreement for having their behavior tracked." You're in error: GDPR is not the owner of the website and server. If I don't want you in my club because you don't dress the way my TOS says (white shirt), you're out of the club. Yes, the data the user asks to have deleted would be deleted; however, as the user submitted that data voluntarily and was not forced by anyone, what they post, share or request is the user's responsibility. GDPR is not the owner of any website, and it applies only in the EU. GDPR has nothing to do with which features of a piece of software are on or off. There is no fine if your server is in Russia, Sudan or Belize; they have no jurisdiction over there. Like I said, GDPR is just an EU directive, and they cannot impose an EU directive outside their jurisdiction. If you don't like the TOS of a website, its owner has no obligation to make that site or its resources available to you, simple as that; they can even ban your IP, and you're gone if you're a troublemaker for that website owner. |
Chiming in on the GDPR discussion. GDPR says that for profiling a user's activity you need to have their permission. That means they need to agree even before you set any cookies. But it allows processing of statistical data on the grounds of a justified interest of the website provider. You can collect statistical data without needing to ask for consent. But even then IP addresses need to be stored anonymized. I believe, though, that it would be justifiable to match the log file IP with the JavaScript-tracked IP and anonymize them after the fact for permanent storage. |
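The "match first, anonymize afterwards" ordering can be sketched as follows. This is a minimal Python illustration under stated assumptions: anonymization is modelled as keeping only the first 16 bits of an IPv4 address, and the function and field names are invented for the example. It only shows why the order of the two steps matters technically.

```python
import ipaddress

def anonymize(ip, keep_bits=16):
    """Keep only the first `keep_bits` bits of an IPv4 address (assumed anonymization scheme)."""
    return str(ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False).network_address)

def match_then_anonymize(server_ips, js_ips):
    """Match on full IPs in memory, then keep only anonymized IPs in the stored records."""
    js_seen = set(js_ips)  # full IPs, never written to permanent storage
    return [
        {"ip": anonymize(ip), "also_tracked_by_js": ip in js_seen}
        for ip in server_ips
    ]

# Hypothetical visitors (documentation-range addresses).
server_ips = ["192.0.2.17", "192.0.2.200"]
js_ips = ["192.0.2.17"]

stored = match_then_anonymize(server_ips, js_ips)
print(stored)

# If the IPs were anonymized before matching, both visitors would collapse into one key.
print({anonymize(ip) for ip in server_ips})
```

The stored records keep the result of the match (one visitor seen by both sources, one only in the logs) even though their anonymized IPs are identical.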
Sometimes more than one data source is available for describing/documenting the same activity.
In most cases the data sources have different strengths and weaknesses.
But when they are combined, the image of reality is always better than when using only one source.
To give an example:
There is a place with two different cameras looking at it from two different directions.
One of the cameras is an HD color camera mounted at a height of 10 m; the other is a black-and-white model with lower resolution, mounted at a height of 2 m, but it can also take pictures in the dark.
Neither of them on its own can document everything happening in the place all day long in perfect quality.
But together they don't miss anything.
The same situation exists when trying to track activities using Piwik:
in the future, when Piwik becomes a "universal activity tracker" with v3,
but also today, when tracking "only" websites.
With Piwik's JavaScript tracking you can track many, many details.
But there are things that may block Piwik's JS: browser settings, browser add-ons, etc.
In this case these visits are not tracked. And what is even worse from a statistics point of view:
not only does one not know what these visitors have done, one also does not know how many visits were missed.
With this, some numbers in the statistics, like the total number of visitors, are badly broken.
This may affect other things, e.g. the conversion rate, not only in e-commerce (reached goals relative to the number of visitors), impression counting when doing advertisements, etc.
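A small worked example shows how missed visits distort the conversion rate. All numbers are hypothetical. It also assumes the number of conversions is known independently (e.g. from the shop's order database), so only the visit count suffers from blocked JS tracking.

```python
def conversion_rate(conversions, visits):
    """Conversion rate = reached goals / number of visits."""
    return conversions / visits

conversions = 40        # hypothetical: known from the order database
tracked_visits = 800    # hypothetical: visits seen by the JS tracker
blocked_visits = 200    # hypothetical: visits where the JS tracker was blocked

reported = conversion_rate(conversions, tracked_visits)
actual = conversion_rate(conversions, tracked_visits + blocked_visits)

print(f"reported: {reported:.1%}")  # 5.0%
print(f"actual:   {actual:.1%}")    # 4.0%
```

In this made-up case the reported rate is a quarter higher than the real one, and without server logs there is no way to tell how large `blocked_visits` actually is.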
With Piwik's analyses of server log files, all visitors are tracked -always.
But not with the great detail that JS tracking can provide.
=> So why not make it possible to use data from different sources and combine
the best of both worlds to build a perfect image of reality?
The start of structural work on the core of Piwik for v3.0 is a perfect point to think about these possibilities.