April 2020 work in progress - Quarantine edition n. II

On data retention

Actions performed

On facebook.tracking.exposed we purge data cyclically. The purged data is backed up, but the backup is not accessible: it is kept as a historical reference, because in the future it might be interesting to see what Facebook HTML and timelines looked like back in time. The data collected by our infrastructure is divided into two groups:

  1. information submitted by the browser extension
  2. information generated by us, based on the data above, and still inheriting its ownership.

The data in point 1 is subject to our data retention policy; our primary goal here is not to fill the hard drive, because that is our bottleneck. The data in point 2 is subject to your data retention policy (if you are one of our adopters): that data is only useful to you, and as long as you leave it available on Tracking Exposed, it might contribute to statistics and aggregated data.

On the online server, everything received before 2019-01-01 was wiped.

On other platforms (such as YouTube) we organize the data retention policy differently (still a work in progress). On the Facebook database, the cleanup went as follows:

switched to db facebook
PRIMARY> db.timelines2.count({ "startTime" :{  "$lt": new Date("2018-01-01") }} )
0
PRIMARY> db.timelines2.count({ "startTime" :{  "$lt": new Date("2019-01-01") }} )
590177
PRIMARY> db.timelines2.remove({ "startTime" :{  "$lt": new Date("2019-01-01") }} )
WriteResult({ "nRemoved" : 590177 })

With the command above we deleted the timelines, the objects linking all the impressions to a user and to a unique sequence.

PRIMARY> db.impressions2.remove({ "impressionTime": { "$lt": new Date("2019-01-01") }} )
WriteResult({ "nRemoved" : 7619540 })

With the above we deleted the impressions, the objects reporting whether an impression is public or private and whether there is an associated HTML (public posts only). Private posts still have an impression number, and this guarantees the sequence. impressionOrder runs from 1 to the length of the timeline.
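The sequence guarantee described above can be checked with a small sketch. This is only an illustration under assumptions: `missingImpressionOrders` is a hypothetical helper, not part of the Tracking Exposed code base, and the `visibility` field is an invented name for the public/private flag. Only `impressionOrder` comes from the text.

```javascript
// Hypothetical helper: lists the impressionOrder values that are absent
// from the expected range 1..expectedLength of a single timeline.
function missingImpressionOrders(impressions, expectedLength) {
  const seen = new Set(impressions.map((imp) => imp.impressionOrder));
  const missing = [];
  for (let order = 1; order <= expectedLength; order += 1) {
    if (!seen.has(order)) missing.push(order);
  }
  return missing;
}

// Sample data: "visibility" is an assumed field name, used for illustration only.
const sampleImpressions = [
  { impressionOrder: 1, visibility: "public" },
  { impressionOrder: 2, visibility: "private" },
  { impressionOrder: 4, visibility: "public" },
];

// Position 3 is missing from a timeline expected to hold 4 impressions.
console.log(missingImpressionOrders(sampleImpressions, 4));
```

An empty result would mean that every position of the timeline is accounted for, private posts included.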

PRIMARY> db.htmls2.remove({ "savingTime" : { "$lt" : new Date("2019-01-01") } })
WriteResult({ "nRemoved" : 4586490 })

With the above we removed the old documents from the HTML collection.

It should be documented that this information is the one provided by people, and that it is subject to cyclic cleaning.
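Since the cleaning is meant to be cyclic, the three removals could be described once and reused with a different cutoff date. The following is a minimal sketch, not the script we actually run: `buildPurgePlan` and `RETENTION_FIELDS` are hypothetical names, while the collection and field names are the ones visible in the shell session.

```javascript
// Collection name -> date field used to decide what is old enough to purge.
// Names are taken from the shell session; the mapping itself is an assumption.
const RETENTION_FIELDS = {
  timelines2: "startTime",
  impressions2: "impressionTime",
  htmls2: "savingTime",
};

// Hypothetical helper: builds one { collection, filter } step per collection,
// each matching documents strictly older than the cutoff date.
function buildPurgePlan(cutoff) {
  if (!(cutoff instanceof Date) || Number.isNaN(cutoff.getTime())) {
    throw new TypeError("cutoff must be a valid Date");
  }
  return Object.entries(RETENTION_FIELDS).map(([collection, field]) => ({
    collection,
    filter: { [field]: { $lt: cutoff } },
  }));
}

// Print the plan for the same cutoff used in this round of cleaning.
const purgePlan = buildPurgePlan(new Date("2019-01-01"));
for (const step of purgePlan) {
  console.log(step.collection, JSON.stringify(step.filter));
}
```

In a shell session each step could then be counted first and removed afterwards, as done by hand above, for example with `db.getCollection(step.collection).count(step.filter)` followed by `db.getCollection(step.collection).remove(step.filter)`.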


YouTube Tracking Exposed progress


Translated from Italian

The LOST episode of 19 April 2020 has aired, with interesting points at minute 50. The full episode and more options can be found here; at minute 59:15 Giulia x Trex! begins, and the mp3 is here.