Soft Integration of Geo-Tagged Data Sets in J-CO-QL+
Abstract
1. Introduction
2. Related Work
2.1. Soft Querying on Databases
2.2. Integrating Data Sets Describing Public Places
3. Basic Notions on Fuzzy Sets
- If we are looking for “popular and cheap restaurants”, we could formulate the search as “popular AND cheap” (in terms of fuzzy sets, it is the intersection of the two sets). Clearly, the lower membership degree determines the actual relevance of a place p.
- If we are looking for “popular or cheap restaurants”, we could formulate the search as “popular OR cheap” (in terms of fuzzy sets, it is the union of the two sets). Clearly, the higher membership degree determines the actual relevance of a place p (a place could be not popular, but highly cheap).
- If we are looking for “popular and possibly cheap restaurants”, we could formulate the search as “70% popular AND 30% cheap” (in terms of fuzzy sets, it is a weighted aggregation of the two sets). Clearly, the final membership degree is dominated by the degree of popularity, but a popular place that is also a cheap restaurant has a higher membership degree than a popular place that is not at all a cheap restaurant.
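The three formulations above can be sketched in a few lines of Python. This is only an illustration: the minimum and maximum follow from the remarks on the lower and higher membership degree, while the linear form of the weighted aggregation and the sample degrees are assumptions made here.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Intersection: the lower membership degree determines the result."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Union: the higher membership degree determines the result."""
    return max(a, b)


def weighted(a: float, b: float, weight_a: float = 0.7) -> float:
    """Weighted aggregation, assumed here to be a linear combination."""
    return weight_a * a + (1.0 - weight_a) * b


# Hypothetical membership degrees of a place p to the two fuzzy sets.
popular, cheap = 0.9, 0.2

print(fuzzy_and(popular, cheap))  # dominated by the lower degree
print(fuzzy_or(popular, cheap))   # dominated by the higher degree
print(weighted(popular, cheap))   # dominated by popularity, nudged by cheapness
```

With the 70%/30% weights, a popular place that is also cheap ranks above an equally popular place that is not cheap at all, which is the behavior described in the third item.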
4. The J-CO Framework
- The J-CO-QL Engine actually processes J-CO-QL queries. It obtains data to process from JSON document stores and from web sources; it is able to store results again into JSON document stores.
- J-CO-DS is a simplified JSON document store [37]: it is designed to provide users with the capability of storing large single JSON documents (usually, popular JSON document stores are not able to deal with very-large single JSON documents). J-CO-DS does not provide any internal computational capability, i.e., it does not provide a query language: in fact, it is part of the J-CO Framework, in which the component that provides computational capabilities is the J-CO-QL Engine.
- J-CO-UI is the user interface of the framework. It provides users with a graphical interface to interactively write J-CO-QL queries in a step-by-step way; users can also inspect intermediate results.
4.1. The Query Language
4.1.1. Data Model
- The basic item to process is a JSON document. A document is represented within a pair of braces “{” and “}”; it is a sequence of fields separated by commas. A field is a “name: value” pair, where “name” is the field name, while “value” is the value of the field; the name is always enclosed within double quotes (e.g., "name"); the value can be a number, a string (enclosed either within double quotes or single quotes), a Boolean value, a nested sub-document (enclosed within a pair of braces “{” and “}”) or an array (enclosed within square brackets “[” and “]”, whose items can be any kind of JSON value, separated by commas).
- J-CO-QL gives a special meaning to root-level fields whose name begins with “~”; these names are compliant with JSON naming rules, but J-CO-QL considers some of them in a special way, as illustrated hereafter.
- The root-level ~fuzzysets field is used to represent the membership degrees of a d document to fuzzy sets. It works as a “key-value” map: given a field within ~fuzzysets, the field name is the name of the fuzzy set for which the membership degree has been evaluated; the value is a real number in the range [0, 1], which denotes the membership degree. This way, given a d document, it is possible to represent its membership to many fuzzy sets.
- The root-level ~geometry field represents geometries (also called “geo-tagging”) of spatial entities represented as JSON documents. In this paper, we do not make use of geometries (the interested reader can refer to [5]).
- A “collection” is an unordered multi-set of heterogeneous documents, i.e., it can contain multiple copies of the same document.
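To make the role of the ~fuzzysets field concrete, the following Python sketch models a document as a dictionary and records membership degrees in it. The helper name `set_membership`, the sample fields and the sample degrees are hypothetical; only the ~fuzzysets key and the [0, 1] range of degrees come from the data model described above.

```python
import json


def set_membership(doc: dict, fuzzy_set: str, degree: float) -> dict:
    """Return a copy of doc whose ~fuzzysets map records the given degree."""
    if not 0.0 <= degree <= 1.0:
        raise ValueError("a membership degree must lie in [0, 1]")
    updated = dict(doc)
    # Create the map on first use, extend it otherwise.
    memberships = dict(updated.get("~fuzzysets", {}))
    memberships[fuzzy_set] = degree
    updated["~fuzzysets"] = memberships
    return updated


# Hypothetical document describing a public place.
original = {"name": "The Gray Horse", "category": "restaurant"}
place = set_membership(original, "Popular", 0.8)
place = set_membership(place, "Cheap", 0.3)

print(json.dumps(place, indent=2))
```

The same document can thus belong to several fuzzy sets at once, each with its own degree.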
4.1.2. Execution Model
- A “query” is a sequence of instructions, i.e., a “pipe of instructions”.
- Each instruction receives an input “query-process state” and generates a new query-process state.
- A “query-process state” is a tuple of five members, described hereafter.
- The “temporary collection” is the collection of JSON documents that passes through the pipe of instructions and contains the temporary results of the query process.
- The “Intermediate-Results database” is a database that is exclusive to the query process; it stores intermediate results to be used later.
- The set of “database descriptors” is used to handle connections with external JSON document stores.
- The set of “Fuzzy Operators” contains the operators defined within the query; they allow for evaluating membership degrees to fuzzy sets (see Section 6.2).
- The set of user-defined “JavaScript Functions” contains the functions defined throughout the query to complete the computational capabilities of the query language (see [38]).
- The initial query-process state is . Each instruction possibly modifies one member of the query-process state.
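The execution model can be pictured as a fold over the instructions. The Python sketch below is an illustration only, not the engine's implementation: the class and field names are invented here, the state starts with every member empty (an assumption), and the two sample instructions are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Iterable


@dataclass(frozen=True)
class QueryState:
    """Five-member query-process state (member names are assumptions)."""
    temporary_collection: tuple = ()
    intermediate_results: dict = field(default_factory=dict)
    database_descriptors: dict = field(default_factory=dict)
    fuzzy_operators: dict = field(default_factory=dict)
    javascript_functions: dict = field(default_factory=dict)


Instruction = Callable[[QueryState], QueryState]


def run_query(instructions: Iterable[Instruction]) -> QueryState:
    """Pipe the state through the instructions, one after the other."""
    state = QueryState()  # assumed initial state: every member empty
    for instruction in instructions:
        state = instruction(state)
    return state


def load_sample(state: QueryState) -> QueryState:
    # Hypothetical instruction: replaces the temporary collection.
    return replace(state, temporary_collection=({"name": "A"}, {"name": "B"}))


def keep_named_a(state: QueryState) -> QueryState:
    # Hypothetical instruction: filters the temporary collection.
    kept = tuple(d for d in state.temporary_collection if d["name"] == "A")
    return replace(state, temporary_collection=kept)


final_state = run_query([load_sample, keep_named_a])
print(final_state.temporary_collection)
```

Each sample instruction changes exactly one member of the state and leaves the others untouched, mirroring the remark that an instruction possibly modifies one member.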
5. Problem and Methodology
5.1. Premises and Problem
5.2. Fuzzy Relation for Matching Public Places
5.2.1. Basic Functions and Relations
5.2.2. The SameLocation Relation
- Case A: missing address(es). If the address is missing in one of the two descriptors, or in both, but the two pairs of coordinates are available, only the coordinates can be used to evaluate the relation.
- Case B: missing coordinate(s). When one or more coordinates are missing in the two descriptors, but both addresses are available, only the addresses can be used to evaluate the relation.
- Case C: addresses and coordinates are all available. When all geographical fields (i.e., address and coordinates) are available in both descriptors, they all contribute to the evaluation of the relation.
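The three cases can be summarized by a small dispatch function. The Python sketch below assumes that the two partial degrees have already been computed (or are None when the corresponding fields are missing) and that case C combines them linearly with a weight beta; the function name, the None convention and the linear form are assumptions made for illustration.

```python
from typing import Optional


def same_location(address_similarity: Optional[float],
                  closeness: Optional[float],
                  beta: float) -> Optional[float]:
    """Membership degree to SameLocation, or None if nothing can be evaluated.

    address_similarity: degree of similarity of the two addresses,
        None when at least one address is missing.
    closeness: degree of closeness of the two points,
        None when at least one coordinate is missing.
    beta: weight of the address similarity in case C (assumed linear form).
    """
    if address_similarity is None and closeness is None:
        return None                      # no geographical evidence at all
    if address_similarity is None:
        return closeness                 # case A: only coordinates
    if closeness is None:
        return address_similarity        # case B: only addresses
    return beta * address_similarity + (1.0 - beta) * closeness  # case C


print(same_location(None, 0.8, beta=0.5))  # case A
print(same_location(0.6, None, beta=0.5))  # case B
print(same_location(0.6, 0.8, beta=0.5))   # case C
```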
5.2.3. Global MatchingPlaces Relation
6. Presenting the Script
6.1. Data Set
6.2. Defining Fuzzy Operators
6.2.1. The Close Fuzzy Operator
- The PARAMETERS clause defines the formal parameters of the operator. Specifically, only the distance parameter is defined.
- The PRECONDITION clause defines a condition on the parameters: if the condition is not satisfied, the evaluation of the fuzzy operator stops and an error signal is raised. Specifically, the precondition says that the distance must be no less than 0.
- The EVALUATE clause specifies a mathematical expression on the parameters, whose value is used as x-axis coordinate against the membership function defined by the subsequent POLYLINE clause. In the Close fuzzy operator, the expression simply takes the value of the distance parameter.
- The POLYLINE clause specifies the membership function actually used to compute the membership value. The function is defined as a polyline, by a sequence of pairs $(x_i, y_i)$, where $x_i$ can be any real value, while $y_i \in [0, 1]$; given two consecutive points $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$, it must be $x_i < x_{i+1}$. Each pair of consecutive points defines a segment. Given an x value, if it is between $x_1$ and $x_n$ (in the case of n points), the corresponding y value is considered as a membership degree; if $x < x_1$, the membership degree is $y_1$; if $x > x_n$, the membership degree is $y_n$.
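The four clauses can be mimicked in Python as follows. This is a sketch under stated assumptions: the polyline used for the demonstration is hypothetical (it is not the one of the Close operator in Listing 1), x coordinates are assumed to be strictly increasing, and values outside the polyline take the degree of the nearest end point.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polyline_membership(points: Sequence[Point], x: float) -> float:
    """Evaluate a piecewise-linear membership function at x."""
    xs = [px for px, _ in points]
    if x <= xs[0]:
        return points[0][1]    # left of the polyline: first y value
    if x >= xs[-1]:
        return points[-1][1]   # right of the polyline: last y value
    i = bisect_right(xs, x)    # now xs[i - 1] <= x < xs[i]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def close(distance: float, points: Sequence[Point]) -> float:
    """Sketch of a Close-like operator: precondition, evaluate, polyline."""
    if distance < 0:                               # PRECONDITION
        raise ValueError("distance must be no less than 0")
    x = distance                                   # EVALUATE
    return polyline_membership(points, x)          # POLYLINE


# Hypothetical shape: fully close up to 50, not close at all from 200 on.
SAMPLE_POLYLINE = [(0.0, 1.0), (50.0, 1.0), (200.0, 0.0)]

print(close(125.0, SAMPLE_POLYLINE))
```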
6.2.2. The Similar Fuzzy Operator
- The operator receives two parameters, called st1 and st2; they are the two strings to compare.
- No precondition is specified: in the case of empty or null strings, the operator returns 0 as membership degree, because the similarity degree is 0.
- The EVALUATE clause calls the built-in (i.e., provided by J-CO-QL) function named JARO_WINKLER_SIMILARITY, to obtain the similarity degree of the two strings. The similarity degree is a value in the range [0, 1]. In the case of null or zero-length strings, the returned similarity degree is 0.
- The POLYLINE clause defines the membership function depicted in Figure 4b. Notice that it penalizes lower similarity degrees, while higher similarity degrees are rewarded: this is due to the sometimes bizarre behavior of the Jaro-Winkler similarity, which returns high similarity degrees even when strings only share some characters but are not actually similar; furthermore, for strings such as “The Gray Horse” and “GrayHorse”, the similarity degree is not particularly high, although they clearly have to be considered very similar. With this shape, we try to compensate for the behavior of the Jaro-Winkler similarity, so as to deal with raw addresses and names (i.e., not cleaned from articles, numbers, punctuation, and so on).
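For readers unfamiliar with the metric, the following Python sketch implements the textbook Jaro-Winkler similarity. It is a generic implementation written for illustration and is not the code behind the JARO_WINKLER_SIMILARITY built-in function, whose exact variant is not specified here; the prefix scaling factor of 0.1 and the four-character prefix limit are the usual defaults, and the treatment of null or empty strings follows the description above.

```python
from typing import Optional


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity of two non-empty strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched equal character within the sliding window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    # Matched characters in a different order count as half a transposition.
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: Optional[str], s2: Optional[str],
                 scaling: float = 0.1) -> float:
    """Jaro-Winkler similarity; null or empty strings give 0."""
    if not s1 or not s2:
        return 0.0
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):   # common prefix, at most 4 characters
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1.0 - base)


print(jaro_winkler("The Gray Horse", "GrayHorse"))
print(jaro_winkler("MARTHA", "MARHTA"))
```

Running the sketch on pairs of raw names is a quick way to see why a reshaping membership function is applied on top of the raw similarity degree.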
6.2.3. The WeightedAggregationBeta Fuzzy Operator
- The operator receives three parameters: f1 and f2 are the two values in the range [0, 1] to aggregate, while beta is the aggregation weight (in the range [0, 1] too) of f1 with respect to f2.
- The PRECONDITION clause ensures that the actual values of the three parameters are in the range [0, 1] (notice the IN_RANGE predicate).
- The EVALUATE clause actually performs the weighted aggregation.
- The POLYLINE clause defines a very simple membership function, which is reported in Figure 4c: it is a straight segment from the point (0, 0) to the point (1, 1); this way, the value computed by the EVALUATE clause is returned, as it is, as membership degree.
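As a worked formula, and assuming that the weighted aggregation is the usual convex combination (the exact expression is given in Listing 1, which is not reproduced here), the operator computes:

```latex
\[
  \mu(f_1, f_2, \beta) \;=\; \beta \, f_1 + (1 - \beta) \, f_2 ,
  \qquad f_1, f_2, \beta \in [0, 1] .
\]
% Boundary behavior under this assumption:
%   \beta = 1 gives \mu = f_1, and \beta = 0 gives \mu = f_2.
% Since \mu is a convex combination of two values in [0, 1],
%   \min(f_1, f_2) \le \mu \le \max(f_1, f_2),
% so \mu is itself in [0, 1] and the identity polyline returns it unchanged.
```

For instance, under this assumption, f1 = 0.9, f2 = 0.5 and beta = 0.6 give 0.6 · 0.9 + 0.4 · 0.5 = 0.74.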
6.3. Retrieving and Pairing Descriptors
- The instruction retrieves the FacebookDescriptors collection from the ijgiDb database and aliases it as f; similarly, it retrieves the GoogleDescriptors collection from the same database and aliases it as g. For each f document from the f collection and for each g document from the g collection, a new d document is created. This document contains two fields: the first one is called f and its value is the source f document; the second one is called g and its value is the source g document. The d document is further processed by the subsequent CASE clause.
- The CASE clause evaluates a pool of selection conditions, each expressed within a WHERE clause; if a d document is selected by a condition, it is processed according to the subsequent sub-clauses. Many WHERE branches are possible: a d document is processed by the branch associated with the first WHERE condition that it satisfies; if no condition is satisfied, d is discarded (it will not appear in the output temporary collection). Specifically, the CASE clause in the instruction on line 5 in Listing 2 contains three WHERE branches: each of them deals with one of the three situations considered for defining the SameLocation relation by Definitions 5–7. Hereafter, we separately discuss the behavior of the three branches.
- The first WHERE branch deals with case A of the fuzzy relation, defined in Definition 5. The condition is true if the value of the fbStreet field is missing, or the value of the gAddress field is missing, or both are missing, while all coordinates are available. If a d document meets the condition, the GENERATE block further processes d through the CHECK FOR clause, whose goal is to evaluate the membership degrees of d to fuzzy sets. Specifically, two FUZZY SET branches are present: the former evaluates the ClosePlaces fuzzy set, the latter evaluates the SameLocation fuzzy set. The membership degree to the ClosePlaces fuzzy set is obtained by the associated USING clause: this is a “soft condition”, in which fuzzy operators (such as those defined in Section 6.2) and fuzzy-set names can be composed by the usual (fuzzy) logical operators AND, OR and NOT; the resulting membership degree is the membership degree to the evaluated fuzzy set. If this is the first membership degree evaluated for d, then d does not have the special ~fuzzysets field: in this case, the field is added and it contains one single field, having the same name as the evaluated fuzzy set, whose value is the computed membership degree. In contrast, if the ~fuzzysets field is already present, it is extended with one extra internal field, describing the membership degree to the newly evaluated fuzzy set. Specifically, the first branch evaluates the membership degree to the ClosePlaces fuzzy set by means of the Close fuzzy operator (see Listing 1), which is called passing the geodesic distance computed by the GEODESIC_DISTANCE built-in function. The second FUZZY SET branch evaluates the membership degree to the SameLocation fuzzy set, by assuming that it coincides with the ClosePlaces fuzzy set (see Definition 5). Finally, the ALPHACUT clause discards the d document from the output temporary collection if its membership degree to the SameLocation fuzzy set is less than the threshold mentioned within Definition 8. Figure 6a reports a sample document generated by the first WHERE branch; notice the presence of the ~fuzzysets field and its inner fields.
- The second WHERE branch deals with case B of the relation (see Definition 6), i.e., at least one coordinate is null but both addresses are present. In this case, the membership degree to the SimilarAddress fuzzy set is evaluated by means of the Similar fuzzy operator, which evaluates the fuzzy similarity relation between two strings (in this case, the two addresses). Then, as defined by Definition 6, the second FUZZY SET branch states that the SameLocation fuzzy set coincides with the SimilarAddress fuzzy set. Again, the ALPHACUT clause puts the d document into the output temporary collection if the membership degree to the SameLocation fuzzy set is no less than the threshold in Definition 8. Figure 6b shows a sample document generated by the second WHERE branch.
- The third WHERE branch deals with case C of the fuzzy relation (see Definition 7), i.e., all addresses and all coordinates are available. Consequently, the membership degrees to three different fuzzy sets are evaluated: the first one is the SimilarAddress fuzzy set, by means of the Similar fuzzy operator applied to addresses; the second one is the ClosePlaces fuzzy set, by means of the Close fuzzy operator applied to the geodesic distance between the two points. The third FUZZY SET branch evaluates the membership degree to the fuzzy set named SameLocation: according to Definition 7, it is obtained by calling the WeightedAggregationBeta fuzzy operator, whose goal is to perform the weighted aggregation: it receives the two values (in the range [0, 1]) to aggregate and the weight. The USING soft condition calls the WeightedAggregationBeta fuzzy operator, passing the membership values to the SimilarAddress fuzzy set and to the ClosePlaces fuzzy set, which are obtained by means of the MEMBERSHIP_OF built-in function (which extracts the membership degree from within the ~fuzzysets field). The third parameter is a constant value: this is the weight presented and discussed in Definition 7. The ALPHACUT clause discards the evaluated document if its membership degree to the SameLocation fuzzy set is less than the threshold mentioned in Definition 8. Figure 6c reports a sample document generated by the third branch; notice that the ~fuzzysets field has three inner fields.
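The control flow of the instruction (pairing, first satisfied WHERE branch, alpha-cut) can be sketched in Python as follows. Only the field names f, g, fbStreet, gAddress and ~fuzzysets and the fuzzy-set names come from the description above; the coordinate fields lat and lon, the threshold 0.5, the sample descriptors and the crude stand-in for the Close operator are assumptions, and only the branch for case A is shown.

```python
from itertools import product


def has_coordinates(doc: dict) -> bool:
    # "lat" and "lon" are hypothetical field names.
    return doc.get("lat") is not None and doc.get("lon") is not None


def join_with_case(left, right, branches, alpha_cut):
    """Pair every f with every g, then apply the first satisfied branch."""
    output = []
    for f, g in product(left, right):
        d = {"f": f, "g": g}
        for condition, evaluate in branches:
            if condition(d):
                degrees = evaluate(d)
                if degrees["SameLocation"] >= alpha_cut:   # ALPHACUT
                    output.append({**d, "~fuzzysets": degrees})
                break   # later branches are never considered
        # a pair that satisfies no condition is silently discarded
    return output


def case_a(d: dict) -> bool:
    address_missing = (d["f"].get("fbStreet") is None
                       or d["g"].get("gAddress") is None)
    return address_missing and has_coordinates(d["f"]) and has_coordinates(d["g"])


def evaluate_case_a(d: dict) -> dict:
    # Stand-in for the Close operator: 1 for identical points, 0 otherwise.
    same_point = (d["f"]["lat"], d["f"]["lon"]) == (d["g"]["lat"], d["g"]["lon"])
    close = 1.0 if same_point else 0.0
    return {"ClosePlaces": close, "SameLocation": close}


facebook = [{"fbStreet": None, "lat": 45.0, "lon": 9.0}]
google = [
    {"gAddress": "Main Street 1", "lat": 45.0, "lon": 9.0},
    {"gAddress": "Other Road 2", "lat": 46.0, "lon": 10.0},
    {"gAddress": "Third Street 3", "lat": None, "lon": None},
]

pairs = join_with_case(facebook, google, [(case_a, evaluate_case_a)], alpha_cut=0.5)
print(pairs)
```

In this toy run, the second pair is removed by the alpha-cut, while the third pair satisfies no branch and never reaches the evaluation step.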
6.4. Relevant Pairs
- The FILTER statement takes the temporary collection as input and generates a new temporary collection by applying a CASE clause. The behavior of this clause is the same as in the JOIN OF COLLECTIONS statement.
- On line 6, only one WHERE branch is present: if a document does not meet the selection condition, it is discarded from the output temporary collection. Specifically, the selection condition selects those documents in which both paired descriptors have a name, so as to evaluate the membership degree to the SimilarName fuzzy set.
- The first FUZZY SET branch in the CHECK FOR clause evaluates the membership degree to the SimilarName fuzzy set; again, in the USING soft condition, the Similar fuzzy operator (see Listing 1) is called, this time passing names (instead of addresses).
- The second FUZZY SET branch can finally evaluate the membership degree to the MatchingPlaces fuzzy set, corresponding to the fuzzy relation defined by Definition 8. Remember that the fuzzy relations named SimilarName and SameLocation are aggregated by means of the weighted aggregation operator. In Listing 1, we defined the WeightedAggregationBeta fuzzy operator, which here is used to aggregate the SimilarName fuzzy set and the SameLocation fuzzy set; the SimilarName fuzzy set contributes to the final membership degree with the weight mentioned in Definition 8, so that similarity between names moderately prevails over geographical similarity (whose goal is to confirm that two places having similar or identical names are actually the same place). The resulting membership degree becomes the membership degree to the MatchingPlaces fuzzy set.
- At this point, only relevant pairs must be kept, i.e., those pairs whose membership degree to the MatchingPlaces fuzzy set is no less than the chosen threshold. The ALPHACUT clause does that.
- The final BUILD section (which is optional; this is why it was not present in the JOIN OF COLLECTIONS instruction on line 5 in Listing 2) restructures all surviving documents. Specifically, a novel rank field is added, whose value is the membership degree to the MatchingPlaces fuzzy set. This field is necessary, because the subsequent DEFUZZIFY option discards the ~fuzzysets field (as a consequence, documents are “defuzzified”).
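The steps of this instruction can be sketched in Python as follows. The fuzzy-set names, the rank field and the removal of ~fuzzysets come from the description above; the name field, the weight 0.6, the threshold 0.5, the linear aggregation and the exact-match stand-in for the Similar operator are assumptions made for illustration.

```python
def relevant_pairs(pairs, name_similarity, name_weight, alpha_cut):
    """WHERE + CHECK FOR + ALPHACUT + BUILD/DEFUZZIFY on paired descriptors."""
    result = []
    for d in pairs:
        f_name, g_name = d["f"].get("name"), d["g"].get("name")
        if not f_name or not g_name:
            continue                       # WHERE: both names are required
        degrees = dict(d["~fuzzysets"])
        degrees["SimilarName"] = name_similarity(f_name, g_name)
        # Assumed linear aggregation of SimilarName and SameLocation.
        degrees["MatchingPlaces"] = (name_weight * degrees["SimilarName"]
                                     + (1.0 - name_weight) * degrees["SameLocation"])
        if degrees["MatchingPlaces"] < alpha_cut:
            continue                       # ALPHACUT
        # BUILD keeps the degree as rank; DEFUZZIFY drops ~fuzzysets.
        result.append({"f": d["f"], "g": d["g"], "rank": degrees["MatchingPlaces"]})
    return result


def exact_match(a: str, b: str) -> float:
    # Crude stand-in for the Similar operator.
    return 1.0 if a.lower() == b.lower() else 0.0


sample = [
    {"f": {"name": "Gray Horse"}, "g": {"name": "gray horse"},
     "~fuzzysets": {"SameLocation": 0.8}},
    {"f": {"name": "Gray Horse"}, "g": {"name": "Blue Moon"},
     "~fuzzysets": {"SameLocation": 0.8}},
    {"f": {"name": None}, "g": {"name": "Blue Moon"},
     "~fuzzysets": {"SameLocation": 1.0}},
]

relevant = relevant_pairs(sample, exact_match, name_weight=0.6, alpha_cut=0.5)
print(relevant)
```

Only the first pair survives: the second falls below the threshold and the third lacks a name, so it never reaches the evaluation of the fuzzy sets.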
6.5. Choosing the Best Pairs
- The GET COLLECTION instruction on line 8 again obtains the RelevantPairs collection from the database, again making it the temporary collection.
- The GROUP instruction on line 9 groups documents in the temporary collection, on the basis of the gId field, which is the identifier of Google Places descriptors. For each group, a novel document is generated into the output collection, such that it has the gId field and an array called gGroup, in which all grouped documents are reported. This array is sorted in reverse order of value of the rank field within grouped documents. Figure 9a reports an example of the grouped document.
- The EXPAND instruction on line 10 unnests all grouped documents again. For each output document, the gPair field is added to the global ones (apart from the expanded array); this new field contains two inner fields: the item field contains the unnested document; the position field denotes the position occupied by the unnested item in the gGroup array. As a result, the temporary collection generated by line 10 contains as many documents as the RankedPairs collection, but now they are tagged with the relative order for Google Places descriptors on the basis of the rank field. Figure 9b reports an example of an unnested document.
- The FILTER instruction on line 11 actually selects only documents that previously occupied the first position in their group (based on the reverse order of rank, they are the ones with the highest rank). The BUILD section builds the same structure again, as in the RelevantPairs collection. Figure 9c reports an example of resulting document.
- Finally (on line 12) the last temporary collection is saved into the ijgiDb database with name SamePlaces, which is the desired output of the process.
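The GROUP, EXPAND and FILTER steps amount to keeping, for each Google Places descriptor, the pair with the highest rank. A minimal Python sketch follows; the fields gId, gGroup, gPair, item, position and rank come from the description above, while the fId field, the sample values and the choice of numbering positions from 1 are assumptions.

```python
from collections import defaultdict


def best_pairs(relevant):
    """Keep, for every gId, the document with the highest rank."""
    groups = defaultdict(list)
    for doc in relevant:                       # GROUP by gId
        groups[doc["gId"]].append(doc)

    best = []
    for g_id, g_group in groups.items():
        # gGroup is sorted in reverse order of rank.
        g_group.sort(key=lambda d: d["rank"], reverse=True)
        # EXPAND: one document per item, tagged with its position.
        expanded = [
            {"gId": g_id, "gPair": {"item": item, "position": position}}
            for position, item in enumerate(g_group, start=1)
        ]
        # FILTER: only the first position survives; BUILD restores the item.
        best.extend(e["gPair"]["item"] for e in expanded
                    if e["gPair"]["position"] == 1)
    return best


# Hypothetical relevant pairs.
relevant_sample = [
    {"gId": "g1", "fId": "f1", "rank": 0.70},
    {"gId": "g1", "fId": "f2", "rank": 0.90},
    {"gId": "g2", "fId": "f3", "rank": 0.80},
]

same_places = best_pairs(relevant_sample)
print(same_places)
```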
7. Experimental Evaluation
7.1. Effectiveness
7.2. About Execution Times
8. Conclusions and Future Work
8.1. Conclusions
- The effectiveness that the script obtains is very interesting: with the alpha-cut threshold set to 0.80, it is possible to obtain the best balance between precision and recall, which are both slightly less than 1 (0.962 and 0.981, respectively).
- Execution times are good too: less than 4 min to perform the soft integration is absolutely acceptable (the reader can notice that writing the full script from scratch takes longer).
- The complex membership functions that it is possible to specify in fuzzy operators (see Section 6.2) were exploited to deal with the bizarre behavior of the Jaro-Winkler string-similarity metric. In fact, it returns high similarity values for strings that appear to be very different (in the sense that they do not denote similar names or similar addresses). However, we also experienced the opposite behavior, i.e., two strings that were actually very similar obtained a not-so-high similarity degree. The complex membership function that we defined for the Similar fuzzy operator allowed us to compensate for this behavior.
8.2. Future Work
- First of all, we are going to complete the extension of all J-CO-QL statements with support for fuzzy concepts. In particular, we are going to address the problem of defining “soft aggregators” that can be applied on arrays of JSON documents; we will adopt the same approach followed for defining fuzzy operators.
- Many challenges concerning integration and processing of geo-tagged data sets are arising. For example, the GeoJSON format [49] represents a geographical information layer as a single, giant JSON document. We conceived the idea of defining a domain-specific language for querying features within GeoJSON documents [50], which is translated into J-CO-QL scripts. We plan to further explore this idea, by identifying other application domains and defining novel domain-specific languages to translate into J-CO-QL. Indeed, the idea of devising the J-CO Framework came out while working on an international project [51,52], in which Big Data concerning mobility had to be collected and processed.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Bray, T. The Javascript Object Notation (JSON) Data Interchange Format. 2014. Available online: https://www.rfc-editor.org/rfc/rfc7159.txt (accessed on 1 September 2022).
- Bordogna, G.; Capelli, S.; Psaila, G. A big geo data query framework to correlate open data with social network geotagged posts. In Proceedings of the Annual International Conference on Geographic Information Science, Wageningen, The Netherlands, 10–11 May 2017; pp. 185–203.
- Bordogna, G.; Ciriello, D.E.; Psaila, G. A flexible framework to cross-analyze heterogeneous multi-source geo-referenced information: The J-CO-QL proposal and its implementation. In Proceedings of the International Conference on Web Intelligence, Leipzig, Germany, 23–26 June 2017; pp. 499–508.
- Bordogna, G.; Capelli, S.; Ciriello, D.E.; Psaila, G. A cross-analysis framework for multi-source volunteered, crowdsourced, and authoritative geographic information: The case study of volunteered personal traces analysis against transport network data. Geo-Spat. Inf. Sci. 2018, 21, 257–271.
- Psaila, G.; Fosci, P. J-CO: A Platform-Independent Framework for Managing Geo-Referenced JSON Data Sets. Electronics 2021, 10, 621.
- Psaila, G.; Toccu, M. A Fuzzy Technique for On-Line Aggregation of POIs from Social Media: Definition and Comparison with Off-Line Random-Forest Classifiers. Information 2019, 10, 388.
- Fosci, P.; Psaila, G. Towards flexible retrieval, integration and analysis of json data sets through fuzzy sets: A case study. Information 2021, 12, 258.
- Fosci, P.; Psaila, G. J-CO, a Framework for Fuzzy Querying Collections of JSON Documents. In Proceedings of the International Conference on Flexible Query Answering Systems, Bratislava, Slovakia, 19–24 September 2021; Springer: Cham, Switzerland; pp. 142–153.
- Psaila, G.; Marrara, S. A First Step Towards a Fuzzy Framework for Analyzing Collections of JSON Documents. In Proceedings of the IADIS AC 2019, Cagliari, Italy, 7–9 November 2019; pp. 19–28.
- Blair, D.C. Information Retrieval, 2nd ed. C.J. Van Rijsbergen. London: Butterworths; 1979: 208 pp. Price: $32.50. J. Am. Soc. Inf. Sci. 1979, 30, 374–375.
- Bosc, P.; Pivert, O. SQLf: A relational database language for fuzzy querying. IEEE Trans. Fuzzy Syst. 1995, 3, 4895977.
- Bosc, P.; Pivert, O. SQLf query functionality on top of a regular relational database management system. In Knowledge Management in Fuzzy Databases; Springer: Berlin/Heidelberg, Germany, 2000; pp. 171–190.
- Galindo, J.; Medina, J.M.; Pons, O.; Cubero, J.C. A server for fuzzy SQL queries. In Proceedings of the International Conference on Flexible Query Answering Systems, Roskilde, Denmark, 13–15 May 1998; pp. 164–174.
- Zadrozny, S.; Kacprzyk, J. Fquery for access: Towards human consistent querying user interface. In Proceedings of the 1996 ACM Symposium on Applied Computing, Philadelphia, PA, USA, 17–19 February 1996; pp. 532–536.
- Kacprzyk, J.; Zadrożny, S. FQUERY for Access: Fuzzy querying for a Windows-based DBMS. In Fuzziness in Database Management Systems; Springer: Berlin/Heidelberg, Germany, 1995; pp. 415–433.
- Bordogna, G.; Psaila, G. Modeling soft conditions with unequal importance in fuzzy databases based on the vector p-norm. In Proceedings of the IPMU Conference, Malaga, Spain, 22–27 June 2008.
- Bordogna, G.; Psaila, G. Customizable flexible querying in classical relational databases. In Handbook of Research on Fuzzy Information Processing in Databases; IGI Global: Hershey, PA, USA, 2008; pp. 191–217.
- Bordogna, G.; Psaila, G. Soft Aggregation in Flexible Databases Querying based on the Vector p-norm. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2009, 17, 25–40.
- Kacprzyk, J.; Zadrozny, S. SQLf and FQUERY for Access. In Proceedings of the Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569), Vancouver, BC, Canada, 25–28 July 2001; Volume 4, pp. 2464–2469.
- Urrutia, A.; Tineo, L.; Gonzalez, C. FSQL and SQLf: Towards a standard in fuzzy databases. In Handbook of Research on Fuzzy Information Processing in Databases; IGI Global: Hershey, PA, USA, 2008; pp. 270–298.
- Galindo, J. Handbook of Research on Fuzzy Information Processing in Databases; IGI Global: Hershey, PA, USA, 2008.
- Han, J.; Haihong, E.; Le, G.; Du, J. Survey on NoSQL database. In Proceedings of the 2011 6th International Conference on Pervasive Computing and Applications, Port Elizabeth, South Africa, 26–28 October 2011; pp. 363–366.
- Chodorow, K. MongoDB: The Definitive Guide: Powerful and Scalable Data Storage; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2013.
- Anderson, J.C.; Lehnardt, J.; Slater, N. CouchDB: The Definitive Guide: Time to Relax; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2010.
- Garcia Bringas, P.; Pastor, I.; Psaila, G. Can BlockChain technology provide information systems with trusted database? The case of HyperLedger Fabric. In Proceedings of the International Conference on Flexible Query Answering Systems, Amantea, Italy, 2–5 July 2019; Springer: Cham, Switzerland; pp. 265–277.
- Abir, B.K.; Amel, G.T. Towards fuzzy querying of NoSQL document-oriented databases. In Proceedings of the DBKDA 2015: The Seventh International Conference on Advances in Databases, Knowledge, and Data Applications, Rome, Italy, 24–29 May 2015; p. 163.
- Almendros-Jimenez, J.M.; Becerra-Teron, A.; Moreno, G. Fuzzy queries of social networks with FSA-SPARQL. Expert Syst. Appl. 2018, 113, 128–146.
- Manola, F.; Miller, E.; McBride, B. RDF Primer. W3C Recommendation (2004). Available online: http://www.w3.org/TR/rdf-primer (accessed on 1 September 2022).
- Cheng, J.; Ma, Z.M.; Yan, L. f-SPARQL: A flexible extension of SPARQL. In Proceedings of the International Conference on Database and Expert Systems Applications, Bilbao, Spain, 30 August–3 September 2010; Springer: Berlin/Heidelberg, Germany; pp. 487–494.
- Pérez, J.; Arenas, M.; Gutierrez, C. Semantics and complexity of SPARQL. ACM Trans. Database Syst. (TODS) 2009, 34, 16.
- Kilinc, D. An Accurate Toponym-Matching Measure Based On Approximate String Matching. J. Inf. Sci. 2016, 42, 138–149.
- Santos, R.; Murrieta-Flores, P.; Martins, B. Learning to combine multiple string similarity metrics for effective toponym matching. Int. J. Digit. Earth 2018, 11, 913–938.
- Rui, S.; Patricia, M.F.; Pavel, C.; Bruno, M. Toponym matching through deep neural networks. Int. J. Geogr. Inf. 2018, 32, 324–348.
- Li, L.; Xing, X.; Xia, H.; Huang, X. Entropy-Weighted Instance Matching between Different Sourcing Points of Interest. Entropy 2016, 18, 45.
- Yu, L.; Qiu, P.; Liu, X.; Lu, F.; Wan, B. A Holistic Approach to Aligning Geospatial Data with Multidimensional Similarity Measuring. Int. J. Digit. Earth 2018, 11, 845–862.
- Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning-I. Inf. Sci. 1975, 8, 199–249.
- Psaila, G.; Fosci, P. Toward an Analyst-Oriented Polystore Framework for Processing JSON Geo-Data. In Proceedings of the International Conferences on WWW/Internet, ICWI 2018 and Applied Computing 2018, Budapest, Hungary, 21–23 October 2018; IADIS (International Association for Development of the Information Society): Budapest, Hungary, 2018; pp. 213–222.
- Fosci, P.; Psaila, G. Powering Soft Querying in J-CO-QL with JavaScript Functions. In Proceedings of the International Workshop on Soft Computing Models in Industrial and Environmental Applications, Bilbao, Spain, 22–24 September 2021; Springer: Cham, Switzerland; pp. 207–221.
- Solomon, J.; Rustamov, R.; Guibas, L.; Butscher, A. Earth mover’s distances on discrete surfaces. ACM Trans. Graph. (ToG) 2014, 33, 67.
- Jaro, M.A. UNIMATCH, a Record Linkage System: Users Manual; Bureau of the Census: Washington, DC, USA, 1980.
- Jaro, M.A. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. J. Am. Stat. Assoc. 1989, 84, 414–420.
- Winkler, W.E. String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. In Proceedings of the Section on Survey Research Methods; American Statistical Association: Boston, MA, USA, 1990; pp. 354–359.
- Winkler, W.E. The State of Record Linkage and Current Research Problems; Statistical Research Division, U.S. Bureau of the Census: Washington, DC, USA, 1999.
- Atanassov, K. Intuitionistic fuzzy sets. Fuzzy Sets Syst. 1986, 20, 187–196.
- De, S.K.; Biswas, R.; Roy, A.R. An application of intuitionistic fuzzy sets in medical diagnosis. Fuzzy Sets Syst. 2001, 117, 209–213.
- Karnik, N.N.; Mendel, J.M. Operations on type-2 fuzzy sets. Fuzzy Sets Syst. 2001, 122, 327–348.
- Mendel, J.M. Type-2 fuzzy sets and systems: An overview. IEEE Comput. Intell. Mag. 2007, 2, 20–29.
- Mendel, J.M.; John, R.B. Type-2 fuzzy sets made simple. IEEE Trans. Fuzzy Syst. 2002, 10, 117–127.
- Butler, H.; Daly, M.; Doyle, A.; Gillies, S.; Hagen, S.; Schaub, T. The GeoJSON Format; Internet Engineering Task Force (IETF): Fremont, CA, USA, 2016.
- Fosci, P.; Marrara, S.; Psaila, G. Soft Querying GeoJSON Documents within the J-CO Framework. In Proceedings of the 16th International Conference on Web Information Systems and Technologies (WEBIST 2020), On-line, 3–5 November 2020; SciTePress - Science and Technology Publications, Lda.: Setubal, Portugal, 2020; pp. 253–265.
- Burini, F.; Cortesi, N.; Gotti, K.; Psaila, G. The Urban Nexus Approach for Analyzing Mobility in the Smart City: Towards the Identification of City Users Networking. Mob. Inf. Syst. 2018, 2018, 6294872.
- Bordogna, G.; Cuzzocrea, A.; Frigerio, L.; Psaila, G.; Toccu, M. An interoperable open data framework for discovering popular tours based on geo-tagged tweets. Int. J. Intell. Inf. Database Syst. 2017, 10, 246–268.
Alpha Cut | Relevant Pairs | Same Places | TP | TN | FP | FN | Precision | Recall | Accuracy |
---|---|---|---|---|---|---|---|---|---|
0.50 | 253 | 163 | 103 | 137 | 60 | 0 | 0.632 | 1.000 | 0.800 |
0.55 | 178 | 139 | 103 | 161 | 36 | 0 | 0.741 | 1.000 | 0.880 |
0.60 | 140 | 124 | 103 | 176 | 21 | 0 | 0.831 | 1.000 | 0.930 |
0.65 | 119 | 112 | 103 | 188 | 9 | 0 | 0.920 | 1.000 | 0.970 |
0.70 | 110 | 108 | 101 | 190 | 7 | 2 | 0.935 | 0.981 | 0.970 |
0.75 | 109 | 107 | 101 | 191 | 6 | 2 | 0.944 | 0.981 | 0.973 |
0.80 | 106 | 105 | 101 | 193 | 4 | 2 | 0.962 | 0.981 | 0.980 |
0.85 | 98 | 97 | 97 | 197 | 0 | 6 | 1.000 | 0.942 | 0.980 |
0.90 | 90 | 90 | 90 | 197 | 0 | 13 | 1.000 | 0.874 | 0.957 |
0.95 | 80 | 80 | 80 | 197 | 0 | 23 | 1.000 | 0.777 | 0.923 |
0.99 | 71 | 71 | 71 | 197 | 0 | 32 | 1.000 | 0.689 | 0.893 |
Technique | Precision | Recall | F1-Score |
---|---|---|---|
Current version | 0.962 | 0.981 | 0.971 |
Best of [6] | 0.931 | 0.931 | 0.931 |
Random Forest (from [6]) | 0.931 | 0.931 | 0.931 |
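The figures in the two effectiveness tables can be cross-checked with the standard definitions of precision, recall, accuracy and F1-score. The Python sketch below applies them to the confusion counts of the row with alpha cut 0.80; the function name is hypothetical, and the formulas are the usual textbook ones, which reproduce the reported values.

```python
def effectiveness(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard effectiveness measures from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Row with alpha cut 0.80: TP = 101, TN = 193, FP = 4, FN = 2.
row = effectiveness(tp=101, tn=193, fp=4, fn=2)
print({name: round(value, 3) for name, value in row.items()})
# → {'precision': 0.962, 'recall': 0.981, 'accuracy': 0.98, 'f1': 0.971}
```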
N. | Instruction | Instruction Time (s) | Incremental Time (s) |
---|---|---|---|
1 | CREATE FUZZY OPERATOR | 0.000 | 0.000 |
2 | CREATE FUZZY OPERATOR | 0.000 | 0.000 |
3 | CREATE FUZZY OPERATOR | 0.000 | 0.000 |
4 | USE DB | 0.003 | 0.003 |
5 | JOIN OF COLLECTIONS | 216.661 | 216.664 |
6 | FILTER | 1.262 | 217.926 |
7 | SAVE AS | 0.529 | 218.455 |
8 | GET COLLECTION | 0.106 | 0.106 |
9 | GROUP | 0.035 | 0.141 |
10 | EXPAND | 1.910 | 2.051 |
11 | FILTER | 0.091 | 2.142 |
12 | SAVE AS | 0.207 | 2.349 |
Total Time (s) | 220.804 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Fosci, P.; Psaila, G. Soft Integration of Geo-Tagged Data Sets in J-CO-QL+. ISPRS Int. J. Geo-Inf. 2022, 11, 484. https://doi.org/10.3390/ijgi11090484