C++: Finding duplicates in a JSON file after parsing with Boost
How can I find duplicates in a JSON file after parsing it with the code below? I want to count the number of duplicates in the data, where a duplicate is an entry whose first name, last name, and email address all match.
The JSON file is rather huge, so I won't copy and paste it here, but here is a snippet of it:
```json
[
  { "firstname": "cletus",  "lastname": "defosses", "emailaddress": "ea4ad81f-4111-4d8d-8738-ecf857bba992.defosses@somedomain.org" },
  { "firstname": "sherron", "lastname": "siverd",   "emailaddress": "51c985c5-381d-4d0e-b5ee-83005f39ce17.siverd@somedomain.org" },
  { "firstname": "garry",   "lastname": "eirls",    "emailaddress": "cc43c2da-d12c-467f-9318-beb3379f6509.eirls@somedomain.org" }
]
```
This is my main.cpp file:
```cpp
#include <iostream>
#include <string>
#include "customer.h"
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <boost/foreach.hpp>

using namespace std;

int main()
{
    int numofcustomers;

    // parse the JSON file
    boost::property_tree::ptree file;
    boost::property_tree::read_json("customers.json", file);

    cout << "reading file..." << endl;
    numofcustomers = file.size();

    // iterate over each top-level entry
    BOOST_FOREACH(boost::property_tree::ptree::value_type const& rowpair, file.get_child(""))
    {
        // rowpair.first == "" and rowpair.second is the subtree holding the names and email
        // iterate over the fields of this row
        BOOST_FOREACH(boost::property_tree::ptree::value_type const& itempair, rowpair.second)
        {
            // e.g. itempair.first == "firstname" or "lastname"
            cout << itempair.first << ": ";
            // itempair.second holds the actual name or email
            cout << itempair.second.get_value<std::string>() << endl;
        }
        cout << endl;
    }
    cout << endl;
    return 0;
}
```
The customer class is a generic class:
```cpp
#include <string>

class customer
{
private:
    std::string m_firstname;
    std::string m_lastname;
    std::string m_emailaddress;
public:
    std::string getfirstname();
    void setfirstname(std::string firstname);
    std::string getlastname();
    void setlastname(std::string lastname);
    std::string getemailaddress();
    void setemailaddress(std::string emailaddress);
};
```
You'd typically insert the customer objects (or their keys) into a std::set or std::map, and define a total ordering that spots the duplicates on insertion.
Define a key function and a comparator object:
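```cpp
#include <boost/tuple/tuple_comparison.hpp> // provides operator< for boost::tuple

// Ties the three fields by reference; requires const getters (see the note further down)
boost::tuple<string const&, string const&, string const&> key_of(customer const& c)
{
    return boost::tie(c.getfirstname(), c.getlastname(), c.getemailaddress());
}

struct by_key
{
    bool operator()(customer const& a, customer const& b) const
    {
        return key_of(a) < key_of(b); // lexicographic: first name, last name, then email
    }
};
```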
Now you can insert the objects into a set<customer, by_key>:
```cpp
set<customer, by_key> unique;

// iterate over each top-level array element
BOOST_FOREACH(boost::property_tree::ptree::value_type const& rowpair, file.get_child(""))
{
    customer current;
    current.setfirstname   (rowpair.second.get("firstname",    "?"));
    current.setlastname    (rowpair.second.get("lastname",     "?"));
    current.setemailaddress(rowpair.second.get("emailaddress", "?"));

    // insert() reports false in .second when an equivalent customer is already present
    if (unique.insert(current).second)
        cout << current << "\n";
    else
        cout << "(duplicate skipped)\n";
}
```
Full demo
I've duplicated one entry in the sample JSON to show the detection at work:
```cpp
#include <iostream>
#include <string>
#include <set>
#include "customer.h"

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <boost/foreach.hpp>
#include <boost/tuple/tuple_comparison.hpp>

using namespace std;

namespace {
    boost::tuple<string const&, string const&, string const&> key_of(customer const& c)
    {
        return boost::tie(c.getfirstname(), c.getlastname(), c.getemailaddress());
    }

    struct by_key
    {
        bool operator()(customer const& a, customer const& b) const
        {
            return key_of(a) < key_of(b);
        }
    };

    inline ostream& operator<<(ostream& os, customer const& c)
    {
        return os << "{ '"
                  << c.getfirstname()    << "', '"
                  << c.getlastname()     << "', '"
                  << c.getemailaddress() << " }";
    }
}

int main()
{
    // parse the JSON file
    boost::property_tree::ptree file;
    boost::property_tree::read_json("customers.json", file);

    cout << "reading file..." << endl;

    set<customer, by_key> unique;

    // iterate over each top-level array element
    BOOST_FOREACH(boost::property_tree::ptree::value_type const& rowpair, file.get_child(""))
    {
        customer current;
        current.setfirstname   (rowpair.second.get("firstname",    "?"));
        current.setlastname    (rowpair.second.get("lastname",     "?"));
        current.setemailaddress(rowpair.second.get("emailaddress", "?"));

        if (unique.insert(current).second)
            cout << current << "\n";
        else
            cout << "(duplicate skipped)\n";
    }

    // every entry the set rejected is a duplicate
    cout << "\n" << (file.size() - unique.size()) << " duplicates found\n";
}
```
Prints:

```
reading file...
{ 'sherron', 'siverd', '51c985c5-381d-4d0e-b5ee-83005f39ce17.siverd@somedomain.org }
{ 'cletus', 'defosses', 'ea4ad81f-4111-4d8d-8738-ecf857bba992.defosses@somedomain.org }
(duplicate skipped)
{ 'garry', 'eirls', 'cc43c2da-d12c-467f-9318-beb3379f6509.eirls@somedomain.org }

1 duplicates found
```
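Note that I've adjusted the getters to be less wasteful by returning const&. They are also const member functions, which key_of needs because it only has a customer const& to call them on:

```cpp
std::string const& getfirstname()    const { return m_firstname; }
std::string const& getlastname()     const { return m_lastname; }
std::string const& getemailaddress() const { return m_emailaddress; }
```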
Bonus

Here's an equivalent program in 26 lines of C++14 code: