C++ - Finding duplicates in a JSON file after parsing with Boost


How can I find duplicates in a JSON file after parsing it with the code below? I want to count the number of duplicates in the data, where a duplicate is a record whose first name, last name, and email address all match another record.

The JSON file is rather huge, so I won't copy and paste it here. Here is a snippet of it:

    [
      {
        "firstname": "cletus",
        "lastname": "defosses",
        "emailaddress": "ea4ad81f-4111-4d8d-8738-ecf857bba992.defosses@somedomain.org"
      },
      {
        "firstname": "sherron",
        "lastname": "siverd",
        "emailaddress": "51c985c5-381d-4d0e-b5ee-83005f39ce17.siverd@somedomain.org"
      },
      {
        "firstname": "garry",
        "lastname": "eirls",
        "emailaddress": "cc43c2da-d12c-467f-9318-beb3379f6509.eirls@somedomain.org"
      }
    ]

This is the main.cpp file:

    #include <iostream>
    #include <string>

    #include "customer.h"
    #include <boost/property_tree/ptree.hpp>
    #include <boost/property_tree/json_parser.hpp>
    #include <boost/foreach.hpp>

    using namespace std;

    int main()
    {
        int numofcustomers;

        // parse the JSON file
        boost::property_tree::ptree file;
        boost::property_tree::read_json("customers.json", file);

        cout << "reading file..." << endl;

        numofcustomers = file.size();

        // iterate over each top-level entry
        BOOST_FOREACH(boost::property_tree::ptree::value_type const& rowpair, file.get_child(""))
        {
            // rowpair.first == "" and rowpair.second is the subtree with names and emails

            // iterate over the fields of one entry
            BOOST_FOREACH(boost::property_tree::ptree::value_type const& itempair, rowpair.second)
            {
                // e.g. itempair.first == "firstname" or "lastname"
                cout << itempair.first << ": ";
                // itempair.second holds the actual name or email
                cout << itempair.second.get_value<std::string>() << endl;
            }
            cout << endl;
        }
        cout << endl;

        return 0;
    }

The customer class is a generic class.

    class customer
    {
    private:
        std::string m_firstname;
        std::string m_lastname;
        std::string m_emailaddress;

    public:
        std::string getfirstname();
        void setfirstname(std::string firstname);

        std::string getlastname();
        void setlastname(std::string lastname);

        std::string getemailaddress();
        void setemailaddress(std::string emailaddress);
    };

You'd typically insert the customer objects (or their keys) into a std::set or std::map and define a total ordering that spots duplicates on insertion.
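The mechanism this relies on is that std::set::insert returns a pair whose bool member is false when an equivalent element is already present. A minimal sketch with plain strings, assuming nothing about the customer class (count_rejected is a hypothetical helper name, not part of the code in the question):

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: counts how many items the set rejected
// because an equivalent element had already been inserted.
std::size_t count_rejected(const std::vector<std::string>& items)
{
    std::set<std::string> seen;
    std::size_t duplicates = 0;
    for (const std::string& item : items) {
        // insert() returns pair<iterator, bool>; the bool is false
        // when the set already holds an equivalent element.
        if (!seen.insert(item).second)
            ++duplicates;
    }
    return duplicates;
}
```

The same pattern applies to any element type once the set has a comparator that defines when two elements count as equal.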

Define a key function and a comparator object:

    boost::tuple<string const&, string const&, string const&> key_of(customer const& c) {
        return boost::tie(c.getfirstname(), c.getlastname(), c.getemailaddress());
    }

    struct by_key {
        bool operator()(customer const& a, customer const& b) const {
            return key_of(a) < key_of(b);
        }
    };
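If you'd rather not depend on Boost.Tuple, the standard library's std::tie (C++11) gives the same lexicographic comparison. A sketch under the assumption of a simplified record type with public fields (Person and ByKey are hypothetical stand-ins, not the customer class above):

```cpp
#include <string>
#include <tuple>

// Hypothetical stand-in for the customer class, with public fields for brevity.
struct Person {
    std::string first;
    std::string last;
    std::string email;
};

// Orders records by (first, last, email), compared lexicographically.
// Two records are equivalent for a std::set when neither is less than the other.
struct ByKey {
    bool operator()(const Person& a, const Person& b) const {
        return std::tie(a.first, a.last, a.email)
             < std::tie(b.first, b.last, b.email);
    }
};
```

std::tie builds a tuple of references, so no strings are copied during the comparison.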

Now you can insert the objects into a set<customer, by_key>:

    set<customer, by_key> unique;

    // iterate over each element of the top-level array
    BOOST_FOREACH(boost::property_tree::ptree::value_type const& rowpair, file.get_child(""))
    {
        customer current;
        current.setfirstname    ( rowpair.second.get ( "firstname", "?"    )  ) ;
        current.setlastname     ( rowpair.second.get ( "lastname", "?"     )  ) ;
        current.setemailaddress ( rowpair.second.get ( "emailaddress", "?" )  ) ;

        // insert().second is false when an equivalent customer is already in the set
        if (unique.insert(current).second)
            cout << current << "\n";
        else
            cout << "(duplicate skipped)\n";
    }

Full demo

I've duplicated one entry in the sample JSON, and you can see the result live on Coliru:

    #include <iostream>
    #include <string>
    #include <set>

    #include "customer.h"
    #include <boost/property_tree/ptree.hpp>
    #include <boost/property_tree/json_parser.hpp>
    #include <boost/foreach.hpp>
    #include <boost/tuple/tuple_comparison.hpp>

    using namespace std;

    namespace {

        // key used to decide whether two customers are the same record
        boost::tuple<string const&, string const&, string const&> key_of(customer const& c) {
            return boost::tie(c.getfirstname(), c.getlastname(), c.getemailaddress());
        }

        struct by_key {
            bool operator()(customer const& a, customer const& b) const {
                return key_of(a) < key_of(b);
            }
        };

        inline ostream& operator<<(ostream& os, customer const& c) {
            return os << "{ '"
                 << c.getfirstname()    << "', '"
                 << c.getlastname()     << "', '"
                 << c.getemailaddress() << " }";
        }
    }

    int main()
    {
        // parse the JSON file
        boost::property_tree::ptree file;
        boost::property_tree::read_json("customers.json", file);

        cout << "reading file..." << endl;

        set<customer, by_key> unique;

        // iterate over each element of the top-level array
        BOOST_FOREACH(boost::property_tree::ptree::value_type const& rowpair, file.get_child(""))
        {
            customer current;
            current.setfirstname    ( rowpair.second.get ( "firstname", "?"    )  ) ;
            current.setlastname     ( rowpair.second.get ( "lastname", "?"     )  ) ;
            current.setemailaddress ( rowpair.second.get ( "emailaddress", "?" )  ) ;

            if (unique.insert(current).second)
                cout << current << "\n";
            else
                cout << "(duplicate skipped)\n";
        }

        // every top-level entry that did not make it into the set is a duplicate
        cout << "\n" << (file.size() - unique.size()) << " duplicates found\n";
    }

Prints:

    reading file...
    { 'sherron', 'siverd', '51c985c5-381d-4d0e-b5ee-83005f39ce17.siverd@somedomain.org }
    { 'cletus', 'defosses', 'ea4ad81f-4111-4d8d-8738-ecf857bba992.defosses@somedomain.org }
    (duplicate skipped)
    { 'garry', 'eirls', 'cc43c2da-d12c-467f-9318-beb3379f6509.eirls@somedomain.org }

    1 duplicates found

Note that I've adjusted the getters to be const and less wasteful by returning const&:

    std::string const& getfirstname() const        { return m_firstname;            }
    std::string const& getlastname() const         { return m_lastname;             }
    std::string const& getemailaddress() const     { return m_emailaddress;         }
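The demo reports only the total number of duplicates. If you also want to know which records repeat and how often, one option is a std::map from key to occurrence count. This is a sketch, not part of the demo: Record, Key, count_by_key and total_duplicates are hypothetical names, and it uses std::tuple rather than Boost.Tuple so that it needs only the standard library.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the customer class.
struct Record {
    std::string first;
    std::string last;
    std::string email;
};

// The key owns copies of the strings so it can live inside the map.
using Key = std::tuple<std::string, std::string, std::string>;

// Maps each distinct (first, last, email) key to the number of times it occurs.
std::map<Key, std::size_t> count_by_key(const std::vector<Record>& records)
{
    std::map<Key, std::size_t> counts;
    for (const Record& r : records)
        ++counts[Key(r.first, r.last, r.email)];  // operator[] starts new keys at 0
    return counts;
}

// Every occurrence beyond the first one of a key counts as a duplicate.
std::size_t total_duplicates(const std::map<Key, std::size_t>& counts)
{
    std::size_t n = 0;
    for (const auto& entry : counts)
        n += entry.second - 1;
    return n;
}
```

Entries whose count is greater than 1 are the duplicated records, and total_duplicates should agree with the file.size() - unique.size() figure from the set-based version.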

Bonus

Here's an equivalent program in 26 lines of C++14 code:

live on coliru

