Doesn't work for large files #1

Open

saschalalala opened this issue Nov 1, 2015 · 4 comments

saschalalala commented Nov 1, 2015

Hey, I tried processing a fairly large file (476 MB) and got an allocation size overflow.
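
For context: "allocation size overflow" is the error Firefox raises when a single allocation is too large. A minimal sketch of the pattern that can trigger it, assuming the reader loads the whole export into one string before parsing (the fileInput element is hypothetical, not the actual reader code):

const fileInput = document.querySelector('input[type="file"]'); // hypothetical
const reader = new FileReader();
reader.onload = () => {
  // Reading the whole file into one string and parsing it in a single
  // call means the peak allocation scales with the file size; for a
  // ~476 MB export this can exceed Firefox's limit and throw
  // "allocation size overflow".
  const conversations = JSON.parse(reader.result).conversation_state;
  console.log(conversations.length, "conversations");
};
reader.readAsText(fileInput.files[0]);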

@evitolins

Having the same issue in FF 52.0.1 with a 314.8 MB .json file

faddat commented Apr 29, 2017

500MB file dying here as well.

@danielhickman

Maybe someone could submit a PR with an implementation of something like this? I'm not sure if it's the same problem, but I really only wanted a single conversation, so I wrote a Ruby script to split my JSON file into many smaller ones by conversation ID. It messily produced ~50 files, and you'll need to search the original file to figure out which ID you need. You can also add them to Hangouts Reader one by one, since it doesn't discard previous parses.

require 'json'

# Read the Hangouts Takeout export and return the conversation list.
def get_json(file)
  abort "Cannot read #{file}" unless File.readable?(file)
  JSON.parse(File.read(file))["conversation_state"]
end

data = get_json("Hangouts.json")
puts "Parsed file"

# Group the entries by their conversation ID.
restructured = {}
data.each do |entry|
  id = entry["conversation_id"]["id"]
  puts "Found conversation in #{id}"
  restructured[id] ||= { "conversation_state" => [] }
  restructured[id]["conversation_state"] << entry
end
puts "Finished sorting"

# Write each conversation to its own <id>.json file.
restructured.each do |key, value|
  puts "Generating #{key}"
  File.write("#{key}.json", JSON.generate(value))
end
puts "Done"

I imported the 10 MB file just fine, but results may vary if you have a really long chat you'd like to import.

crutchcorn commented Nov 26, 2018

I couldn't quite figure out the Ruby program, so I wrote my own with Node and a ton of unneeded dependencies that I used out of laziness:

const jsonn = require("jsonstream"), // streaming JSON parser
  fs = require("fs"),
  util = require("util"),
  fs_writeFile = util.promisify(fs.writeFile),
  rxjs = require("rxjs"),
  { debounceTime } = require("rxjs/operators");

// Conversations grouped by ID, accumulated as the stream is read.
const dataa = {};

// Debounce the progress logging so the console isn't flooded per record.
const sub = new rxjs.Subject();
sub.pipe(debounceTime(1000)).subscribe(data => {
  console.log(`${data.new ? "new" : "old"}Id, ${data.id}`);
});

// Stream the export instead of JSON.parse-ing one giant string, so the
// parser only ever holds one conversation record at a time.
fs.createReadStream("./Hangouts.json")
  .pipe(jsonn.parse("conversations.*"))
  .on("data", data => {
    const id =
      data["conversation"] &&
      data["conversation"]["conversation_id"] &&
      data["conversation"]["conversation_id"]["id"];

    if (id) {
      if (dataa[id]) {
        sub.next({ new: false, id });
        dataa[id].conversations.push(data);
      } else {
        sub.next({ new: true, id });
        dataa[id] = { conversations: [data] };
      }
    }
  })
  .on("end", () => {
    // Chain the writes through the reduce accumulator so the <id>.json
    // files are written one at a time rather than all at once.
    Object.keys(dataa).reduce(async (prev, key) => {
      await prev;
      console.log("writing", key);
      await fs_writeFile(`${key}.json`, JSON.stringify(dataa[key]));
      delete dataa[key];
    }, Promise.resolve());
  });

Tested with 700 MB files.
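
One thing to keep in mind about the script above: the streaming parse avoids ever materializing the export as a single string, but dataa still accumulates every conversation in memory before the end handler writes them out, so peak memory grows with the size of the grouped data rather than staying constant.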
