
How to speed up for large dataset #214

Open
linWujl opened this issue Apr 20, 2023 · 3 comments

linWujl commented Apr 20, 2023

Hello, my corpus is 700 GB. Is there any way to speed this up?

AngledLuffa (Contributor) commented Apr 20, 2023 via email

linWujl (Author) commented Apr 20, 2023

The cooccur step has already taken about 7,500 minutes and is still at the merge stage.

Would it be possible to use Spark to construct the cooccurrence statistics and then train the model with TensorFlow?
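A rough sketch of what the Spark half of that idea could look like, as a PySpark job that produces GloVe-style distance-weighted counts (a pair at distance d contributes 1/d, matching what cooccur computes). The corpus path, output path, and window size are placeholders, and vocabulary filtering and min-count pruning are left out:

```python
# Sketch: distance-weighted co-occurrence counts with PySpark.
# Assumes a whitespace-tokenized corpus, one document per line.
# Paths and WINDOW are placeholders; vocab filtering is omitted.
from pyspark.sql import SparkSession

WINDOW = 15  # GloVe demo default

def pair_weights(line):
    """Emit ((center, context), 1/distance) for every pair in a line."""
    tokens = line.split()
    for i, center in enumerate(tokens):
        for j in range(max(0, i - WINDOW), i):
            d = float(i - j)
            # symmetric window: count the pair in both directions
            yield ((center, tokens[j]), 1.0 / d)
            yield ((tokens[j], center), 1.0 / d)

spark = SparkSession.builder.appName("glove-cooccur").getOrCreate()
counts = (
    spark.sparkContext.textFile("hdfs:///path/to/corpus")   # placeholder
    .flatMap(pair_weights)
    .reduceByKey(lambda a, b: a + b)
)
counts.map(lambda kv: "%s\t%s\t%f" % (kv[0][0], kv[0][1], kv[1])) \
      .saveAsTextFile("hdfs:///path/to/cooccurrences")      # placeholder
spark.stop()
```

Two caveats: GloVe's shuffle and glove binaries read a binary record format (two int word ids plus a double), so this text output would still need a conversion step; and on the C side, cooccur and shuffle both accept a -memory flag (in GB), so if that was left at a small default, cooccur will spill many overflow files, and merging them is exactly where the pipeline crawls.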

AngledLuffa (Contributor)

We did try converting it to torch at one point, but it wound up being significantly slower than the C version. We may try again sometime. You are welcome to try...

Do you have enough memory? It might be worth checking top.
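For the TensorFlow half of the question, here is a minimal sketch of GloVe's weighted least-squares objective, assuming the (i, j, X_ij) co-occurrence triples are already loaded as integer/float tensors. The vocabulary size, vector size, and hyperparameters below are the usual published defaults rather than anything tuned, and the input pipeline (which is where a 700 GB corpus would actually hurt) is elided:

```python
# Sketch: GloVe's weighted least-squares loss in TensorFlow.
# Illustrative only -- not the project's C implementation or its
# torch port. Batching and the data pipeline are omitted.
import tensorflow as tf

VOCAB, DIM = 400_000, 300        # placeholder sizes
X_MAX, ALPHA = 100.0, 0.75       # GloVe paper defaults

w     = tf.Variable(tf.random.uniform([VOCAB, DIM], -0.5 / DIM, 0.5 / DIM))
w_ctx = tf.Variable(tf.random.uniform([VOCAB, DIM], -0.5 / DIM, 0.5 / DIM))
b     = tf.Variable(tf.zeros([VOCAB]))
b_ctx = tf.Variable(tf.zeros([VOCAB]))
opt = tf.keras.optimizers.Adagrad(learning_rate=0.05)  # AdaGrad, as in the paper

@tf.function
def train_step(i, j, x_ij):
    """One step on a batch of (center id, context id, count) triples."""
    with tf.GradientTape() as tape:
        dot = tf.reduce_sum(tf.gather(w, i) * tf.gather(w_ctx, j), axis=1)
        pred = dot + tf.gather(b, i) + tf.gather(b_ctx, j)
        # f(x) = min(1, (x / x_max)^alpha) down-weights rare pairs
        f = tf.minimum(1.0, tf.pow(x_ij / X_MAX, ALPHA))
        loss = tf.reduce_mean(f * tf.square(pred - tf.math.log(x_ij)))
    variables = [w, w_ctx, b, b_ctx]
    opt.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```

An objective this gather/scatter-heavy tends to be memory-bandwidth-bound rather than compute-bound, which may be part of why the torch port mentioned above came out slower than the cache-friendly, multithreaded C code.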
