Commit f9a7939

book discussions
1 parent 27b8659 commit f9a7939

23 files changed

+8743 -2 lines changed

_books/20220919-kaggle-book.md

+300

_books/20220926-graph-algorithms-for-data-science.md

+222
@@ -10,6 +10,228 @@ links:
text: Book's page
start: 2022-09-26 00:00:00
title: Graph Algorithms for Data Science
archive:
- name: Luis
  text: I am really looking forward to reading this book. Looks super interesting. Does
    it come with small projects?
  replies:
  - name: Tomaz Bratanic
    text: Every chapter apart from the first 2 is a tutorial-based project of its
      own
- name: Evren Unal
  text: 'Hi Tomaz Bratanic,

    I hear about GNNs here and there, and they are becoming more popular day by day.

    Will they eventually take the place of other deep learning algorithms like CNNs and RNNs?
    Or are GNNs meant to be used together with other algorithms?'
  replies:
  - name: Tomaz Bratanic
    text: GNN is simply a CNN fitted to any type of graph... if a CNN can be used on
      an image, which is a grid graph with a predefined structure, then a GNN is a variation
      of a CNN that can be used on any graph structure
  - name: Tomaz Bratanic
    text: Other algorithms can be used to define the features of nodes to be used in GNNs
  - name: Evren Unal
    text: 'As I understand it, GNNs do not replace other algorithms.

      I think that to better understand GNNs, one should see some examples.'
- name: Ali Shakiba
  text: 'Hi Tomaz Bratanic,

    Thanks for introducing the book.

    Many graphs, such as Web link graphs or Twitter follower/following
    graphs, are very big, although they are sparse. Are big graphs also covered in
    your book? As far as I know, many problems in graph theory are NP-hard, and
    many polynomial-time algorithms are not useful for big graphs. Even
    something like the Floyd-Warshall algorithm, with its O(n^3) running time, is frightening
    on a graph corresponding to a map of the world with all of its cities.'
  replies:
  - name: Tomaz Bratanic
    text: I don't deal with very large graphs in my book. For large graphs, even O(n^2) doesn't
      scale well. Some algorithms have approximate variants. There are also other
      algorithms that scale pretty well, like PageRank or Label Propagation...
      In my book, I don't deal with pathfinding at all
- name: Ashish Lalchandani
  text: Hi Tomaz Bratanic, thanks for being here! My question is, what are the applications
    of graph algorithms in ML? I mean, what kinds of problems in ML can be solved using
    graph algorithms? Also, are the graph algorithms used in ML the same as the conventional
    graph algorithms we use in competitive programming/LeetCode?
  replies:
  - name: Tomaz Bratanic
    text: More classical graph algorithms have been used to find the most important
      nodes or to find groups of nodes... Lately, there has been a shift toward extracting
      features from graphs and using them as inputs to ML models. I have no idea what
      kinds of graph algorithms are used in the competitive landscape. Probably PageRank
      is the most used algorithm out there
  - name: Ashish Lalchandani
    text: What kinds of features are you referring to? Also, by graph algorithms in
      competitive programming, I was referring to BFS, DFS, backtracking, minimum
      spanning trees, etc.
  - name: Tomaz Bratanic
    text: BFS and DFS are the basis of some algorithms... I am talking more about unsupervised
      graph algorithms like PageRank, Label Propagation, node2vec, etc.
  - name: Tomaz Bratanic
    text: There are a couple of features that you can extract based on the position
      of the node in the network... how important it is, how well connected it is, which
      group it belongs to, who its neighbors are, etc...
  - name: Ashish Lalchandani
    text: Oh I see, that makes sense now, thanks for explaining! Much appreciated!
- name: Bengsoon
  text: "Hi Tomaz Bratanic, thanks for writing this book. I am very new to this sub-space\
    \ of AI/ML. \nFrom my very shallow knowledge, graph theory as well as graph databases\
    \ have been around for a while, but I noticed that graph-based ML has only\
    \ risen to fame in the last few years. Is my observation right? If so, why is\
    \ that?\nAlso, what are the practical strengths of graph-based ML, as well as its\
    \ limitations, especially in production/deployment settings (compared to conventional/mainstream\
    \ ML algorithms like NNs)?"
  replies:
  - name: Tomaz Bratanic
    text: 'Graph ML has risen to fame only in recent years because most
      graph ML algorithms have been developed only in the last couple of years, most
      notably embedding and GNN models.

      If relationships are predictive, then graph models can take those relationships
      and use them as features in predictions, whereas it is hard to encode those relationships
      in traditional models'
  - name: Tomaz Bratanic
    text: 'Take a look at Pinterest, for example: [https://medium.com/pinterest-engineering/pinsage-a-new-graph-convolutional-neural-network-for-web-scale-recommender-systems-88795a107f48](https://medium.com/pinterest-engineering/pinsage-a-new-graph-convolutional-neural-network-for-web-scale-recommender-systems-88795a107f48)'
- name: Tomaz Bratanic
  text: 'btw... I have tons of free articles on Medium if you want to take a look
    at free content before deciding on the book: [https://bratanic-tomaz.medium.com/](https://bratanic-tomaz.medium.com/)'
  replies: []
- name: Tomaz Bratanic
  text: Ashish Lalchandani, GerryK, Luis, you have been selected as the winners of
    free copies of the book. Please DM me and I will give you instructions for obtaining
    your free copy
  replies:
  - name: Alexey Grigorev
    text: Thanks for joining us this week!
  - name: Tomaz Bratanic
    text: My pleasure
  - name: Ashish Lalchandani
    text: "Thank you Tomaz Bratanic!! Thanks for answering our questions \U0001F600\
      \ Thanks Alexey Grigorev Francis Terence Amit for hosting book of the week,\
      \ much appreciated!"
- name: shaolang
  text: 'Hi Tomaz Bratanic,

    Congrats on launching the new MEAP!

    Are graph algorithms:

    1. affected by the direction (bi- or uni-) of the edges? If so, what are the gotchas
    we should be aware of, especially when dealing with unidirectional edges?

    2. more effective than the "traditional" unsupervised learning algorithms for
    clustering, such as k-means, other than the fact that data in graphs doesn''t necessarily
    need to conform to the same structure?

    Recently, I''ve also come across another graph database, TigerGraph, that
    touts itself as more capable because the number of hops it can make is much greater
    than Neo4j''s; e.g., it can detect fraud from nodes/edges that are 6-8 hops away
    from the destination. If the number of hops is really that important, are there algorithms
    that can make up for such scenarios?'
  replies:
  - name: Tomaz Bratanic
    text: 1. Algorithms are definitely affected by whether edges are undirected or directed.
      You can think of an undirected edge as two directed edges pointing
      in opposite directions. The main difference between undirected and directed
      edges is the semantics... for example, if I am friends with you, does that directly
      imply that you are also friends with me? If that implication can be made,
      then you are most likely dealing with an undirected edge. In practice, you will
      see a lot of undirected edges.
  - name: Tomaz Bratanic
    text: 2. Clustering is a big category of graph algorithms, so it's hard to say
      whether they are better. It has more to do with your input data. If you are dealing
      with vectors, you will most likely use something like k-means, but if you are
      dealing with connections between data points, then you might use something
      like Label Propagation.
  - name: Tomaz Bratanic
    text: 3. Detecting fraud from nodes/edges that are 6-8 hops away from the destination...
      that's just marketing talk. Any database can do 6-8 hops or joins, even SQL databases.
      The question is how fast and at what scale
  - name: shaolang
    text: 'Thanks for taking my questions, Tomaz Bratanic!

      1. As direction matters in edges, does that mean results from the algorithms
      may differ depending on where the starting point is? Using your friend example
      and assuming the edge is unidirectional, the algorithm would be able to detect that we
      are friends when the query starts from you (node), but it can''t detect it if the
      query starts from me (node)?'
  - name: shaolang
    text: '(skipping 2)

      3. Are you saying that Neo4j can do 6-8 hops too, at reasonable speed and scale?
      While I always take it with a pinch of salt, their marketing implies that Neo4j can''t
      even complete the query. To make Neo4j complete this many hops, would we need
      to write convoluted Cypher to achieve it?'
- name: GerryK
  text: 'Hi Tomaz Bratanic, thanks for being here.

    - Do you refer to any visualisation tools for better understanding
    graph concepts?

    - Do you see more and more projects/companies using graphs?'
  replies:
  - name: Tomaz Bratanic
    text: '1. I don''t talk about viz tools in the book, but my favourite tool to
      analyse and visualize small graphs is Gephi.

      2. I think that more and more companies are using graphs, some because they
      see the value, some because it has become more and more of a "hot" technology'
- name: Prashant Choudhary
  text: 'Hi Tomaz Bratanic,

    ML models are probabilistic in nature. Using ML models to extract information
    from unstructured text will not be 100% accurate; it is mostly 80-90% accurate. In
    contrast, data in graphs should be factual and correct. Knowledge graphs become
    the data source for various apps, like Google or chatbots, where you need the information
    to be factually correct. What are your thoughts?'
  replies:
  - name: Tomaz Bratanic
    text: It depends on your use case. The messier your knowledge graph, the
      messier the output. Extracting information from text is hard. First of all,
      not all of the extracted information conforms to the graph structure, and secondly,
      even 80-90% accuracy is sometimes hard to achieve. One big problem
      with constructing a graph from text is entity disambiguation, for example
  - name: Taher Hassonjee
    text: A little late to the conversation, but this is exactly what my company does.
      We turn any unstructured text into a custom CSV output. If you're interested,
      I'm happy to give you access and get your feedback
  - name: Tomaz Bratanic
    text: Do you extract triples?
  - name: Taher Hassonjee
    text: Not yet, but it's on the roadmap
- name: insop
  text: "Hi Tomaz Bratanic\nThank you for introducing [Graph Algorithms for Data Science](https://datatalks.club/books/20220926-graph-algorithms-for-data-science.html).\
    \ I would be very interested in reading the recommendation and fraud detection chapters\
    \ of your book.\n- One general question: what is the difference between what you\
    \ cover in your book and GNNs (graph neural networks) and graph CNNs?\n-\
    \ And what are the applications to which each of them can be applied?\nThank you very\
    \ much,"
  replies:
  - name: Tomaz Bratanic
    text: 'GNNs are the state-of-the-art methods of graph ML at the moment. My book
      builds up all the knowledge needed to get to GNNs, but doesn''t delve too much into
      them. If you are interested in recommendations and fraud detection, I would recommend
      the following book: [https://www.manning.com/books/graph-powered-machine-learning](https://www.manning.com/books/graph-powered-machine-learning)'
  - name: Bhupendrasinh Thakre
    text: Tomaz Bratanic, do you also go through hands-on learning in your book, or
      theory only?
  - name: Tomaz Bratanic
    text: It's mostly hands-on learning
- name: Julius
  text: Which book was mentioned in the week 1 video as the best book
    to read on DE? I didn't catch the spelling
  replies:
  - name: Alexey Grigorev
    text: 'You need to give more context. Which week 1? Was it a course? Which one?

      Also, this is not the right channel for asking these questions. Here we invite
      book authors and ask them anything'
---

Graphs are the natural way to understand connected data. This book explores the most important algorithms and techniques for graphs in data science, with practical examples and concrete advice on implementation and deployment.
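Two answers in the archived discussion rely on specific ideas. One names PageRank as a way to find the most important nodes. Another treats an undirected edge as two directed edges that point in opposite directions. The following is a minimal plain-Python sketch of both ideas, not code from the book or from Neo4j. It assumes a small in-memory edge list and the commonly used damping factor of 0.85. It also assumes that the score held by nodes without outgoing edges is redistributed uniformly. The helper names `pagerank` and `to_directed` and the toy nodes `me`/`you` are hypothetical.

```python
from typing import Dict, List, Tuple


def to_directed(undirected_edges: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Model each undirected edge as two directed edges pointing in opposite directions."""
    directed: List[Tuple[str, str]] = []
    for u, v in undirected_edges:
        directed.append((u, v))
        directed.append((v, u))
    return directed


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             max_iterations: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank over a directed edge list; the scores sum to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        # Nodes without outgoing edges ("dangling" nodes) spread their score evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                # Each node splits its damped score equally among its out-links.
                new_rank[dst] += damping * rank[src] / len(targets)
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:  # scores have stopped changing
            break
    return rank


if __name__ == "__main__":
    # Hypothetical "friend" edge that only points one way.
    one_way = [("me", "you")]
    print(pagerank(one_way))               # "you" scores higher than "me"
    print(pagerank(to_directed(one_way)))  # both directions present: equal scores
```

With the one-way edge kept, `you` ends up with a higher score than `me`, because only `you` receives a link. After `to_directed`, the two scores are equal. This is one concrete way to see why the direction of edges can change an algorithm's output.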

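The clustering answer in the archived discussion contrasts two approaches. k-means works on vectors, while Label Propagation works on connections between data points. Below is a minimal sketch of Label Propagation, assuming a small undirected edge list and a hypothetical `label_propagation` helper. It is a simplification rather than a library implementation. The usual algorithm visits nodes in random order and breaks ties randomly, while this version uses a fixed order and a deterministic tie-break so that runs are reproducible.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def label_propagation(edges: List[Tuple[str, str]],
                      max_iterations: int = 20) -> Dict[str, str]:
    """Deterministic, asynchronous label propagation on an undirected graph."""
    # Undirected adjacency: each edge is visible from both endpoints.
    neighbors: Dict[str, Set[str]] = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Every node starts in its own community, labelled with its own name.
    labels = {node: node for node in neighbors}
    order = sorted(neighbors)  # fixed visiting order (the usual algorithm shuffles it)

    for _ in range(max_iterations):
        changed = False
        for node in order:
            # Count the labels currently held by this node's neighbors.
            counts = Counter(labels[nbr] for nbr in neighbors[node])
            best = max(counts.values())
            candidates = sorted(lbl for lbl, c in counts.items() if c == best)
            # Keep the current label if it is already among the most frequent;
            # otherwise take the smallest one so that the result is reproducible.
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # no node switched labels, so the partition is stable
            break
    return labels


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles joined by a single bridge edge c-z.
    toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
                 ("x", "y"), ("y", "z"), ("x", "z"),
                 ("c", "z")]
    print(label_propagation(toy_edges))
```

On this toy graph the run stabilises on its second pass. It prints `{'a': 'b', 'b': 'b', 'c': 'b', 'x': 'y', 'y': 'y', 'z': 'y'}`, which is one label per triangle. With random ordering, different runs can produce different, and sometimes merged, communities.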