Suggest the tags based on the content that was there in the question posted on Stackoverflow.
Stack Overflow is the largest, most trusted online community for developers to learn, share their programming knowledge, and build their careers.
Stack Overflow is something which every programmer use one way or another. Each month, over 50 million developers come to Stack Overflow to learn, share their knowledge, and build their careers. It features questions and answers on a wide range of topics in computer programming. The website serves as a platform for users to ask and answer questions, and, through membership and active participation, to vote questions and answers up or down and edit questions and answers in a fashion similar to a wiki or Digg. As of April 2014 Stack Overflow has over 4,000,000 registered users, and it exceeded 10,000,000 questions in late August 2015. Based on the type of tags assigned to questions, the top eight most discussed topics on the site are: Java, JavaScript, C#, PHP, Android, jQuery, Python and HTML
Source: https://www.kaggle.com/c/facebook-recruiting-iii-keyword-extraction/
It is a multi-label classification problem Multi-label Classification: Multilabel classification assigns to each sample a set of target labels. This can be thought as predicting properties of a data-point that are not mutually exclusive, such as topics that are relevant for a document. A question on Stackoverflow might be about any of C, Pointers, FileIO and/or memory-management at the same time or none of these.
- Predict as many tags as possible with high precision and recall.
- Incorrect tags could impact customer experience on StackOverflow.
- No strict latency constraints.
All of the data is in 2 files: Train and Test.
Train.csv contains 4 columns: Id,Title,Body,Tags. Size of Train.csv - 6.75GB Number of rows in Train.csv = 6034195 Test.csv contains the same columns but without the Tags, which you are to predict. Size of Test.csv - 2GB
The questions are randomized and contains a mix of verbose text sites as well as sites related to math and programming. The number of questions from each site may vary, and no filtering has been performed on the questions (such as closed questions).
Dataset contains 6,034,195 rows. The columns in the table are:
- Id - Unique identifier for each question
- Title - The question's title
- Body - The body of the question
- Tags - The tags associated with the question in a space-seperated format (all lowercase, should not contain tabs '\t' or ampersands '&')
Title: Implementing Boundary Value Analysis of Software Testing in a C++ program?
Body :
#include<
iostream>\n
#include<
stdlib.h>\n\n
using namespace std;\n\n
int main()\n
{\n
int n,a[n],x,c,u[n],m[n],e[n][4];\n
cout<<"Enter the number of variables";\n cin>>n;\n\n
cout<<"Enter the Lower, and Upper Limits of the variables";\n
for(int y=1; y<n+1; y++)\n
{\n
cin>>m[y];\n
cin>>u[y];\n
}\n
for(x=1; x<n+1; x++)\n
{\n
a[x] = (m[x] + u[x])/2;\n
}\n
c=(n*4)-4;\n
for(int a1=1; a1<n+1; a1++)\n
{\n\n
e[a1][0] = m[a1];\n
e[a1][1] = m[a1]+1;\n
e[a1][2] = u[a1]-1;\n
e[a1][3] = u[a1];\n
}\n
for(int i=1; i<n+1; i++)\n
{\n
for(int l=1; l<=i; l++)\n
{\n
if(l!=1)\n
{\n
cout<<a[l]<<"\\t";\n
}\n
}\n
for(int j=0; j<4; j++)\n
{\n
cout<<e[i][j];\n
for(int k=0; k<n-(i+1); k++)\n
{\n
cout<<a[k]<<"\\t";\n
}\n
cout<<"\\n";\n
}\n
} \n\n
system("PAUSE");\n
return 0; \n
}\n
<p>The answer should come in the form of a table like</p>\n\n
<pre><code>
1 50 50\n
2 50 50\n
99 50 50\n
100 50 50\n
50 1 50\n
50 2 50\n
50 99 50\n
50 100 50\n
50 50 1\n
50 50 2\n
50 50 99\n
50 50 100\n
</code></pre>\n\n
<p>if the no of inputs is 3 and their ranges are\n
1,100\n
1,100\n
1,100\n
(could be varied too)</p>\n\n
<p>The output is not coming,can anyone correct the code or tell me what\'s wrong?</p>\n'
Tags : 'c++ c'