proNlp1/clust.py

50 lines
1.2 KiB
Python
Raw Normal View History

from infBack import get_vect as gv
2017-11-07 15:04:18 +00:00
from sklearn.feature_extraction.text import TfidfVectorizer
2017-11-28 01:14:38 +00:00
from stopWords import stopWrdList
import numpy as np
2017-11-07 15:04:18 +00:00
2017-11-28 01:14:38 +00:00
def clustering():
    """Select election-related news items from the corpus and tokenize them.

    Vectorizes every news text against a fixed vocabulary of electoral cue
    words (Mexican parties / electoral bodies); any document with a nonzero
    tf-idf row mentions at least one cue word and is kept.

    Returns:
        list[list[str]]: each relevant news text split on single spaces.
    """
    # Cue words that mark a news item as electorally relevant.
    voc = ["ine", "pri", "pan", "prd", "pt", "pvem", "verde", "movimiento", "ciudadano", "panal", "alianza", "morena", "partido", "encuentro", "social", "electoral"]

    stop_words = stopWrdList()

    # gv() returns rows whose third column holds the news text
    # (presumably [id, date, text, ...] — confirm against infBack).
    dataVect = np.array(gv())
    corpus = dataVect[:, 2]

    vectorizer = TfidfVectorizer(strip_accents='ascii', analyzer='word', stop_words=stop_words, vocabulary=voc)
    X = vectorizer.fit_transform(corpus)

    J = X.toarray()

    # Keep only documents whose tf-idf row is not all zero, i.e. that
    # mention at least one cue word. tf-idf weights are nonnegative, so
    # row.any() is equivalent to the original `sum(row) != 0` test.
    index = [i for i, row in enumerate(J) if row.any()]

    electCorp = [corpus[i] for i in index]

    # Tokenize each relevant article on single spaces, as downstream
    # emotional classification expects.
    return [doc.split(' ') for doc in electCorp]