Teradata Ideas

TF-IDF as SQL Engine function

The ML Engine TF-IDF function can be expressed as the following SQL query. Can we create an equivalent SQL Engine function or procedure for TF-IDF?

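SELECT
  TF.docid,
  TF.term,
  TF.TF,
  IDF.IDF,
  TF.TF*IDF.IDF AS TF_IDF
FROM
  (
    -- IDF per term: total occurrences of all terms in the corpus
    -- divided by the total occurrences of this term (no logarithm)
    SELECT
      term,
      (SUM(TotalFrequency) OVER ())/TotalFrequency AS IDF
    FROM (
      -- total occurrences of each term across all documents
      SELECT
        term,
        CAST(SUM(frequency) AS FLOAT) AS TotalFrequency
      FROM
        demo.amazon_tokens
      GROUP BY 1
    ) AS Tbl
  ) AS IDF
  INNER JOIN (
    -- TF per (document, term): the term's frequency divided by
    -- the total number of term occurrences in that document
    SELECT
      docid,
      term,
      CAST(frequency AS FLOAT)/(SUM(frequency) OVER (PARTITION BY docid)) AS TF
    FROM
      demo.amazon_tokens
  ) AS TF ON IDF.term=TF.term;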
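
For reference, the query above computes the quantities below. This assumes that demo.amazon_tokens holds one row per (docid, term) pair, with f_{d,t} denoting its frequency column. Its IDF is a ratio of total corpus occurrences to a term's total occurrences, with no logarithm. That differs from the textbook inverse document frequency, log(N / n_t), where N is the number of documents and n_t is the number of documents containing t. This post does not confirm whether the ML Engine function uses the same definition as the query.

```latex
\mathrm{TF}(d,t) = \frac{f_{d,t}}{\sum_{t'} f_{d,t'}}, \qquad
F_t = \sum_{d} f_{d,t}, \qquad
\mathrm{IDF}(t) = \frac{\sum_{t'} F_{t'}}{F_t}, \qquad
\text{TF-IDF}(d,t) = \mathrm{TF}(d,t)\,\mathrm{IDF}(t)
```

As a hypothetical worked example, take two documents, d_1 = {a: 2, b: 1} and d_2 = {a: 1, c: 1}. The corpus total is 5 and F_a = 3, so TF-IDF(d_1, a) = (2/3)(5/3) = 10/9.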

  • Guest
  • Jun 8 2018