TIL – Sentiment Lexicons

“Lexicon” will refer to the component of a NLP system that contains information (semantic, grammatical) about individual words or word strings.

The role of lexicons in natural language processing – ACM Digital Library

Sentiment lexicons are mappings from words to scores capturing the degree of the sentiment expressed by a given word.

On the Automatic Learning of Sentiment Lexicons
  • There are different kinds of lexicons for different things (Subjectivity Lexicon, Sentiment Lexicon, Emotion, Opinion), even different languages.
  • They can be constructed manually or generated automatically.
  • Sentiment lexicons are an important source of features.

Lexicon Embeddings

Lexicon embeddings are derived by taking scores from multiple sources of lexicon datasets. Each lexicon dataset consists of key-value pairs, where the key is a word and the value is a list of sentiment scores for that word (e.g., probabilities of the word in positive, neutral, and negative contexts). A lexicon embedding is constructed by concatenating all the scores among the datasets with respect to a word. If a word does not appear in certain datasets, 0 values are assigned in its place. The resulting embedding is in the form of a vector v ∈ R^e, where e is the total number of scores across all lexicon datasets.

(Lexicon Integrated CNN Models with Attention for Sentiment Analysis)
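To make the construction described above concrete, here is a minimal sketch in Python. The two toy lexicons, their scores, and the function name are made up for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical toy lexicons: each maps a word to a fixed-length list of scores.
polarity_lexicon: Dict[str, List[float]] = {
    "good": [0.80, 0.15, 0.05],   # assumed (positive, neutral, negative) scores
    "awful": [0.05, 0.10, 0.85],
}
intensity_lexicon: Dict[str, List[float]] = {
    "good": [0.6],                # assumed single intensity score
}

LEXICONS = [polarity_lexicon, intensity_lexicon]
SIZES = [3, 1]  # number of scores each lexicon stores per word


def lexicon_embedding(
    word: str,
    lexicons: Sequence[Dict[str, List[float]]],
    sizes: Sequence[int],
) -> List[float]:
    """Concatenate the scores for `word` from every lexicon.

    A lexicon that does not contain the word contributes zeros instead.
    """
    embedding: List[float] = []
    for lexicon, size in zip(lexicons, sizes):
        embedding.extend(lexicon.get(word, [0.0] * size))
    return embedding


print(lexicon_embedding("good", LEXICONS, SIZES))
print(lexicon_embedding("awful", LEXICONS, SIZES))
```

Here e is 4 (3 + 1), so every word gets a vector of length 4, and "awful" ends with a 0 because it is missing from the second lexicon.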

GSoC Fedora Happiness Packets Update – Weeks 8 and 9

The talk got accepted a month ago (https://pagure.io/flock/issue/55), but I didn’t know if I’d be able to attend. I couldn’t find an appointment at the Houston consulate for July; the entire month was booked, but luckily someone cancelled last week, so I made a quick trip to Houston. I got my passport back today and saw that the visa was approved. So I’ll be attending FLOCK 2018 😀

Added a search bar for the messages archive (https://pagure.io/fedora-commops/fedora-happiness-packets/issue/11). Django’s django.contrib.postgres.search module makes use of PostgreSQL’s full text search engine, which made it pretty convenient to design a query to search against the messages, senders, and recipients. Using SearchVector, you can search against multiple fields. Maybe next week I can add an option to filter and sort the search results based on the date or user.

A bit of a holdup on (https://pagure.io/fedora-infrastructure/issue/7053). I thought it would be straightforward to set up the production and staging instances once the dev instance was done, but I ran into a lot of bumps:

  1. Client IDs and secrets: You can’t always set environment variables with the client ID and secret, which is why I opted to go with a secret.json file for the staging and production instances.
  2. JWK endpoints: Mozilla’s OIDC Django plugin throws an error if the JWK doesn’t specify ‘alg’ (the algorithm for the key). I filed a bug report here (https://github.com/mozilla/mozilla-django-oidc/issues/247). The way I worked around it in the dev instance was by using the public key from here (https://github.com/transtats/transtats/blob/devel/transtats/settings/auth.py), so I never had to use the endpoint. But I’m not entirely sure how to generate a public key for the production and staging instances from the JWK endpoint. So for now I’m thinking about making a PR to mozilla-django-oidc.
  3. fedmsg: Finally, with the help of the infra team, got fedmsg to work in the staging instance. Had to publish messages to the relay.
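On the JWK point above: an RSA JWK stores the modulus ("n") and exponent ("e") as unpadded base64url strings, so the numbers behind the public key can be read with the standard library alone. This is only a sketch; the JWK below is a shortened, made-up placeholder, not a key from any Fedora endpoint.

```python
import base64
from typing import Dict, Tuple


def b64url_to_int(value: str) -> int:
    """Decode an unpadded base64url string into a big-endian unsigned integer."""
    padded = value + "=" * (-len(value) % 4)  # restore the stripped padding
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


def rsa_numbers(jwk: Dict[str, str]) -> Tuple[int, int]:
    """Return (modulus, exponent) for an RSA JWK."""
    if jwk.get("kty") != "RSA":
        raise ValueError("only RSA keys are handled in this sketch")
    return b64url_to_int(jwk["n"]), b64url_to_int(jwk["e"])


# Placeholder JWK: "n" is far too short to be a real modulus.
example_jwk = {"kty": "RSA", "e": "AQAB", "n": "AQAB"}
modulus, exponent = rsa_numbers(example_jwk)
print(exponent)  # AQAB is the common public exponent 65537
```

Turning these two numbers into a PEM public key still needs a crypto library (for example, RSAPublicNumbers in the cryptography package), which I have left out to keep the sketch dependency-free.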

Badge: So the badge is waiting on the logo for now (https://pagure.io/fedora-badges/issue/627). Might still be able to get this done before since fedmsg is up and running.

Design: For the logo request (https://pagure.io/design/issue/606), blue doesn’t seem like the appropriate color to express happiness.

Nginx and Gunicorn

What is gunicorn?

Gunicorn, or Green Unicorn, is an HTTP server that supports WSGI (WSGI is a set of rules that allows a web framework, in this case Django, to talk to the server). By default, Django generates a WSGI config for your project, wsgi.py, and from its comments: “It should expose a module-level variable named ‘application’.” Keep this in mind for later. Gunicorn is an app server. Its primary job is to work with the web application and serve up “processed” data to web servers and sometimes other app servers. It is based on the pre-fork worker model: a master process manages the workers, which handle all the requests and responses.
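To show what that module-level application variable actually is, here is a minimal hand-written WSGI callable. It is a sketch for illustration only; Django builds its own application object for you in wsgi.py, and the greeting text is made up.

```python
def application(environ, start_response):
    """A bare-bones WSGI app: the server calls this once per request."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    # Tell the server the status line and headers, then return the body
    # as an iterable of bytes.
    start_response("200 OK", headers)
    return [body]
```

If this were saved as hello.py (a hypothetical file name), it could be served with gunicorn hello:application, which is the same module:variable form used for Django’s wsgi.py.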

What is Nginx?

Nginx is a web server. Its primary job is to serve the static content from a web application (HTML, images, videos etc.) to us (browsers or mobile applications). As a reverse proxy server, it’s the public face of the application: it can direct requests from the browsers to the appropriate backend server (Gunicorn). Nginx handles load balancing, web acceleration and security.

Why do we need Gunicorn and Nginx?

The Django server by itself is great for development but wouldn’t survive in production. It’s not meant to handle a lot of traffic. That’s where Gunicorn and Nginx come in. While we could use Gunicorn alone to run the web application, create workers and sockets, and serve static content, Nginx just does a better job of handling the stress of the web as a reverse proxy. Gunicorn recommends having a proxy server in front of it (and Nginx is strongly preferred).

If your application is a fancy upscale restaurant, Nginx would be the dedicated expeditor and Gunicorn the chef who directs the rest of the kitchen, in this case your Django applications.

Setting up Gunicorn

pip install gunicorn

Gunicorn requires an application object (remember the wsgi.py generated by Django) and a socket file to communicate with Nginx (the socket file is created automatically the first time the application is run with Gunicorn). You can also specify the number and type of workers you want.

gunicorn --workers 3 --bind unix:myproject.sock myproject.wsgi:application

Setting up Nginx

The default welcome site you see is at /usr/share/nginx/html

It’s specified in the default server block configuration file /etc/nginx/nginx.conf

A server block is used to receive requests for a particular domain and port. A location block is defined in a server block and can be used to match the request URI. The bare minimum to set up Gunicorn with Nginx is this:

  server {
    listen 80;
    server_name example.org;
    access_log  /var/log/nginx/example.log;

    location / {
        proxy_pass http://unix:/home/user/myproject/myproject.sock;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }

Here a server block has been defined for example.org. It listens for requests on port 80 and passes them on to the proxied server, which in our case is the socket at myproject.sock (proxy_pass http://unix:/home/user/myproject/myproject.sock;).






Share and Follow buttons on FB, Twitter and Google +

Share Buttons

Adding a hashtag button for Twitter is pretty straightforward with the Hashtag button, but for Facebook and Google Plus you need to get creative and use prefilled text to toss in the hashtag (in my case #FedoraAppreciationWeek2k18). Plus, you need to register your application to get a client ID for both the FB and Google share buttons.


For Facebook, I was checking out the share buttons but there was no way you could prefill the text. And they have a platform policy for sharing.

Don’t prefill any content in captions, comments, messages or the user message parameter of posts unless (a) it is a single hashtag in a post shared through our Share Dialog (but not via our APIs), (b) it was created by the person using your app, or (c) it was created by a business whose employees use your app to administer the business’s presence on Facebook.

So I went with Share Dialogs, which have a hashtag parameter. But before that, you need to create a new app on Facebook and then set up the Facebook SDK for JavaScript. In your settings under App Domains, don’t forget to add http://localhost:8000 (It doesn’t accept …). If you don’t, you might get this error: “Can’t Load URL: The domain of this URL isn’t included in the app’s domains”.
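I used the JavaScript SDK for this, but the Share Dialog can also be opened as a plain URL, which makes the hashtag parameter easy to see. The sketch below just builds that URL in Python; the app ID is a placeholder, and the parameter set is my reading of the Share Dialog docs rather than anything taken from the project.

```python
from urllib.parse import urlencode

SHARE_DIALOG = "https://www.facebook.com/dialog/share"


def share_dialog_url(app_id: str, href: str, hashtag: str) -> str:
    """Build a Share Dialog URL with a single prefilled hashtag."""
    params = {
        "app_id": app_id,     # the ID you get after registering the app
        "display": "popup",
        "href": href,         # the page being shared
        "hashtag": hashtag,   # must include the leading '#'
    }
    return SHARE_DIALOG + "?" + urlencode(params)


url = share_dialog_url(
    app_id="YOUR_APP_ID",  # placeholder, not a real app ID
    href="http://localhost:8000/",
    hashtag="#FedoraAppreciationWeek2k18",
)
print(url)
```

Note that urlencode percent-encodes the '#', so the hashtag travels as %23FedoraAppreciationWeek2k18.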

To make it easier for the FB crawler to fetch the relevant content from the site (instead of trying to guess what the header and body for the post should be) we use Open graph tags.

You can even check what it might look like using the Sharing debugger.
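For reference, Open Graph tags are just meta elements in the page head with og: property names. Here is a small sketch that renders them from a dictionary; the titles and URLs are invented examples, not the tags used on the actual site.

```python
from html import escape
from typing import Dict


def open_graph_tags(properties: Dict[str, str]) -> str:
    """Render one <meta property="og:..."> tag per entry, HTML-escaped."""
    return "\n".join(
        '<meta property="og:%s" content="%s" />' % (escape(name), escape(value))
        for name, value in properties.items()
    )


print(open_graph_tags({
    "title": "Fedora Happiness Packets",              # assumed example values
    "description": "Send a thank-you to a contributor",
    "url": "http://localhost:8000/",
    "type": "website",
}))
```

In a Django project these would normally live in the base template rather than being generated like this; the function is only there to show the shape of the tags.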

Google +

The Share tag is the easiest way to share a link on Google Plus, but again there is no way to prefill text. As an alternative, I decided to go with Interactive Posts, since they allow prefilled text (the data-prefilltext parameter). You need to register your app to get the client ID. Don’t forget that under Authorised JavaScript origins (in your project’s credentials) you need to add http://localhost:8000 (without the / at the end).

Follow Buttons

I decided to go with a simple link to the Twitter handle. There is an official follow button available, but you can’t customize it much.


There is a deprecated follow button, but the page plugin is the way to go. Again, not a lot of options for customization.

Google +

We do have a follow button for Google +. And this one does not require a client ID.


fedmsg on CentOS


fedmsg is “a library built on ZeroMQ using the PyZMQ Python bindings”. So I thought it might help to learn a little bit more about ZeroMQ (which is not a message queue).

  • Context – You usually have one context per process, which manages the sockets.
  • Socket – It can be configured and connected to other sockets to exchange messages. There are different types for the different patterns. fedmsg uses the publish/subscribe pattern (asynchronous), where a single process sends messages to multiple recipients.
  • Channel – It’s the connection between two sockets.
  • Transport – fedmsg, in particular, uses the TCP protocol.

The sockets used in fedmsg to publish and subscribe to messages are PUB (publisher) and SUB (subscriber). Messages, which are JSON encoded, are published with a topic, which subscribers can use to filter the messages. A topic in fedmsg can take the form


modname is the module trying to publish the message
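Going by the warning output further down in this post (modname 'test' and topic 'testing' end up as 'org.fedoraproject.dev.test.testing'), the full topic looks like a dotted path built from a prefix, an environment, the modname and the topic. The sketch below rebuilds that and mimics how a SUB socket filters by prefix. It is plain Python with no ZeroMQ involved, and the default prefix and environment are assumptions read off that one log line.

```python
import json


def full_topic(modname: str, topic: str,
               prefix: str = "org.fedoraproject",   # assumed from the log output
               environment: str = "dev") -> str:    # assumed from the log output
    """Join the pieces into the dotted topic seen on the bus."""
    return ".".join([prefix, environment, modname, topic])


def matches(subscription: str, topic: str) -> bool:
    """ZeroMQ SUB sockets filter by prefix: a message passes if its topic
    starts with the subscribed string."""
    return topic.startswith(subscription)


topic = full_topic(modname="test", topic="testing")
payload = json.dumps({"topic": topic, "msg": {"test": "Hello World"}})

print(topic)
print(matches("org.fedoraproject.dev.test", topic))   # subscriber for this module
print(matches("org.fedoraproject.prod", topic))       # subscriber for another env
```

The prefix matching is why a subscriber can listen to a whole module (or everything, with an empty subscription) without knowing every individual topic.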

Setting it up

Disclaimer: This is for a development instance and not production.

Installing fedmsg is pretty straightforward but configuring it can be a bit tedious.

sudo yum install fedmsg

If you go to /etc/fedmsg.d/ you should see all these files: base.py, endpoints.py, gateway.py, ircbot.py, logging.py, relay.py, and ssl.py

If you try to publish a message from the interpreter:

>>> import fedmsg
>>> fedmsg.publish(topic='testing', modname='test', msg={
...     'test': "Hello World",
... })

You might get this error: AttributeError: 'FedMsgContext' object has no attribute 'publisher'

(Refer to https://github.com/fedora-infra/fedmsg/issues/426)

To fix this you need to configure the endpoints in endpoints.py

So an endpoint is basically a string which specifies a transport (TCP) and an address to bind to.


You can read more about it here http://www.fedmsg.com/en/stable/configuration/#endpoints

In endpoints.py, add your endpoint to the config dictionary. The key has to be <name of the application>.<host name> and the value is a list of all possible endpoints the socket can bind to. You can find out your hostname using hostname -s
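As a rough sketch, an entry in endpoints.py could be laid out like this. The addresses and ports are placeholders I picked for a local dev box, not values from my actual config, and the hostname line is only an approximation of what hostname -s prints.

```python
import socket

# Short hostname, roughly what `hostname -s` prints.
hostname = socket.gethostname().split(".", 1)[0]

config = dict(
    endpoints={
        # Key: <name of the application>.<host name>
        # "__main__" is the application name when publishing from the interpreter.
        "__main__.%s" % hostname: [
            "tcp://127.0.0.1:3000",  # placeholder address and port
            "tcp://127.0.0.1:3001",  # placeholder: a second address the socket can use
        ],
    },
)
```

The value is a list, so it can hold more than one address for the same application name.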

You can view the configuration from the terminal using fedmsg-config

One possible entry could look like this:

"__main__.centos-s-1vcpu": ["tcp://"],

If we try to publish the message now, it should work. But how can I check if it shows up on the fedmsg bus? Launch another Python interpreter and try:

>>> import fedmsg
>>> for name, endpoint, topic, msg in fedmsg.tail_messages():
...     print topic, msg

This should print out all the messages currently arriving. You could also use fedmsg-tail --really-pretty from the terminal. fedmsg.tail_messages() gets messages published on the sockets listed in endpoints.

But if you try to publish now, you might get this:

IOError: Couldn't find an available endpoint for name '__main__.centos-s-1vcpu-1gb-lon1-01'

That’s because we specified only one endpoint for this application in endpoints.py, and another process has already bound to it. So we change the config file to add another one:

"__main__.centos-s-1vcpu-1gb-lon1-01": ["tcp://", "tcp://"],

You should be able to publish messages now and see them:

No handlers could be found for logger "fedmsg.crypto"
/usr/lib/python2.7/site-packages/fedmsg/core.py:441: UserWarning: !! invalid message received: {u'username': u'root', u'i': 3, u'timestamp': 1530124619, u'msg_id': u'2018-e4090d11-8944-4b30-9d25-0f59c549fab0', u'topic': u'org.fedoraproject.dev.test.testing', u'msg': {u'test': u'Hello World'}}
warnings.warn("!! invalid message received: %r" % e.msg)

You get this warning because validate_signatures in ssl.py is enabled by default. You can disable it for now for development.
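For a dev box, the change in ssl.py would look something like the fragment below. fedmsg reads a config dictionary from each file in /etc/fedmsg.d/, so only this one key needs to change and the rest of that file can stay as it is. This is a sketch of the idea, not a copy of my file.

```python
# /etc/fedmsg.d/ssl.py (development only)
config = dict(
    # Skip signature checks so unsigned test messages are not
    # reported as invalid. Do not do this in production.
    validate_signatures=False,
)
```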


© 2019 Anna Philips
