Google.com


KISS_Racing

Kiss,

 

When you use a search engine such as Google.com, how do you suppose the information you see presented all nice and neat got there? Google.com visits your website too, along with every other website it can locate.

 

The difference is that now you can see all the bots as well as all the other "guests" that visit TXSZ, whereas with the old software the bots were not showing up on the list of people online.

 

Nick


nick,

 

thanks for educating me on what goes on behind the scenes and what we aren't used to seeing. i can now add this to my resume and next time someone asks, i can act like i know what i'm talking about...

 

folks mystery solved thanks to nick...

 

oh nick, i always thought that a website owner/company had to pay someone to do submissions for them to search engines like google, lycos, dogpile, etc.... see, i did learn something! thanks again.....


i'm sorry, i'm coming into the conversation late (and i'm illiterate)... what the hell is a bot, and should i be concerned?


Just a little tidbit of info: I just did a test, and now when searching on Google for "texas auto racing", this board is the very first link. When searching for just "Texas Racing", it's the first one on the 3rd page, mainly due to all the horse and dog racing sites. I know before it was a lot further back in the search results... :)


  • 3 months later...
this is one of the only forums I don't get port scans from.

That's because I hand-screen every registration. I get between 10 and 25 registrations a day that I deny. It takes about 5 minutes to screen the IP address, look up the host, search various PHP sites for a match to the registration, and run the screen name and email addy through the various search engines.

 

I also require email responses from most new registrations.

 

You will also notice that (knock on wood) we have no problem with spammers, for the same reason.

 

Nick


  • 9 months later...
I didn't know that either. Thanks Again Nick!!

:D:D:D

Unfortunately, the number of spammers trying to register at TXSZ has increased dramatically over the past couple of months. Apparently they're not happy with just messing up your email. They're moving on to where they can get multiple reads instead of just one at a time.

 

Here's one of the tools I use to check IP addresses.

 

http://www.fspamlist.com/files/export.txt
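For anyone curious what an automated version of that IP check might look like, here is a minimal sketch in Perl. It is not Nick's actual process. It assumes the list is plain text with one IPv4 address per line, which may not match the real layout of export.txt, and the helper names `load_blocklist` and `is_blocked` are made up for illustration. The sample addresses come from the reserved documentation ranges.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup table from blocklist text.
# ASSUMPTION: one IPv4 address per line; blank lines and '#' comments are
# ignored. The real export file may be laid out differently.
sub load_blocklist {
    my ($text) = @_;
    my %blocked;
    for my $line (split /\r?\n/, $text) {
        $line =~ s/#.*//;           # strip trailing comments
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next unless $line =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;
        $blocked{$line} = 1;
    }
    return \%blocked;
}

# Return 1 if the address is on the list, 0 otherwise.
sub is_blocked {
    my ($blocked, $ip) = @_;
    return exists $blocked->{$ip} ? 1 : 0;
}

# Made-up sample data standing in for a downloaded list.
my $sample  = "192.0.2.10\n# comment line\n198.51.100.7\n\n";
my $blocked = load_blocklist($sample);
for my $ip ('192.0.2.10', '203.0.113.5') {
    printf "%-15s %s\n", $ip, is_blocked($blocked, $ip) ? 'DENY' : 'ok';
}
```

A hash lookup keeps each check cheap even for a long list. A real screen would still need the host lookup and the manual searches described earlier in the thread.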

 

Nick


  • 5 months later...

For anyone without computer skills wondering how Google gets its information: it uses what's called a spider. Excuse the code at the bottom. You won't understand it if you don't know how to code.

 

Spiders and robots are programs that browse the web automatically, usually for gathering and indexing links or other information.
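To make "browse the web automatically" a bit more concrete, here is a rough sketch of the core loop of a spider: take a page, pull out its links, queue the ones not yet seen, repeat. To keep it self-contained it crawls a made-up in-memory "web" (the `%web` hash) instead of fetching real pages, and the names `extract_links` and `crawl` are invented for this example. A real crawler would also need an HTTP client, a proper HTML parser, and polite delays between requests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the web: URL => page HTML.
my %web = (
    'http://example.com/'  => '<a href="http://example.com/a">A</a> '
                            . '<a href="http://example.com/b">B</a>',
    'http://example.com/a' => '<a href="http://example.com/b">B</a>',
    'http://example.com/b' => '<a href="http://example.com/">home</a>',
);

# Pull absolute http(s) links out of double-quoted href attributes.
# A simple regex is enough for this toy data, not for real-world HTML.
sub extract_links {
    my ($html) = @_;
    return $html =~ /href\s*=\s*"(https?:[^"]+)"/gi;
}

# Breadth-first crawl: visit each page once and record its outgoing links.
sub crawl {
    my ($seed, $pages) = @_;
    my @queue = ($seed);
    my %seen  = ($seed => 1);
    my %index;
    while (defined(my $url = shift @queue)) {
        next unless exists $pages->{$url};    # the "fetch" step
        my @links = extract_links($pages->{$url});
        $index{$url} = \@links;
        for my $link (@links) {
            push @queue, $link unless $seen{$link}++;
        }
    }
    return \%index;
}

my $index = crawl('http://example.com/', \%web);
for my $url (sort keys %$index) {
    print "$url -> @{ $index->{$url} }\n";
}
```

The `%seen` hash is what stops the spider from looping forever when pages link back to each other.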

 

XML and its grandparent SGML are attempts to instill meaningful order into information. With them, single documents become leaves of databases. A collection of pages can be displayed as HTML easily through conversion or used for indexed searching or even generating entirely new documents.

 

The Internet has always been full of data, just never with any real meta-organization. You can think of the Internet itself as the single most important database in existence, but without it all being in a formatted language like XML or some other rigid scheme, it’s not a valuable database. Information without order, indices, and strong categorization reduces quickly to noise.

 

The real value of the Internet is found in its surfeit of plain text, no offense to the porn industry. The one arena where no one debates the supremacy of Perl is text parsing and manipulating. Therefore, it’s no real stretch to set some Perl loose on the Internet, with the right instructions, and find the value in that great unkeyed DB.

 

So let’s do something really valuable with the WWW! Let’s find a celebrity’s birthday. We’ll pick Jimmy Page to dull the irony somewhat. We are using simple regexes to check for birthdays. Much better ones could be crafted for serious applications.

Code

 

#!/usr/bin/perl
use strict;
use warnings;
#---------------------------------------------------------------------
use WWW::Spyder; # our crawler
use URI::Escape; # to properly escape our query for the search engine
#---------------------------------------------------------------------
@ARGV == 2 or usage();

my $spyder = WWW::Spyder->new(sleep_base => 20,
                              exit_on    => { pages => 30,
                                              time  => '1min' });
my $name = join(' ', @ARGV);
$spyder->terms($name, qr/birthdays?/i);

# start from a search for the quoted name
$spyder->seed( 'http://www.google.com/search?q=' .
               uri_escape(qq{"$name"}) );

my $bday;
while ( my $page = $spyder->crawl ) {

    print "Check-->> ", $page->url, "\n";

    # try to extract the birthday here
    # \Q...\E keeps any regex metacharacters in the name literal
    ( $bday ) = $page->text =~
        m,\Q$name\E\s+was born on ([^.]+\d\d+),si;
    last if $bday;
    ( $bday ) = $page->text =~
        m,\Q$name\E's\s+birthday is ([^.]+\d\d+),si;
    last if $bday;
}

if ( $bday ) {
    print "\n ${name}'s birthday seems to be: $bday\n\n";
} else {
    print "\n Sorry, couldn't find ${name}'s birthday quickly.\n\n";
}

exit 0;
#=====================================================================
sub usage {
    # strip any leading directories from the script name
    my ( $tool ) = $0 =~ m,([^\/]+)$,;
    die <<KettleChips;
----------------------------------------------------------------------
USAGE:
$tool [Proper Name]

I will try to find the birthday of someone famous if you will please
give me his/her name. I can only do two word names right now.
----------------------------------------------------------------------
KettleChips
}
#=====================================================================

 

Usage

 

jinx[96]>spyder-birthday Jimmy Page

 

Output

 

Check-->> http://www.google.com/search?q=%22Jimmy%20Page%22
Check-->> http://www.led-zeppelin.com/
Check-->> http://directory.google.com/Top/Arts/Music/Bands_and...
Check-->> http://home.earthlink.net/~juliannwh/

 

Jimmy Page's birthday seems to be: January 9, 1944
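On the remark that much better regexes could be crafted: below is one possible tightening, offered as a sketch rather than anything from the original script. Instead of grabbing everything up to the next period, it only accepts text shaped like a date ("January 9, 1944" or "9 January 1944"). The helper name `find_birthday` and the two sentence patterns are assumptions for illustration, and it still handles English month names only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Full English month names, case-insensitive.
my $month = qr/(?:January|February|March|April|May|June|July|August
                 |September|October|November|December)/xi;

# Either "January 9, 1944" or "9 January 1944".
my $date = qr/(?:$month\s+\d{1,2},?\s+\d{4}|\d{1,2}\s+$month\s+\d{4})/;

# Return the first date found in a "was born on" or "'s birthday is"
# sentence about $name, or nothing if neither pattern matches.
sub find_birthday {
    my ($name, $text) = @_;
    # Quote metacharacters, then let any run of whitespace separate
    # the words of the name.
    (my $n = quotemeta $name) =~ s/\\ /\\s+/g;
    for my $re (
        qr/${n}\s+was\s+born\s+on\s+($date)/i,
        qr/${n}'s\s+birthday\s+is\s+($date)/i,
    ) {
        return $1 if $text =~ $re;
    }
    return;
}

my $text  = "Jimmy Page was born on January 9, 1944 in Heston.";
my $found = find_birthday('Jimmy Page', $text);
print defined $found ? "Found: $found\n" : "No birthday found\n";
```

Writing `${n}` with braces matters here: in older Perls a bare `$n's` inside a pattern or string can be read as the package variable `$n::s`.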


Archived

This topic is now archived and is closed to further replies.
